system

The system addresses inefficiencies in schedule management and information analysis by using speech recognition and encryption to provide personalized, secure, and multilingual support for enhanced productivity in modern business environments.

JP2026096509A · Pending · Publication Date: 2026-06-15 · SOFTBANK GROUP CORP


Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-03
Publication Date: 2026-06-15

Application Information

Patent Timeline

03 Dec 2024

Application

15 Jun 2026

Publication

JP2026096509A

IPC: G06Q10/109

AI Tagging

Application Domain

Instruments


AI Technical Summary

Technical Problem

Existing systems lack integrated solutions for efficient schedule management, information analysis, multilingual support, and security in modern business environments, leading to decreased productivity and time wastage.

Method used

A system utilizing speech recognition to convert voice instructions into text, analyze data, automatically register schedules and tasks, synchronize across devices, and provide personalized work environments with multilingual support and encryption for enhanced productivity and security.

Benefits of technology

The system streamlines schedule management, supports optimal decision-making, and ensures secure, multilingual operations by integrating voice recognition, natural language processing, big data analysis, and encryption, thereby improving work productivity.

✦ Generated by Eureka AI based on patent content.


Abstract

A system is provided. [Solution] The system includes: a means for receiving instructions via speech recognition and converting those instructions into text data; a means for analyzing the text data and automatically registering schedules and tasks; a means for synchronizing schedules and providing reminders across multiple devices; a means for making recommendations that support decision-making by analyzing past data and external information; and a means for learning the user's behavioral history and proposing an optimal work environment.


Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, the method including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In a modern business environment, efficient schedule management and information analysis are required. However, existing systems are limited to individual functions, so multiple systems must be used in combination to improve overall efficiency. Because many operations are carried out globally, multilingual support and security are also important, but no secretarial system manages all of these in an integrated way. As a result, business productivity decreases and time is wasted.

Means for Solving the Problems

[0005] This invention provides a system that uses speech recognition to receive instructions in real time, converts them into text, and analyzes them. Based on the analyzed data, it automatically registers schedules and tasks and synchronizes them across multiple devices via the cloud, thereby achieving consistent schedule management. Furthermore, it uses big data to analyze past behavioral data and external resources in order to support decision-making. By learning user behavior patterns, it also provides individually customized work environments and priority suggestions. The system supports international operations through multilingual support and ensures information security through encryption and access control. Through these means, it improves work productivity and addresses the problems described above.

[0006] "Speech recognition" is a technology that analyzes speech data, converts it into text data, and understands user instructions and information.

[0007] "Means of converting instructions into text data" refers to a technical method that receives voice instructions and converts their content into textual information.

[0008] "Methods for analyzing text data" refer to the process of processing character information, understanding its content, and identifying the necessary actions.

[0009] "Methods for automatically registering appointments and tasks" refer to functions that register schedules and work items in the system based on analyzed information.

[0010] "Methods for synchronizing schedules across devices" refer to technologies that update and share information so that data on different devices always matches.

[0011] "Means of providing reminders" refer to pre-set alarms and notification functions that inform users of important appointments and tasks.

[0012] "Methods for analyzing big data" refer to technologies that process and analyze large amounts of data to extract useful information and trends.

[0013] "Recommendations to support decision-making" is a process of providing advice and suggestions to help users make the best choices based on analyzed information.

[0014] "Methods for learning user behavior history" refer to technologies that accumulate and analyze users' past behavior patterns to infer their habits and preferences.

[0015] "Means of proposing work environments" refers to a function that presents the optimal work environment and work methods based on the user's behavior patterns.

[0016] "Multilingual support" refers to a system function that has the ability to understand and generate speech and text in multiple languages.

[0017] "Supporting international business" means providing technical assistance to facilitate business operations that span different languages and cultural regions.

[0018] "Encryption" is a technology that transforms data using a specific algorithm to ensure the security of information and prevent access by third parties.

[0019] "Access control" is a function that maintains information security by controlling access to data for specific users or devices.

Brief Description of the Drawings

[0020]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of the data processing device and smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map to which a plurality of emotions are mapped.
[Figure 10] An emotion map to which a plurality of emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.

Embodiments for Carrying Out the Invention

[0021] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described according to the accompanying drawings.

[0022] First, the terms used in the following description will be explained.

[0023] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Furthermore, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), GPU (Graphics Processing Unit), GPGPU (General-Purpose computing on Graphics Processing Units), and APU (Accelerated Processing Unit).

[0024] In the following embodiments, RAM (Random Access Memory) denoted by a reference numeral is a memory that temporarily stores information and is used as work memory by the processor.

[0025] In the following embodiments, storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs and various parameters. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0026] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0027] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0028] [First Embodiment]

[0029] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0030] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0031] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0032] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0033] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0034] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0035] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0036] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0037] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0038] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0039] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0040] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0041] In implementing the present invention, a system is provided in which the central component, an AI agent, provides speech recognition, data processing, and a user interface. Specific embodiments of this system are described below.

[0042] Speech recognition and instruction analysis

[0043] Users can input voice instructions into the system in a conversational format. The server uses speech recognition technology to convert the input voice into text, and a natural language processing engine analyzes the user's instructions and intentions. This analysis automatically extracts the tasks and schedules the user requests.

[0044] Schedule management and synchronization

[0045] Based on the analysis results, the server registers the schedule in the cloud and sets reminders as needed. Devices synchronize this data via the cloud, allowing users to view their latest schedules on various devices such as desktops, tablets, and smartphones. Users can enjoy the convenience of easily managing their own schedules.
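
The cloud synchronization described in [0045] could be sketched as follows. This is a minimal illustration, not the method disclosed in the application: the `CloudCalendar` and `Device` classes, the revision counter, and the in-memory change log are all hypothetical stand-ins for a real cloud service.

```python
from dataclasses import dataclass, field


@dataclass
class CloudCalendar:
    """Hypothetical in-memory stand-in for the cloud schedule store."""
    revision: int = 0
    # Each entry is (revision at which it was written, event id, event title).
    log: list = field(default_factory=list)

    def register(self, event_id: str, title: str) -> int:
        """Record a new or updated event and return the new revision number."""
        self.revision += 1
        self.log.append((self.revision, event_id, title))
        return self.revision

    def changes_since(self, revision: int) -> list:
        """Return every write made after the given revision."""
        return [entry for entry in self.log if entry[0] > revision]


@dataclass
class Device:
    """A terminal that keeps a local copy of the schedule."""
    name: str
    last_seen: int = 0
    events: dict = field(default_factory=dict)

    def sync(self, cloud: CloudCalendar) -> None:
        """Pull only the changes this device has not applied yet."""
        for revision, event_id, title in cloud.changes_since(self.last_seen):
            self.events[event_id] = title  # later writes overwrite earlier ones
            self.last_seen = revision


cloud = CloudCalendar()
phone, tablet = Device("phone"), Device("tablet")

cloud.register("evt-1", "Project meeting")
phone.sync(cloud)
cloud.register("evt-1", "Project meeting (room changed)")
cloud.register("evt-2", "Business trip")
phone.sync(cloud)
tablet.sync(cloud)
```

After both devices have synced, they hold the same schedule even though the tablet joined late, which is the consistency property the paragraph describes.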

[0046] Support for decision-making and suggestion of priorities

[0047] The server utilizes big data analytics technology to process users' past behavioral patterns and external information. Based on the insights gained, it presents users with objective information and suggestions to support their future decision-making. For example, it makes suggestions to make future choices easier based on the options the user frequently selects.
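
As a rough illustration of suggesting options the user frequently selects, the sketch below simply counts past choices. The function name `suggest_options` and the sample history are assumptions made for this example; the application does not specify how its analysis is implemented.

```python
from collections import Counter


def suggest_options(history: list, top_n: int = 2) -> list:
    """Return the options the user has chosen most often, most frequent first."""
    counts = Counter(history)
    return [option for option, _ in counts.most_common(top_n)]


# Hypothetical record of how the user has held past meetings.
past_choices = [
    "video call", "meeting room A", "video call",
    "cafe", "video call", "meeting room A",
]
suggestions = suggest_options(past_choices)
```

A production system would presumably weigh recency and external information as well; plain frequency counting is only the simplest instance of the idea.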

[0048] Providing an individualized work environment

[0049] This system learns the user's behavioral history and preferences to propose an optimized workflow. This maximizes user productivity and creates a personalized work environment. The server, in turn, can continuously provide a more accurate service.

[0050] Security and multilingual support

[0051] In terms of security, the server encrypts all data and has features to manage access rights, thus protecting user data. Furthermore, this system supports multiple languages, enabling smooth operation even in international business environments.

[0052] As a concrete example, when a user plans a business trip, they can give a voice command such as "Prepare my business trip schedule for next week." The server then references past business trip data to create a schedule and automatically generates necessary documents and packing lists. This is automatically synchronized across multiple devices, allowing the user to efficiently complete their preparations before departure.

[0053] Thus, by utilizing the present invention, users can automate complex tasks and streamline their entire operations.

[0054] The following describes the processing flow.

[0055] Step 1:

[0056] The user gives instructions to the voice input device. They communicate their schedule by voice, such as "Schedule a project meeting for next Monday."

[0057] Step 2:

[0058] The server sends the received audio to the speech recognition engine. The engine converts the audio data into text data.

[0059] Step 3:

[0060] The server uses natural language processing to analyze text data to determine the date, time, and event details. For example, it extracts the date information "next Monday."

[0061] Step 4:

[0062] Based on the analyzed information, the server registers the event in the cloud-based calendar system.

[0063] Step 5:

[0064] The server sets up a reminder and prepares to notify the user 30 minutes before the meeting.

[0065] Step 6:

[0066] The terminal synchronizes the latest schedule data from the cloud system, ensuring that information is displayed consistently across multiple devices.

[0067] Step 7:

[0068] The server sends a notification to the user's device at the scheduled time according to the set reminder. By checking this notification, the user can ensure they don't forget to participate in the scheduled event.

[0069] Step 8:

[0070] When a user requests a suggestion or decision, the server performs big data analysis and provides the best possible suggestion based on past behavior and external information.

[0071] Step 9:

[0072] The server receives user feedback, updates its machine learning model, and makes future recommendations more accurate.

[0073] Through this series of processes, the system automates user schedule management and supports effective decision-making.
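
Steps 3, 5, and 7 above can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: a regular expression stands in for the natural language processing engine, "next Monday" is read as the first Monday strictly after the current date, and the 10:00 start time is invented because the utterance gives no time of day. Only the 30-minute reminder lead comes from the text.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
DEFAULT_START = time(10, 0)            # assumption: the utterance gives no time of day
REMINDER_LEAD = timedelta(minutes=30)  # from Step 5: notify 30 minutes before


@dataclass
class Event:
    title: str
    start: datetime
    remind_at: datetime


def resolve_next_weekday(today: date, weekday_name: str) -> date:
    """Return the first date strictly after `today` that falls on the named weekday."""
    target = WEEKDAYS.index(weekday_name.lower())
    days_ahead = (target - today.weekday() - 1) % 7 + 1  # always in 1..7
    return today + timedelta(days=days_ahead)


def parse_instruction(text: str, today: date) -> Event:
    """Extract a title and date from 'Schedule a <title> for next <weekday>'."""
    match = re.search(
        r"schedule an? (?P<title>.+?) for next (?P<day>\w+)", text, re.IGNORECASE
    )
    if match is None or match["day"].lower() not in WEEKDAYS:
        raise ValueError(f"unsupported instruction: {text!r}")
    start = datetime.combine(resolve_next_weekday(today, match["day"]), DEFAULT_START)
    return Event(title=match["title"], start=start, remind_at=start - REMINDER_LEAD)


def reminder_due(event: Event, now: datetime) -> bool:
    """Step 7: the notification is due once the reminder time has been reached."""
    return event.remind_at <= now < event.start


event = parse_instruction(
    "Schedule a project meeting for next Monday.", today=date(2024, 12, 3)
)
```

Passing `today` explicitly keeps the date resolution deterministic, which makes the relative expression easy to test.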

[0074] (Example 1)

[0075] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0076] In today's busy work environment, users demand systems that efficiently manage their schedules and automate tasks. However, conventional systems have suffered from low accuracy in recognizing voice commands and incomplete synchronization across multiple devices. Furthermore, they lacked features to support decision-making, making it difficult to provide a work environment tailored to individual user needs. In addition, insufficient multilingual support and security measures limited support for international business operations.

[0077] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0078] This invention includes a server that receives instructions via speech recognition technology and converts those instructions into text data, a means for analyzing the text data and automatically registering schedule management data and tasks, and a means for synchronizing schedule management across multiple information terminals and providing notification functions. As a result, users can streamline schedule management through highly accurate speech recognition and check the latest information across multiple devices. Furthermore, by utilizing past usage data, users can receive suggestions to support decision-making, thereby optimizing and streamlining operations. In addition, multilingual functionality and data encryption enable support for international business and high security.

[0079] "Voice recognition technology" is a technology that converts a user's voice into a digital signal and generates text data or instructions by analyzing its content.

[0080] "Character data" refers to digital data used for information processing and data communication, based on strings of characters obtained through speech recognition or manual input.

[0081] "Schedule management data" refers to data used to electronically record and manage users' schedules and task information.

[0082] An "information terminal" refers to a device used to manipulate digital data, such as a computer, smartphone, or tablet.

[0083] "Schedule management" refers to activities and methods for organizing and managing users' schedules.

[0084] A "notification function" is a feature that provides users with timely information or warnings that have been set in advance.

[0085] "Multilingual functionality" refers to the ability of a system or service to use and switch between multiple languages simultaneously.

[0086] "Data encryption" is the process of transforming data using cryptographic techniques to protect it from unauthorized access.

[0087] "Access rights management" refers to a management method that controls access rights to specific information or functions, ensuring that only authorized users can access them.

[0088] The present invention aims to provide a system that allows users to efficiently manage their schedules using voice commands. This system operates by integrating voice recognition technology, natural language processing technology, data synchronization technology, big data analysis technology, multilingual support, and security technology.

[0089] The server uses common speech recognition technology to convert the user's voice instructions into text data. Specifically, this can be done using a speech recognition API. The converted text data is then analyzed by a natural language processing engine (e.g., a generative AI model) to understand the user's intent and instructions. This analysis automatically generates schedule management data and tasks.

[0090] The analyzed data is registered on a cloud platform for schedule management. This registration can be done using a cloud service API. The device synchronizes this data via the cloud, allowing users to access it from multiple devices such as PCs, tablets, and smartphones. This makes it possible to check the latest schedule from any device.

[0091] Furthermore, the server can use big data analytics technology to process users' past behavioral data and external information. This processing makes it possible to provide suggestions to support decision-making. For example, based on past business trip history, it can automatically generate suggestions and packing lists for the next business trip.

[0092] Multilingual support enables smooth operation even in international environments. Data encryption technology and access control ensure the secure protection of user data.
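
The access control mentioned in [0092] might look like the sketch below. The `Schedule` class, the `Permission` enum, and the per-user permission table are hypothetical names chosen for illustration. Encryption is deliberately left out because a sound implementation would rely on a vetted cryptography library rather than hand-written code.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Permission(Enum):
    READ = auto()
    WRITE = auto()


@dataclass
class Schedule:
    owner: str
    entries: list = field(default_factory=list)
    # Hypothetical access-control list: user name -> set of granted permissions.
    acl: dict = field(default_factory=dict)

    def allowed(self, user: str, permission: Permission) -> bool:
        """The owner may do anything; other users need an explicit grant."""
        if user == self.owner:
            return True
        return permission in self.acl.get(user, set())

    def add_entry(self, user: str, entry: str) -> None:
        """Append an entry if the user holds WRITE permission."""
        if not self.allowed(user, Permission.WRITE):
            raise PermissionError(f"{user} may not modify this schedule")
        self.entries.append(entry)

    def read(self, user: str) -> list:
        """Return a copy of the entries if the user holds READ permission."""
        if not self.allowed(user, Permission.READ):
            raise PermissionError(f"{user} may not read this schedule")
        return list(self.entries)


schedule = Schedule(owner="alice", acl={"assistant": {Permission.READ}})
schedule.add_entry("alice", "Business trip, next week")
visible_to_assistant = schedule.read("assistant")
```

Denying by default and granting permissions explicitly keeps the check simple to audit.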

[0093] For example, if a user voice-inputs "Prepare my business trip schedule for next week," the server uses speech recognition technology to convert the input into text, analyzes it using natural language processing, and then automatically creates a business trip schedule, generating a list of necessary documents and items to pack. This information is synchronized across multiple devices, allowing the user to prepare efficiently.

[0094] Example prompt: "The user has given the voice command 'Prepare my business trip schedule for next week.' Start the process of using past business trip data to create a schedule, generate a packing list, and sync it to the device."
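
A prompt like the one in [0094] could be assembled from the recognized command and a list of requested actions. The helper names `join_actions` and `build_prompt` are assumptions for this sketch, and the template is simply the example prompt with its variable parts factored out.

```python
def join_actions(actions: list) -> str:
    """Join action phrases into an English list such as 'a, b, and c'."""
    if not actions:
        raise ValueError("at least one action is required")
    if len(actions) == 1:
        return actions[0]
    return ", ".join(actions[:-1]) + ", and " + actions[-1]


def build_prompt(voice_command: str, actions: list) -> str:
    """Wrap the recognized command and the requested actions in one prompt."""
    return (
        f"The user has given the voice command '{voice_command}' "
        f"Start the process of using past business trip data to {join_actions(actions)}."
    )


prompt = build_prompt(
    "Prepare my business trip schedule for next week.",
    ["create a schedule", "generate a packing list", "sync it to the device"],
)
```

Keeping the template in one function means the wording sent to the generative AI model can be adjusted without touching the speech recognition or scheduling code.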

[0095] In this way, this system can automate and streamline users' work while simultaneously providing a secure and multilingual system environment.

[0096] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0097] Step 1:

[0098] The user gives instructions to the system via voice input. Specifically, the user might say into the microphone, "Prepare my business trip schedule for next week." This voice input becomes the input data for the system.

[0099] Step 2:

[0100] The server uses speech recognition technology to convert the input speech into text data. It analyzes the speech data and generates a string by mapping each phoneme. The output of this process is the user's instruction in text form, in this case "Prepare my business trip schedule for next week."

[0101] Step 3:

[0102] The server passes the generated text data to a natural language processing engine to analyze the user's intent. A generative AI model is used to syntactically analyze the text and extract the user's instructions. The output of the analysis consists of specific action items necessary for schedule registration and task creation. Examples include "Create a business trip schedule," "Generate a document list," and "Create a packing checklist."

[0103] Step 4:

[0104] The server operates the schedule management application based on the analyzed data and registers the appointment in the cloud. Specifically, it uses an API to create a new calendar entry and registers the related information. As an output, the new business trip appointment is added to the cloud calendar.

[0105] Step 5:

[0106] The device retrieves new schedule information registered in the cloud. By synchronizing data from the cloud, it becomes possible to display the same schedule on multiple devices such as smartphones and PCs. As output, the latest schedule is updated and displayed on the device.

[0107] Step 6:

[0108] The server uses big data analytics to investigate past travel data and related information, generating suggestions to support user decision-making. Prompt messages are used to obtain insights from the generative AI model. The output includes an optimal packing list and important points for the user.

[0109] Step 7:

[0110] Users can review updated schedules and suggestions from the server, and make modifications as needed. By utilizing the suggestions provided via smart devices, more efficient schedule management is possible. The output is a finalized, optimized schedule.

[0111] (Application Example 1)

[0112] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0113] In modern life, individual users' schedules and tasks are frequently updated and demanding, making efficient management difficult. Furthermore, while there is a growing need for devices that support personal daily life to provide more advanced assistance, the means to achieve this are not yet sufficiently available. Therefore, a system is needed that enables more sophisticated schedule management and personalized decision-making support for users.

[0114] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0115] In this invention, the server includes means for receiving instructions using speech recognition and converting those instructions into text data, means for analyzing the text data and automatically registering schedules and tasks, and means for efficiently managing tasks for devices that support daily life through voice input. This makes it easier for users to manage complex schedules and to receive advanced daily life support and decision-making support through individualized life support devices.

[0116] "Speech recognition" is a technology that analyzes speech and converts it into text data.

[0117] "Instructions" are requests or commands provided by the user, either verbally or in writing.

[0118] "Text data" refers to text information converted by speech recognition.

[0119] "Analysis" is the process of understanding textual data and extracting necessary information from it.

[0120] "Schedules" refer to scheduling information about future events or tasks.

[0121] A "task" is a work item or activity set by the user.

[0122] "Synchronization" is the process of matching information across multiple devices.

[0123] A "notification" is a warning or message sent to inform a user about schedules and tasks.

[0124] "Information" refers to general data, including past user actions and external data.

[0125] "Advice" refers to suggestions or recommendations offered to support decision-making.

[0126] "Learning" is the process of analyzing trends in the user's behavior history so that the system understands them more deeply.

[0127] A "suggestion" is the act of providing users with actions or options to facilitate optimization.

[0128] "Equipment" refers to hardware or devices in general, and is a device intended to support users.

[0129] "Daily life support" refers to functions and technologies designed to support users' daily activities.

[0130] The system implementing this invention is based on a program that integrates speech recognition and natural language processing. The server uses the Google® Cloud Speech-to-Text API as its speech recognition engine to convert user speech into text. The text data is then analyzed through the Google Cloud Natural Language API to understand the user's intent and determine the necessary actions.

[0131] The device uses cloud-based synchronization to update analyzed schedule information and tasks across various devices via the Google Calendar API. This allows users to view the latest appointments and reminders across all their devices.

[0132] Furthermore, the server leverages historical data and external information to generate recommendations that support decision-making. In this process, big data analytics techniques are used to analyze user behavior patterns and provide more personalized advice.

[0133] For example, if a user says, "Schedule a meeting for tomorrow afternoon," the server processes the voice and automatically registers the meeting in Google Calendar. In addition, it sends the user a suitable notification so they can prepare before the meeting. Furthermore, based on this, helpful suggestions are provided for future meetings, making the user's task management even more efficient.

[0134] An example of a prompt message to a generative AI model is, "Please create a forecast schedule for next week and provide an optimized suggestion." This instruction allows the system to automatically present the optimal schedule.

[0135] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0136] Step 1:

[0137] The server receives voice input from the user. This voice data is converted into text data using the Google Cloud Speech-to-Text API. The input is voice data, and the output is text data.

[0138] Step 2:

[0139] The server analyzes the generated text data using the Google Cloud Natural Language API. This analysis extracts the user's intent and recognizes specific plans and tasks. The input to this analysis is text data, and the output is the user's intent and planned information.

[0140] Step 3:

[0141] The server registers schedules and tasks with the Google Calendar API based on the analysis results. At this stage, cloud synchronization is used to allow users to check the latest schedule across various devices. The input is the analyzed schedule information, and the output is the schedule synchronized across devices.

[0142] Step 4:

[0143] The server analyzes historical user data and external information. Using big data analytics techniques, it detects user behavior patterns and generates recommendations to support future decision-making. The input is historical behavior and external data, and the output is personalized recommendations.

[0144] Step 5:

[0145] The server prompts a generative AI model to generate an optimal schedule for the following week. Specifically, it uses a prompt such as "Draft a predicted schedule for next week and provide an optimized suggestion." The input is the user's past data and the prompt, and the output is an optimized schedule.

[0146] Step 6:

[0147] The device receives notifications from the server and reports necessary information to the user in a timely manner. This includes setting reminders and notifying users of new recommendations. The input is schedule information provided by the server, and the output is notifications to the user.
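The flow of Steps 1 to 3 and Step 6 above can be sketched as follows. This is a minimal illustration only: the transcribe and extract_intent callables are caller-supplied stand-ins for the speech recognition and natural language services, the Calendar class is an in-memory placeholder rather than the Google Calendar API, and the intent dictionary layout is an assumption. Steps 4 and 5 (big data analysis and the generative AI prompt) are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Calendar:
    """In-memory stand-in for a cloud calendar (hypothetical, not a real API client)."""
    events: List[str] = field(default_factory=list)

    def register(self, title: str) -> None:
        self.events.append(title)


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    extract_intent: Callable[[str], Dict[str, str]],
    calendar: Calendar,
) -> List[str]:
    """Return the notifications produced for one voice input."""
    text = transcribe(audio)            # Step 1: voice data -> text data
    intent = extract_intent(text)       # Step 2: text data -> intent and plan
    notifications: List[str] = []
    if intent.get("action") == "schedule":
        calendar.register(intent["title"])                        # Step 3: register
        notifications.append(f"Registered: {intent['title']}")   # Step 6: notify
    return notifications
```

Passing the recognition and analysis stages in as callables keeps the sketch independent of any particular cloud service.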

[0148] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0149] This invention implements an AI secretary system that integrates an emotion engine to enrich user interaction and effectively support business operations. This system operates by combining speech recognition, natural language processing, emotion analysis, data synchronization, and decision support. Specific embodiments are described below.

[0150] Speech recognition and emotion analysis

[0151] The user gives instructions to the system via voice input. The server uses a speech recognition engine to convert the voice data into text data. At the same time, an emotion engine analyzes the voice data to identify the user's emotional state. For example, if the user speaks with a tired voice, the emotion engine will detect "fatigue."

[0152] Instruction analysis and emotion-based processing

[0153] The server analyzes the converted text data using a natural language processing engine to understand the user's intent. In addition, it dynamically adjusts task priorities based on emotional information identified by the emotion engine. For example, if the user is feeling stressed, it prioritizes suggesting relaxing leisure events.

[0154] Schedule management and multi-device synchronization

[0155] Using the analyzed data, the server automatically registers schedules. The data is then managed in the cloud and synchronized across multiple devices via the terminal. This allows users to instantly access information from any device.

[0156] Decision support and dynamic interfaces

[0157] The server analyzes historical data and external information, taking emotional information into consideration, to provide recommendations that support optimal decision-making. Furthermore, the device dynamically changes the interface design and feedback content according to the user's emotions, resulting in a more user-friendly experience.

[0158] Security and Privacy Management

[0159] Data obtained through sentiment analysis is encrypted and used only with the user's consent. This ensures strict protection of user privacy.

[0160] Specific example

[0161] For example, if a user says, "I'm dreading tomorrow's meeting," the server will add meeting preparation tasks to the schedule, while the emotion engine will identify the emotion of "dread." As a result, the server will offer suggestions to help the user relax (such as suggesting delegable tasks or setting a break time after the meeting) to alleviate the user's psychological burden.

[0162] By integrating an emotion engine in this way, this system can provide more personalized and effective business support.
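As one possible illustration of the specific example above, the following Python sketch combines the tasks derived from the utterance with relief suggestions selected by the detected emotion. The emotion labels, the suggestion texts, and the names RELIEF_SUGGESTIONS and plan_response are hypothetical; the disclosure does not prescribe how suggestions are stored or selected.

```python
from typing import Dict, List

# Hypothetical lookup table: emotion label -> relief suggestions.
RELIEF_SUGGESTIONS: Dict[str, List[str]] = {
    "dread": [
        "Delegate tasks that can be delegated",
        "Set a break time after the meeting",
    ],
    "fatigue": ["Move non-urgent tasks to another day"],
}


def plan_response(emotion: str, preparation_tasks: List[str]) -> Dict[str, List[str]]:
    """Combine schedule tasks with relief suggestions for the detected emotion."""
    return {
        "tasks": list(preparation_tasks),
        # Unknown emotions yield no suggestions rather than an error.
        "suggestions": list(RELIEF_SUGGESTIONS.get(emotion, [])),
    }
```

For the utterance "I'm dreading tomorrow's meeting," a call such as plan_response("dread", ["Prepare meeting materials"]) would return the preparation task together with both relief suggestions.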

[0163] The following describes the processing flow.

[0164] Step 1:

[0165] The user gives instructions to the system via a voice input device, using phrases such as, "Tell me about this week's project meeting."

[0166] Step 2:

[0167] The server activates the speech recognition engine and converts the user's voice data into text data. At this stage, the audio signal is analyzed.

[0168] Step 3:

[0169] The server simultaneously activates an emotion analysis engine to detect the user's emotions from the tone and tempo of their voice. For example, if it detects anxiety, it records that information.

[0170] Step 4:

[0171] The server analyzes the converted text data using a natural language processing engine to identify user needs. It grasps specific requests such as, "I want to check the schedule and progress of project meetings."

[0172] Step 5:

[0173] Based on the analysis results, the server retrieves relevant meeting information from calendar data in the cloud and sets reminders as needed.

[0174] Step 6:

[0175] The terminal notifies the user of meeting information retrieved from the server. A clean and intuitive interface displays information such as the date, time, location, and participants of the meeting.

[0176] Step 7:

[0177] Based on the emotion analysis results, the server generates suggestions to alleviate the user's stress. If significant anxiety is detected, it will suggest relaxing activities after the meeting.

[0178] Step 8:

[0179] The server encrypts all voice and emotional data and manages it to ensure privacy is respected. Emotional data will not be used for any other purpose without the user's consent.

[0180] These steps allow users to receive comprehensive business support, including sentiment analysis.
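The consent condition in Step 8 could, for example, be enforced at the point where emotional data is read. The following Python sketch shows only that consent check; the EmotionStore class, its method names, and the purpose labels are assumptions for illustration, and encryption of the stored records is deliberately left out of the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class EmotionStore:
    """Hypothetical store that releases emotional data only for consented purposes."""
    consented_purposes: Set[str] = field(default_factory=set)
    records: List[str] = field(default_factory=list)

    def record(self, emotion: str) -> None:
        self.records.append(emotion)

    def read(self, purpose: str) -> List[str]:
        # Refuse any use the user has not explicitly agreed to.
        if purpose not in self.consented_purposes:
            raise PermissionError(f"no consent for purpose: {purpose}")
        return list(self.records)
```

Raising an error, rather than silently returning an empty list, makes an unconsented use visible to the calling code.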

[0181] (Example 2)

[0182] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0183] In modern society, the presence of multiple electronic devices and vast amounts of data makes efficient information management and decision-making difficult. Furthermore, scheduling and information provision that disregards emotions can degrade the user experience. Therefore, there is a need for systems that understand emotions and provide optimal support to users.

[0184] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0185] In this invention, the server includes a device that reads acoustic signals and converts those signals into text information, a device that analyzes the text information and automatically records schedules and tasks, and a device that analyzes emotional information and dynamically adjusts the processing priorities and methods. This enables efficient information management and decision-making support that takes into account the user's emotions.

[0186] "Acoustic signals" refer to waveform data and information obtained from speech or sound.

[0187] "Textual information" refers to digital text data obtained as a result of converting acoustic signals.

[0188] "Device" refers to a physical or virtual piece of equipment configured to perform a specific function or role.

[0189] "Analysis" refers to the process of processing specific data or information to understand its meaning and patterns.

[0190] "Schedule" refers to an activity or event that should be carried out at a specific time.

[0191] "Challenges" refer to problems that need to be solved or goals that need to be achieved.

[0192] "Electronic equipment" refers to devices that operate using electrical or electronic components.

[0193] "Time management information" refers to date and time information related to schedules and tasks.

[0194] "Notifications" refer to messages or alerts provided to inform users of important information or events.

[0195] "History" refers to a record of past actions or events.

[0196] "External knowledge" refers to general data and knowledge obtained from sources outside the system.

[0197] "Emotional information" refers to data related to the user's emotional state.

[0198] "Priority" refers to the order of importance or urgency when it comes to processing or dealing with tasks.

[0199] "Method" refers to the means or techniques adopted to achieve a specific goal.

[0200] "User interface" refers to the screens and methods of operation that allow a user to interact with a system.

[0201] This invention is an AI assistant system with an integrated emotion engine, which operates by combining speech recognition, natural language processing, sentiment analysis, data synchronization, and decision support. Specific embodiments are described below.

[0202] First, the user provides voice input through the microphone. The terminal captures this voice input and sends it to the server. The server converts the voice data into text data using speech recognition software (e.g., a speech recognition service). Simultaneously, an emotion engine (e.g., emotion analysis software) analyzes the user's emotions from the voice data and identifies specific emotional states. For example, if the user says, "I'm dreading today's meeting," the emotion engine will detect emotions such as "anxiety" and "stress."

[0203] Next, the server uses a natural language processing engine (e.g., a natural language analysis model) to analyze the text data. This allows the server to understand the user's instructions and requests and determine the appropriate action. Simultaneously, it adjusts processing priorities based on emotional information. For example, for a user experiencing stress, it might first suggest relaxing activities.

[0204] Subsequently, the server manages the schedule based on the analyzed data. This includes using a cloud storage service to synchronize schedule information across multiple devices. This allows users to access the latest information from any device.

[0205] Finally, the server uses historical data and external information to support decision-making. The recommendation engine generates and provides optimal suggestions to the user. The terminal adjusts the interface according to the user's emotions to improve usability.

[0206] For example, if a user asks, "What should I do this weekend?", the generative AI model might suggest, "How about watching a movie or going hiking?" An example of a prompt would be, "Please suggest relaxing activities based on the user's current mood."

[0207] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0208] Step 1:

[0209] The user provides voice input via the microphone. This serves as the initial input and is captured by the device as an audio signal. For example, the user might say, "Tell me what's on the schedule for tomorrow."

[0210] Step 2:

[0211] The terminal sends the audio signal to the server. The server uses a speech recognition engine to convert the audio signal into text information. This converted text information becomes the next input. Specifically, the process involves generating the text "Tell me what's on the schedule for tomorrow" from the audio data.

[0212] Step 3:

[0213] The server analyzes the text information using an emotion engine to identify the user's emotional state. The input is the converted text information, and the output is the detected emotion data. In this step, the server determines what emotional meaning the user's question carries. For example, a specific action would be to detect the emotion "interest."

[0214] Step 4:

[0215] The server uses a natural language processing engine to analyze textual information and understand the user's intent. Input consists of text data and sentiment data, and output is an action based on the user's intent. Specifically, through intent analysis, it understands that the user wants to check their schedule for the next day.

[0216] Step 5:

[0217] The server retrieves information from the schedule database based on the analysis results. The input is filtering conditions based on the user's request, and the output is the relevant schedule information. Specifically, it retrieves appointments such as "tomorrow's meeting" from cloud services.

[0218] Step 6:

[0219] The server sends the acquired schedule information to the terminal, and the terminal synchronizes the information with multiple electronic devices. The input is the acquired schedule information, and the output is the synchronized data. Specifically, the schedule information is updated and displayed on smartphones and tablets.

[0220] Step 7:

[0221] The server uses an AI model to generate value-added recommendations based on sentiment data and relevant information. The input is existing data and sentiment analysis results, and the output is personalized suggestions. A specific example of its operation would be a suggestion like, "I recommend relaxing at a cafe after tomorrow's meeting."

[0222] Step 8:

[0223] The device adjusts the user interface and presents information in a design suited to the user. Input is data related to emotions and user requests, and output is a customized UI display. Specifically, the screen's color scheme and font size are adjusted according to the emotion.
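The interface adjustment in Step 8 can be sketched, purely as an illustration, as a lookup from the detected emotion to display settings. The UiTheme type, the theme names, and the font sizes below are hypothetical values chosen for the sketch and do not appear in this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UiTheme:
    color_scheme: str
    font_size_pt: int


# Assumed defaults and emotion-specific themes (illustrative values only).
DEFAULT_THEME = UiTheme(color_scheme="neutral", font_size_pt=12)
THEMES = {
    "stress": UiTheme(color_scheme="calm-blue", font_size_pt=14),
    "fatigue": UiTheme(color_scheme="low-contrast-dark", font_size_pt=16),
}


def theme_for(emotion: str) -> UiTheme:
    """Pick the display settings for an emotion, falling back to the default."""
    return THEMES.get(emotion, DEFAULT_THEME)
```

Falling back to a default theme keeps the display stable when the emotion engine reports a label the terminal does not recognize.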

[0224] (Application Example 2)

[0225] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0226] In today's information society, users own multiple devices, making schedule management and task prioritization across these devices increasingly complex. Furthermore, there is a growing demand for personalized support that takes into account the user's emotional state. However, conventional systems lack effective integration of sentiment analysis into decision support, failing to significantly improve the user experience.

[0227] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0228] In this invention, the server includes means for receiving instructions via speech recognition and converting those instructions into text data, means for analyzing the text data and automatically registering schedules and tasks, and means for identifying the user's emotional state using sentiment analysis and dynamically adjusting task priorities according to that state. This enables not only schedule synchronization across multiple devices but also more personalized support based on the user's emotions.

[0229] "Speech recognition" is a technology that receives audio data and converts it into text data.

[0230] "Text data" refers to string information that is recognized and converted from audio data.

[0231] "Analysis" is the process of understanding the information contained in data and extracting specific meanings.

[0232] "Plans" refer to scheduled actions or events that are to be carried out in the future.

[0233] "Work" refers to an activity or process carried out with a specific purpose.

[0234] A "device" is a machine or instrument that has a specific function.

[0235] "Schedule synchronization" refers to maintaining information consistency by coordinating schedules across multiple devices.

[0236] "Notification" refers to informing the user of specified information via an alert.

[0237] "External information" refers to additional data obtained from outside the user's environment.

[0238] "Decision support" refers to the act of providing advice and recommendations to help make the best possible decisions.

[0239] "Behavioral history" refers to a record of the user's past actions.

[0240] "Operating environment" refers to the physical or virtual environment in which a user performs their work.

[0241] "Emotional analysis" is a technology that reads emotional nuances from voice and text to determine the user's emotional state.

[0242] A "household robot" is an autonomous or semi-autonomous machine designed to support individual daily life within the home.

[0243] In implementing this invention, the server utilizes speech recognition functionality to convert voice commands from the user into text data. General-purpose speech recognition software can be used for speech recognition. The converted text data is analyzed using natural language processing techniques to understand the user's intent. Based on the analyzed data, the user's schedule and tasks are automatically registered. This enables the user to manage their schedule efficiently.

[0244] Furthermore, the server utilizes an emotion analysis engine to determine the user's emotions from their voice. Based on the results of the emotion analysis, it dynamically adjusts task priorities and provides support tailored to the user's emotional state. This can utilize emotion analysis software such as IBM Watson®.

[0245] Furthermore, the home robot's terminal has interactive functions and communicates with the user through voice. The robot provides appropriate feedback and task management based on emotion analysis results, making the user's life more comfortable. Depending on the user's emotional state, the robot can present an intuitive interface and take appropriate actions.

[0246] For example, if a user tells the robot, "I'm tired today," the server converts the audio into text and uses an emotion analysis engine to detect fatigue. It then reviews the user's schedule and suggests relaxing activities as needed. In this way, flexible support is provided based on the user's emotional state.

[0247] An example of a prompt for a generative AI model would be, "The user says they are tired. How can we reduce their burden?" This allows the AI to devise specific support measures and suggest them to the user.
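A prompt of the kind quoted above could be assembled from the detected state and the user's schedule, for instance as in the following Python sketch. The build_prompt function and the layout of the schedule lines are assumptions for illustration; the sketch only builds the prompt string and does not call any generative AI model.

```python
from typing import List


def build_prompt(state: str, schedule: List[str]) -> str:
    """Build a prompt string from the user's reported state and today's schedule."""
    lines = [f"The user says they are {state}. How can we reduce their burden?"]
    if schedule:
        # Assumed layout: one schedule item per bullet line.
        lines.append("Today's schedule:")
        lines.extend(f"- {item}" for item in schedule)
    return "\n".join(lines)
```

With an empty schedule, build_prompt("tired", []) yields exactly the example prompt given above.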

[0248] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0249] Step 1:

[0250] The server receives voice input from the user and sends it to speech recognition software. The speech recognition software converts the voice data into text data. This results in the voice being obtained as text information.

[0251] Step 2:

[0252] The converted text data is passed from the server to a natural language processing engine, where the user's intent is analyzed. This analysis examines the relationships between verbs and nouns, identifying the actions and tasks the user desires. As a result, specific tasks to be performed are output.

[0253] Step 3:

[0254] The server provides voice data to an emotion analysis engine, which then analyzes the user's emotional state. The emotion analysis determines the emotion based on the tone of voice and word choice. For example, the emotional state might be expressed as "fatigue" or "stress."

[0255] Step 4:

[0256] The server associates the analyzed sentiment information with the task content and adjusts the priority. Based on this priority adjustment, it is determined whether the task should be executed immediately or postponed. A list of top-priority items is output.

[0257] Step 5:

[0258] The device provides clear feedback to the user based on priority. The home robot suggests recommended tasks and actions for relaxation to the user. Specific actions include voice suggestions and visual notifications via the display.

[0259] Step 6:

[0260] The robot receives additional instructions or new voice input from the user, and the server generates a new prompt accordingly, adding support measures as needed. This process continuously complements user support. This final interaction is then integrated as the final result of the process.
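The priority adjustment of Step 4, in which it is determined whether each task is executed immediately or postponed, can be sketched as follows. The Task fields, the set of emotions treated as strain, and the rule that deferrable tasks are postponed under strain are assumptions made for this illustration and are not the only way to realize the step.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    name: str
    priority: int      # higher value = more urgent (assumed convention)
    deferrable: bool


STRAIN_EMOTIONS = {"fatigue", "stress"}  # hypothetical labels


def adjust(tasks: List[Task], emotion: str) -> Tuple[List[Task], List[Task]]:
    """Split tasks into (do_now, postponed); do_now is sorted by descending priority."""
    strained = emotion in STRAIN_EMOTIONS
    do_now = [t for t in tasks if not (strained and t.deferrable)]
    postponed = [t for t in tasks if strained and t.deferrable]
    do_now.sort(key=lambda t: t.priority, reverse=True)
    return do_now, postponed
```

The first list corresponds to the list of top-priority items output in Step 4, and the second to the tasks that the robot may suggest postponing.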

[0261] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0262] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0263] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0264] [Second Embodiment]

[0265] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0266] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0267] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0268] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0269] The microphone 238 receives instructions from the user 20 by picking up the voice of the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0270] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0271] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0272] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0273] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0274] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0275] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0276] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0277] In implementing the present invention, a system is provided in which an AI agent, as the central component, provides voice recognition, data processing, and a user interface. Specific embodiments of this system are described below.

[0278] Speech recognition and instruction analysis

[0279] The user can input voice instructions to the system in a conversational format. The server uses voice recognition technology to convert the input voice into text and analyzes the user's instructions and intentions with a natural language processing engine. Through this analysis, the tasks and schedules required by the user are automatically extracted.

[0280] Schedule management and synchronization

[0281] Based on the analysis results, the server registers the schedule in the cloud and sets reminders as needed. The terminal synchronizes this data through the cloud so that the latest schedule can be viewed on various devices such as desktops, tablets, and smartphones. This makes it easy for the user to manage their own schedule.
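One simple way to picture this synchronization is a versioned cloud copy that each device pulls when its own copy is out of date. The following Python sketch is an assumption-laden illustration: the CloudSchedule and Device classes and the version counter are hypothetical, and the disclosure does not state which synchronization scheme is actually used.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CloudSchedule:
    """Hypothetical cloud-side schedule with a monotonically increasing version."""
    version: int = 0
    events: List[str] = field(default_factory=list)

    def add(self, event: str) -> None:
        self.events.append(event)
        self.version += 1


@dataclass
class Device:
    name: str
    seen_version: int = 0
    events: List[str] = field(default_factory=list)

    def sync(self, cloud: CloudSchedule) -> bool:
        """Pull the cloud copy if it is newer; return True when something changed."""
        if cloud.version == self.seen_version:
            return False
        self.events = list(cloud.events)
        self.seen_version = cloud.version
        return True
```

After each device calls sync, every device holds the same event list, which is the consistency property the paragraph above describes.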

[0282] Support for decision-making and proposal of priorities

[0283] The server makes full use of big data analysis technology to process the user's past behavior patterns and external information. Based on the insights obtained from this, objective information and proposals for supporting the user's future decision-making are presented. For example, proposals are made to facilitate the next selection based on the options frequently selected by the user.
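The example of basing proposals on frequently selected options can be illustrated with a simple frequency count. The suggest function below is a minimal sketch under the assumption that the behavior history is available as a list of previously selected options; real big data analysis would be considerably more involved.

```python
from collections import Counter
from typing import List


def suggest(history: List[str], k: int = 3) -> List[str]:
    """Return up to k options, most frequently selected first."""
    return [option for option, _count in Counter(history).most_common(k)]
```

For a history in which one option was chosen three times and another twice, the function proposes those two options in that order when k is 2.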

[0284] Provision of individualized work environments

[0285] This system learns the user's behavior history and preferences and proposes an optimized business process. As a result, the user's productivity is maximized and a customized work environment for each individual is constructed. The server can thus continue to provide more accurate services.

[0286] Security and multilingual support

[0287] In terms of security, the server encrypts all data and has features to manage access rights, thus protecting user data. Furthermore, this system supports multiple languages, enabling smooth operation even in international business environments.
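Access rights management of the kind mentioned above is often realized as a role-to-permission table that denies anything not explicitly granted. The following Python sketch shows that idea only; the role names, the action names, and the is_allowed function are hypothetical and are not taken from this disclosure.

```python
from typing import Dict, Set

# Hypothetical roles and the actions each role may perform.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "owner": {"read", "write", "share"},
    "delegate": {"read", "write"},
    "viewer": {"read"},
}


def is_allowed(role: str, action: str) -> bool:
    """Deny by default: unknown roles or ungranted actions get no access."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

A deny-by-default rule means that adding a new role or action cannot accidentally expose user data until a permission is granted explicitly.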

[0288] As a concrete example, when a user plans a business trip, they can give a voice command such as "Prepare my business trip schedule for next week." The server then references past business trip data to create a schedule and automatically generates necessary documents and packing lists. This is automatically synchronized across multiple devices, allowing the user to efficiently complete their preparations before departure.

[0289] Thus, by utilizing the present invention, users can automate complex tasks and streamline their entire operations.

[0290] The following describes the processing flow.

[0291] Step 1:

[0292] The user gives instructions to the voice input device. They communicate their schedule by voice, such as "Schedule a project meeting for next Monday."

[0293] Step 2:

[0294] The server sends the received audio to the speech recognition engine. The engine converts the audio data into text data.

[0295] Step 3:

[0296] The server uses natural language processing to analyze text data to determine the date, time, and event details. For example, it extracts the date information "next Monday."

[0297] Step 4:

[0298] Based on the analyzed information, the server registers the event in the cloud-based calendar system.

[0299] Step 5:

[0300] The server sets the reminder and prepares to send a notification 30 minutes before the meeting.

[0301] Step 6:

[0302] The terminal synchronizes the latest schedule data from the cloud system so that information can be consistently displayed on multiple devices.

[0303] Step 7:

[0304] The server sends a notification to the user's terminal at the scheduled time according to the set reminder. By checking the notification, the user can attend the scheduled event without forgetting it.

[0305] Step 8:

[0306] When the user requests a proposal or decision, the server performs big data analysis and makes an optimal proposal based on past behavior and external information.

[0307] Step 9:

[0308] The server receives feedback from the user, updates the machine learning model, and makes future recommendations more accurate.

[0309] Through this series of processes, the system automates the user's schedule management and supports effective decision-making.
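The date extraction in Step 3 and the reminder in Step 5 can be sketched with standard date arithmetic. In the following Python illustration, "next Monday" is interpreted as the first Monday strictly after the reference date, which is an assumption; the function names next_weekday and reminder_time are likewise hypothetical.

```python
from datetime import date, datetime, timedelta


def next_weekday(today: date, weekday: int) -> date:
    """Return the next occurrence of weekday (Monday=0) strictly after today."""
    days_ahead = (weekday - today.weekday() - 1) % 7 + 1
    return today + timedelta(days=days_ahead)


def reminder_time(start: datetime, lead: timedelta = timedelta(minutes=30)) -> datetime:
    """Return when the reminder fires: by default 30 minutes before the start."""
    return start - lead
```

The modular arithmetic guarantees a result between one and seven days ahead, so an instruction given on a Monday resolves to the Monday of the following week under this interpretation.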

[0310] (Example 1)

[0311] Next, Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".

[0312] In today's busy work environment, users demand systems that efficiently manage their schedules and automate tasks. However, conventional systems have suffered from low accuracy in recognizing voice commands and incomplete synchronization across multiple devices. Furthermore, they lacked features to support decision-making, making it difficult to provide a work environment tailored to individual user needs. In addition, insufficient multilingual support and security measures limited support for international business operations.

[0313] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0314] This invention includes a server that receives instructions via speech recognition technology and converts those instructions into text data, a means for analyzing the text data and automatically registering schedule management data and tasks, and a means for synchronizing schedule management across multiple information terminals and providing notification functions. As a result, users can streamline schedule management through highly accurate speech recognition and check the latest information across multiple devices. Furthermore, by utilizing past usage data, users can receive suggestions to support decision-making, thereby optimizing and streamlining operations. In addition, multilingual functionality and data encryption enable support for international business and high security.

[0315] "Speech recognition technology" is a technology that converts a user's voice into a digital signal and analyzes its content to generate text data or instructions.

[0316] "Text data" refers to digital data used for information processing and data communication, consisting of character strings obtained through speech recognition or manual input.

[0317] "Schedule management data" refers to data used to electronically record and manage users' schedules and task information.

[0318] An "information terminal" refers to a device used to manipulate digital data, such as a computer, smartphone, or tablet.

[0319] "Schedule management" refers to activities and methods for organizing and managing users' schedules.

[0320] A "notification function" is a feature that provides users with timely information or warnings that have been set in advance.

[0321] "Multilingual functionality" refers to the ability of a system or service to use and switch between multiple languages simultaneously.

[0322] "Data encryption" is the process of transforming data using cryptographic techniques to protect it from unauthorized access.

[0323] "Access rights management" refers to a management method that controls access rights to specific information or functions, ensuring that only authorized users can access them.

[0324] The present invention aims to provide a system that allows users to efficiently manage their schedules using voice commands. This system operates by integrating voice recognition technology, natural language processing technology, data synchronization technology, big data analysis technology, multilingual support, and security technology.

[0325] The server uses common speech recognition technology to convert the user's voice instructions into text data. Specifically, this can be done using a speech recognition API. The converted text data is then analyzed by a natural language processing engine (e.g., a generative AI model) to understand the user's intent and instructions. This analysis automatically generates schedule management data and tasks.

[0326] The analyzed data is registered on a cloud platform for schedule management. This registration can be done using a cloud service API. The device synchronizes this data via the cloud, allowing users to access it from multiple devices such as PCs, tablets, and smartphones. This makes it possible to check the latest schedule from any device.
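The registration and synchronization described in [0326] can be pictured with a minimal sketch. It assumes an in-memory stand-in for the cloud platform and a simple revision counter for change tracking. The names `CloudCalendar`, `Device`, `register`, `changes_since`, and `pull` are hypothetical and do not correspond to any real cloud service API.

```python
from dataclasses import dataclass, field


@dataclass
class CloudCalendar:
    """In-memory stand-in for the cloud schedule platform (hypothetical)."""
    revision: int = 0
    events: list = field(default_factory=list)  # (revision, title, date) tuples

    def register(self, title: str, date: str) -> int:
        """Store a new event and return the revision that contains it."""
        self.revision += 1
        self.events.append((self.revision, title, date))
        return self.revision

    def changes_since(self, revision: int) -> list:
        """Return every event registered after the given revision."""
        return [e for e in self.events if e[0] > revision]


@dataclass
class Device:
    """A PC, tablet, or smartphone holding a local copy of the schedule."""
    name: str
    seen_revision: int = 0
    schedule: list = field(default_factory=list)

    def pull(self, cloud: CloudCalendar) -> None:
        """Fetch only the events this device has not seen yet."""
        for revision, title, date in cloud.changes_since(self.seen_revision):
            self.schedule.append((title, date))
            self.seen_revision = revision


cloud = CloudCalendar()
cloud.register("Business trip", "2025-01-14")
phone, laptop = Device("phone"), Device("laptop")
phone.pull(cloud)                      # phone sees the first event
cloud.register("Client meeting", "2025-01-15")
phone.pull(cloud)                      # phone fetches only the new event
laptop.pull(cloud)                     # laptop catches up in one pull
print(phone.schedule == laptop.schedule)
```

Tracking a revision per device keeps each pull incremental, so a device that was offline catches up without re-downloading events it already holds.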

[0327] Furthermore, the server can use big data analytics technology to process users' past behavioral data and external information. This processing makes it possible to provide suggestions to support decision-making. For example, based on past business trip history, it can automatically generate suggestions and packing lists for the next business trip.
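As one possible illustration of deriving a packing list from past business trip history, the sketch below counts how often each item appeared on earlier trips and keeps the frequent ones. A plain frequency count is used here as a stand-in for the big data analysis mentioned above. The function name `suggest_packing_list`, the `min_share` threshold, and the sample history are assumptions for illustration only.

```python
from collections import Counter


def suggest_packing_list(past_trips, min_share=0.5):
    """Return items packed on at least `min_share` of past trips, most frequent first."""
    if not past_trips:
        return []
    # Count each item at most once per trip.
    counts = Counter(item for trip in past_trips for item in set(trip))
    threshold = min_share * len(past_trips)
    frequent = [(item, n) for item, n in counts.items() if n >= threshold]
    # Sort by descending frequency, then alphabetically for a stable order.
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in frequent]


# Hypothetical history of three past trips.
history = [
    ["laptop", "charger", "passport"],
    ["laptop", "charger", "umbrella"],
    ["laptop", "passport", "business cards"],
]
print(suggest_packing_list(history))  # ['laptop', 'charger', 'passport']
```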

[0328] Multilingual support enables smooth operation even in international environments. Data encryption technology and access control ensure the secure protection of user data.

[0329] For example, if a user voice-inputs "Prepare my business trip schedule for next week," the server uses speech recognition technology to convert the input into text, analyzes it using natural language processing, and then automatically creates a business trip schedule, generating a list of necessary documents and items to pack. This information is synchronized across multiple devices, allowing the user to prepare efficiently.

[0330] Example prompt: "The user has given the voice command 'Prepare my business trip schedule for next week.' Start the process of using past business trip data to create a schedule, generate a packing list, and sync it to the device."

[0331] In this way, this system can automate and streamline users' work while simultaneously providing a secure and multilingual system environment.

[0332] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0333] Step 1:

[0334] The user gives instructions to the system via voice input. Specifically, the user might say into the microphone, "Prepare my business trip schedule for next week." This voice input becomes the input data for the system.

[0335] Step 2:

[0336] The server uses speech recognition technology to convert the input speech into text data. It analyzes the speech data and maps the recognized phonemes to a character string. The output of this process is the user's instruction in text form, here "Prepare my business trip schedule for next week."

[0337] Step 3:

[0338] The server passes the generated text data to a natural language processing engine to analyze the user's intent. A generative AI model is used to syntactically analyze the text and extract the user's instructions. The output of the analysis consists of specific action items necessary for schedule registration and task creation. Examples include "Create a business trip schedule," "Generate a document list," and "Create a packing checklist."

[0339] Step 4:

[0340] The server operates the schedule management application based on the analyzed data and registers the appointment in the cloud. Specifically, it uses an API to create a new calendar entry and register the related information. As an output, the new business trip appointment is added to the cloud calendar.

[0341] Step 5:

[0342] The device retrieves new schedule information registered in the cloud. By synchronizing data from the cloud, it becomes possible to display the same schedule on multiple devices such as smartphones and PCs. As output, the latest schedule is updated and displayed on the device.

[0343] Step 6:

[0344] The server uses big data analytics to examine past travel data and related information, generating suggestions to support user decision-making. Prompts are used to obtain insights from the generative AI model. The output includes an optimal packing list and important points for the user.

[0345] Step 7:

[0346] Users can review updated schedules and suggestions from the server, and make modifications as needed. By utilizing the suggestions provided via smart devices, more efficient schedule management is possible. The output is a finalized, optimized schedule.
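Steps 2 to 4 above can be sketched as a small pipeline in which each engine is passed in as a function. This is only an outline under stated assumptions: `run_pipeline`, `fake_transcribe`, and `keyword_actions` are hypothetical names, the transcription stub simply decodes bytes, and the keyword rule stands in for the generative AI model that the specification uses for intent analysis.

```python
from typing import Callable


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    extract_actions: Callable[[str], list],
    register: Callable[[str], None],
) -> list:
    """Speech to text, text to action items, action items to the calendar."""
    text = transcribe(audio)           # Step 2: speech recognition
    actions = extract_actions(text)    # Step 3: intent analysis
    for action in actions:             # Step 4: registration
        register(action)
    return actions


# Stubs standing in for the real engines (all hypothetical).
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def keyword_actions(text: str) -> list:
    if "business trip" in text.lower():
        return [
            "Create a business trip schedule",
            "Generate a document list",
            "Create a packing checklist",
        ]
    return []


calendar = []  # stand-in for the cloud calendar
result = run_pipeline(
    b"Prepare my business trip schedule for next week.",
    fake_transcribe,
    keyword_actions,
    calendar.append,
)
print(result)
```

Passing the engines in as parameters keeps the flow independent of any particular speech recognition or language model service, so a real engine could replace a stub without changing the pipeline.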

[0347] (Application Example 1)

[0348] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0349] In modern life, individual users' schedules and tasks are frequently updated and demanding, making efficient management difficult. Furthermore, while there is a growing need for devices that support personal daily life to provide more advanced assistance, the means to achieve this are not yet sufficiently available. Therefore, a system is needed that enables more sophisticated schedule management and personalized decision-making support for users.

[0350] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0351] In this invention, the server includes means for receiving instructions using speech recognition and converting those instructions into text data, means for analyzing the text data and automatically registering schedules and tasks, and means for efficiently managing tasks for devices that support daily life through voice input. This makes it easier for users to manage complex schedules and to receive advanced daily life support and decision-making support through individualized life support devices.

[0352] "Speech recognition" is a technology that analyzes speech and converts it into text data.

[0353] "Instructions" are requests or commands provided by the user, either verbally or in writing.

[0354] "Text data" refers to text information converted by speech recognition.

[0355] "Analysis" is the process of understanding textual data and extracting necessary information from it.

[0356] "Schedules" refer to scheduling information about future events or tasks.

[0357] A "task" is a unit of work or an activity set by the user.

[0358] "Synchronization" is the process of matching information across multiple devices.

[0359] A "notification" is a warning or message sent to inform a user about schedules and tasks.

[0360] "Information" refers to general data, including past user actions and external data.

[0361] "Advice" refers to suggestions or recommendations offered to support decision-making.

[0362] "Learning" is the process of analyzing trends based on the user's behavior history and deepening the system's understanding of them.

[0363] A "suggestion" is the act of providing users with actions or options to facilitate optimization.

[0364] "Equipment" refers to hardware or devices in general that are intended to support users.

[0365] "Daily life support" refers to functions and technologies designed to support users' daily activities.

[0366] The system implementing this invention is based on a program that integrates speech recognition and natural language processing. The server utilizes the Google Cloud Speech-to-Text API, a leading speech recognition engine, to convert user speech into text using speech recognition technology. Next, the text data is analyzed through the Google Cloud Natural Language API to understand the user's intent and determine the necessary actions.

[0367] The device uses cloud-based synchronization to update analyzed schedule information and tasks across various devices via the Google Calendar API. This allows users to view the latest appointments and reminders across all their devices.

[0368] Furthermore, the server leverages historical data and external information to generate recommendations that support decision-making. In this process, big data analytics techniques are used to analyze user behavior patterns and provide more personalized advice.

[0369] For example, if a user says, "Schedule a meeting for tomorrow afternoon," the server processes the voice and automatically registers the meeting in Google Calendar. In addition, it sends the user a suitable notification so they can prepare before the meeting. Furthermore, based on this, helpful suggestions are provided for future meetings, making the user's task management even more efficient.
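To make the "tomorrow afternoon" example concrete, the sketch below turns the utterance into a calendar entry with a start and end time. It is a keyword-based stand-in and does not call the Google Cloud Natural Language API or the Google Calendar API. The function `parse_meeting_request`, the hour assigned to each part of the day, and the one-hour duration are assumptions that the specification does not state.

```python
from datetime import date, datetime, time, timedelta
from typing import Optional

# Assumed defaults: the source does not say which hour "afternoon" maps to.
PART_OF_DAY = {
    "morning": time(9, 0),
    "afternoon": time(14, 0),
    "evening": time(18, 0),
}


def parse_meeting_request(text: str, today: date) -> Optional[dict]:
    """Build a calendar entry from a simple meeting request, or return None."""
    lowered = text.lower()
    if "meeting" not in lowered:
        return None
    day = today + timedelta(days=1) if "tomorrow" in lowered else today
    # Pick the first matching part of day, falling back to 09:00.
    start_time = next(
        (t for word, t in PART_OF_DAY.items() if word in lowered), time(9, 0)
    )
    start = datetime.combine(day, start_time)
    return {"title": "Meeting", "start": start, "end": start + timedelta(hours=1)}


event = parse_meeting_request(
    "Schedule a meeting for tomorrow afternoon", date(2025, 1, 14)
)
print(event["start"].isoformat())  # 2025-01-15T14:00:00
```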

[0370] An example of a prompt message to a generative AI model is, "Please create a forecast schedule for next week and provide an optimized suggestion." This instruction allows the system to automatically present the optimal schedule.

[0371] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0372] Step 1:

[0373] The server receives voice input from the user. This voice data is converted into text data using the Google Cloud Speech-to-Text API. The input is voice data, and the output is text data.

[0374] Step 2:

[0375] The server analyzes the generated text data using the Google Cloud Natural Language API. This analysis extracts the user's intent and recognizes specific plans and tasks. The input to this analysis is text data, and the output is the user's intent and planned information.

[0376] Step 3:

[0377] The server registers schedules and tasks with the Google Calendar API based on the analysis results. At this stage, cloud synchronization is used to allow users to check the latest schedule across various devices. The input is the analyzed schedule information, and the output is the schedule synchronized across devices.

[0378] Step 4:

[0379] The server analyzes historical user data and external information. Using big data analytics techniques, it detects user behavior patterns and generates recommendations to support future decision-making. The input is historical behavior and external data, and the output is personalized recommendations.

[0380] Step 5:

[0381] The server prompts a generative AI model to generate an optimal schedule for the following week. Specifically, it uses the prompt "Please create a forecast schedule for next week and provide an optimized suggestion." The input is the user's past data and the prompt, and the output is an optimized schedule.

[0382] Step 6:

[0383] The device receives notifications from the server and reports necessary information to the user in a timely manner. This includes setting reminders and notifying users of new recommendations. The input is schedule information provided by the server, and the output is notifications to the user.

[0384] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0385] This invention implements an AI secretary system that integrates an emotion engine to enrich user interaction and effectively support business operations. This system operates by combining speech recognition, natural language processing, emotion analysis, data synchronization, and decision support. Specific embodiments are described below.

[0386] Speech recognition and emotion analysis

[0387] The user gives instructions to the system via voice input. The server uses a speech recognition engine to convert the voice data into text data. At the same time, an emotion engine analyzes the voice data to identify the user's emotional state. For example, if the user speaks with a tired voice, the emotion engine will detect "fatigue."

[0388] Instruction analysis and emotion-based processing

[0389] The server analyzes the converted text data using a natural language processing engine to understand the user's intent. In addition, it dynamically adjusts task priorities based on emotional information identified by the emotion engine. For example, if the user is feeling stressed, it prioritizes suggesting relaxing leisure events.
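The emotion-based adjustment in [0389] could, for instance, be modeled as a score boost applied to relaxing items when a negative emotion is detected. The sketch below is an illustration under that assumption. The `Item` class, the `RELAX_BOOST` weights, and the sample items are hypothetical, and the specification does not define a scoring formula.

```python
from dataclasses import dataclass

# Assumed weights: how much each detected emotion boosts "relaxing" items.
RELAX_BOOST = {"stress": 2.0, "fatigue": 1.5, "anxiety": 1.0}


@dataclass
class Item:
    title: str
    base_priority: float
    relaxing: bool = False


def prioritize(items, emotion):
    """Order items by base priority plus an emotion-dependent boost for relaxing ones."""
    boost = RELAX_BOOST.get(emotion, 0.0)

    def score(item):
        return item.base_priority + (boost if item.relaxing else 0.0)

    return sorted(items, key=score, reverse=True)


items = [
    Item("Draft quarterly report", 2.0),
    Item("Evening walk", 1.0, relaxing=True),
    Item("Reply to email", 1.5),
]
print([i.title for i in prioritize(items, "stress")])
# ['Evening walk', 'Draft quarterly report', 'Reply to email']
print([i.title for i in prioritize(items, "neutral")])
# ['Draft quarterly report', 'Reply to email', 'Evening walk']
```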

[0390] Schedule management and multi-device synchronization

[0391] Using the analyzed data, the server automatically registers schedules. The data is then managed in the cloud and synchronized across multiple devices via the terminal. This allows users to instantly access information from any device.

[0392] Decision support and dynamic interfaces

[0393] The server analyzes historical data and external information, taking emotional information into consideration, to provide recommendations that support optimal decision-making. Furthermore, the device dynamically changes the interface design and feedback content according to the user's emotions, resulting in a more user-friendly experience.

[0394] Security and Privacy Management

[0395] Data obtained through sentiment analysis is encrypted and used only with the user's consent. This ensures strict protection of user privacy.

[0396] Specific example

[0397] For example, if a user says, "I'm dreading tomorrow's meeting," the server will add meeting preparation tasks to the schedule, while the emotion engine will identify the emotion of "dread." As a result, the server will offer suggestions to help the user relax (such as suggesting delegable tasks or setting a break time after the meeting) to alleviate the user's psychological burden.

[0398] By integrating an emotion engine in this way, this system can provide more personalized and effective business support.

[0399] The following describes the processing flow.

[0400] Step 1:

[0401] The user gives instructions to the system via a voice input device, using phrases such as, "Tell me about this week's project meeting."

[0402] Step 2:

[0403] The server activates the speech recognition engine and converts the user's voice data into text data. At this stage, the audio signal is analyzed.

[0404] Step 3:

[0405] The server simultaneously activates an emotion analysis engine to detect the user's emotions from the tone and tempo of their voice. For example, if it detects anxiety, it records that information.

[0406] Step 4:

[0407] The server analyzes the converted text data using a natural language processing engine to identify user needs. It grasps specific requests such as, "I want to check the schedule and progress of project meetings."

[0408] Step 5:

[0409] Based on the analysis results, the server retrieves relevant meeting information from calendar data in the cloud and sets reminders as needed.

[0410] Step 6:

[0411] The terminal notifies the user of meeting information retrieved from the server. A clean and intuitive interface displays information such as the date, time, location, and participants of the meeting.

[0412] Step 7:

[0413] Based on the emotion analysis results, the server generates suggestions to alleviate the user's stress. If significant anxiety is detected, it will suggest relaxing activities after the meeting.

[0414] Step 8:

[0415] The server encrypts all voice and emotional data and manages it to ensure privacy is respected. Emotional data will not be used for any other purpose without the user's consent.

[0416] These steps allow users to receive comprehensive business support, including sentiment analysis.
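The consent rule in Step 8 can be illustrated with a small store that releases emotion records only for purposes the user has approved. This is a sketch under assumptions: `EmotionStore`, `grant`, `record`, `read`, and the purpose labels are hypothetical, and encryption of the stored data is deliberately left out to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class EmotionStore:
    """Holds emotion records and releases them only for consented purposes."""
    consents: dict = field(default_factory=dict)  # user_id -> set of purposes
    records: dict = field(default_factory=dict)   # user_id -> list of emotions

    def grant(self, user_id: str, purpose: str) -> None:
        """Record that the user consents to use of emotion data for a purpose."""
        self.consents.setdefault(user_id, set()).add(purpose)

    def record(self, user_id: str, emotion: str) -> None:
        """Append a detected emotion to the user's history."""
        self.records.setdefault(user_id, []).append(emotion)

    def read(self, user_id: str, purpose: str) -> list:
        """Return the user's emotion history, or raise if consent is missing."""
        if purpose not in self.consents.get(user_id, set()):
            raise PermissionError(f"no consent for purpose: {purpose}")
        return list(self.records.get(user_id, []))


store = EmotionStore()
store.grant("user-1", "schedule_support")
store.record("user-1", "anxiety")
print(store.read("user-1", "schedule_support"))  # ['anxiety']
```

A call such as `store.read("user-1", "marketing")` raises `PermissionError`, because no consent was granted for that purpose.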

[0417] (Example 2)

[0418] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0419] In modern society, the presence of multiple electronic devices and vast amounts of data makes efficient information management and decision-making difficult. Furthermore, scheduling and information provision that disregards emotions can degrade the user experience. Therefore, there is a need for systems that understand emotions and provide optimal support to users.

[0420] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0421] In this invention, the server includes a device that reads acoustic signals and converts those signals into text information, a device that analyzes the text information and automatically records schedules and tasks, and a device that analyzes emotional information and dynamically adjusts the processing priorities and methods. This enables efficient information management and decision-making support that takes into account the user's emotions.

[0422] "Acoustic signals" refer to waveform data and information obtained from speech or sound.

[0423] "Textual information" refers to digital text data obtained as a result of converting acoustic signals.

[0424] "Device" refers to a physical or virtual piece of equipment configured to perform a specific function or role.

[0425] "Analysis" refers to the process of processing specific data or information to understand its meaning and patterns.

[0426] "Schedule" refers to an activity or event that should be carried out at a specific time.

[0427] "Tasks" refer to problems that need to be solved or goals that need to be achieved.

[0428] "Electronic equipment" refers to devices that operate using electrical or electronic components.

[0429] "Time management information" refers to date and time information related to schedules and tasks.

[0430] "Notifications" refer to messages or alerts provided to inform users of important information or events.

[0431] "History" refers to a record of past actions or events.

[0432] "External knowledge" refers to general data and knowledge obtained from sources outside the system.

[0433] "Emotional information" refers to data related to the user's emotional state.

[0434] "Priority" refers to the order of importance or urgency when it comes to processing or dealing with tasks.

[0435] "Method" refers to the means or techniques adopted to achieve a specific goal.

[0436] "User interface" refers to the screens and methods of operation that allow a user to interact with a system.

[0437] This invention is an AI assistant system with an integrated emotion engine, which operates by combining speech recognition, natural language processing, sentiment analysis, data synchronization, and decision support. Specific embodiments are described below.

[0438] First, the user provides voice input through the microphone. The terminal captures this voice input and sends it to the server. The server converts the voice data into text data using speech recognition software (e.g., a speech recognition service). Simultaneously, an emotion engine (e.g., emotion analysis software) analyzes the user's emotions from the voice data and identifies specific emotional states. For example, if the user says, "I'm dreading today's meeting," the emotion engine will detect emotions such as "anxiety" and "stress."

[0439] Next, the server uses a natural language processing engine (e.g., a natural language analysis model) to analyze the text data. This allows the server to understand the user's instructions and requests and determine the appropriate action. Simultaneously, it adjusts processing priorities based on emotional information. For example, for a user experiencing stress, it might first suggest relaxing activities.

[0440] Subsequently, the server manages the schedule based on the analyzed data. This includes using a cloud storage service to synchronize schedule information across multiple devices. This allows users to access the latest information from any device.

[0441] Finally, the server uses historical data and external information to support decision-making. The recommendation engine generates and provides optimal suggestions to the user. The terminal adjusts the interface according to the user's emotions to improve usability.

[0442] For example, if a user asks, "What should I do this weekend?", the generative AI model might suggest, "How about watching a movie or going hiking?" An example of a prompt would be, "Please suggest relaxing activities based on the user's current mood."

[0443] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0444] Step 1:

[0445] The user provides voice input via the microphone. This serves as the initial input and is captured by the device as an audio signal. For example, the user might say, "Tell me what my schedule is for tomorrow."

[0446] Step 2:

[0447] The terminal sends an audio signal to the server. The server uses a speech recognition engine to convert the audio signal into text information. This converted text information becomes the next input. Specifically, the process involves generating the text "Tell me what my schedule is for tomorrow" from the audio data.

[0448] Step 3:

[0449] The server analyzes the text information using an emotion engine to identify the user's emotional state. The input is the converted text information, and the output is the detected emotion data. In this step, the server determines what emotional meaning the question carries. For example, a specific action would be to detect the emotion "interest."

[0450] Step 4:

[0451] The server uses a natural language processing engine to analyze textual information and understand the user's intent. Input consists of text data and sentiment data, and output is an action based on the user's intent. Specifically, through intent analysis, it understands that the user wants to check their schedule for the next day.

[0452] Step 5:

[0453] The server retrieves information from the schedule database based on the analysis results. The input is filtering conditions based on the user's request, and the output is the relevant schedule information. Specifically, it retrieves appointments such as "tomorrow's meeting" from cloud services.

[0454] Step 6:

[0455] The server sends the acquired schedule information to the terminal, and the terminal synchronizes the information with multiple electronic devices. The input is the acquired schedule information, and the output is the synchronized data. Specifically, the schedule information is updated and displayed on smartphones and tablets.

[0456] Step 7:

[0457] The server uses an AI model to generate value-added recommendations based on sentiment data and relevant information. The input is existing data and sentiment analysis results, and the output is personalized suggestions. A specific example of its operation would be a suggestion like, "I recommend relaxing at a cafe after tomorrow's meeting."

[0458] Step 8:

[0459] The device adjusts the user interface and presents information in a design suited to the user. Input is data related to emotions and user requests, and output is a customized UI display. Specifically, the screen's color scheme and font size are adjusted according to the emotion.
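The interface adjustment in Step 8 might be represented as a lookup from the detected emotion to display settings. The sketch below assumes such a table. The `Theme` fields, the color values, the font sizes, and the function `theme_for` are illustrative choices, since the specification says only that the color scheme and font size are adjusted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Theme:
    background: str    # hex color for the screen background
    font_size_pt: int  # body text size in points


DEFAULT_THEME = Theme(background="#FFFFFF", font_size_pt=12)

# Assumed mapping from detected emotion to display settings.
THEMES = {
    "stress": Theme(background="#E8F4EA", font_size_pt=14),
    "fatigue": Theme(background="#F3EFE6", font_size_pt=16),
}


def theme_for(emotion: str) -> Theme:
    """Return the theme for an emotion, falling back to the default theme."""
    return THEMES.get(emotion, DEFAULT_THEME)


print(theme_for("fatigue"))
print(theme_for("interest"))  # no entry, so the default theme is used
```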

[0460] (Application Example 2)

[0461] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0462] In today's information society, users own multiple devices, making schedule management and task prioritization across these devices increasingly complex. Furthermore, there is a growing demand for personalized support that takes into account the user's emotional state. However, conventional systems lack effective integration of sentiment analysis into decision support, failing to significantly improve the user experience.

[0463] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0464] In this invention, the server includes means for receiving instructions via speech recognition and converting those instructions into text data, means for analyzing the text data and automatically registering schedules and tasks, and means for identifying the user's emotional state using sentiment analysis and dynamically adjusting task priorities according to that state. This enables not only schedule synchronization across multiple devices but also more personalized support based on the user's emotions.

[0465] "Speech recognition" is a technology that receives audio data and converts it into text data.

[0466] "Text data" refers to string information that is recognized and converted from audio data.

[0467] "Analysis" is the process of understanding the information contained in data and extracting specific meanings.

[0468] "Plans" refer to scheduled actions or events that are to be carried out in the future.

[0469] "Work" refers to an activity or process carried out with a specific purpose.

[0470] A "device" is a machine or instrument that has a specific function.

[0471] "Schedule synchronization" refers to maintaining information consistency by coordinating schedules across multiple devices.

[0472] "Notification" refers to informing the user of specified information via an alert.

[0473] "External information" refers to additional data obtained from outside the user's environment.

[0474] "Decision support" refers to the act of providing advice and recommendations to help make the best possible decisions.

[0475] "Behavioral history" refers to a record of the user's past actions.

[0476] "Operating environment" refers to the physical or virtual environment in which a user performs their work.

[0477] "Emotional analysis" is a technology that reads emotional nuances from voice and text to determine the user's emotional state.

[0478] A "household robot" is an autonomous or semi-autonomous machine designed to support individual daily life within the home.

[0479] In implementing this invention, the server utilizes speech recognition functionality to convert voice commands from the user into text data. General-purpose speech recognition software can be used for speech recognition. The converted text data is analyzed using natural language processing techniques to understand the user's intent. Based on the analyzed data, the user's schedule and tasks are automatically registered. This enables the user to manage their schedule efficiently.

[0480] Furthermore, the server utilizes an emotion analysis engine to determine the user's emotions from their voice. Based on the results of the emotion analysis, it dynamically adjusts task priorities and provides support tailored to the user's emotional state. This can be achieved using emotion analysis software such as IBM Watson.

[0481] Furthermore, the home robot's terminal has interactive functions and communicates with the user through voice. The robot provides appropriate feedback and task management based on the emotion analysis results, making the user's life more comfortable. Depending on the user's emotional state, the robot can present an intuitive interface and take appropriate actions.

[0482] For example, if a user tells the robot, "I'm tired today," the server converts the audio into text and uses an emotion analysis engine to detect fatigue. It then reviews the user's schedule and suggests relaxing activities as needed. In this way, flexible support is provided based on the user's emotional state.

[0483] An example of a prompt for a generative AI model would be, "The user says they are tired. How can we reduce their burden?" This allows the AI to devise specific support measures and suggest them to the user.

[0484] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0485] Step 1:

[0486] The server receives voice input from the user and sends it to speech recognition software. The speech recognition software converts the voice data into text data. This results in the voice being obtained as text information.

[0487] Step 2:

[0488] The converted text data is passed from the server to a natural language processing engine, where the user's intent is analyzed. This analysis examines the relationships between verbs and nouns, identifying the actions and tasks the user desires. As a result, specific tasks to be performed are output.

[0489] Step 3:

[0490] The server provides voice data to an emotion analysis engine, which then analyzes the user's emotional state. The emotion analysis determines the emotion based on the tone of voice and word choice. For example, the emotional state might be expressed as "fatigue" or "stress."

[0491] Step 4:

[0492] The server associates the analyzed sentiment information with the task content and adjusts the priority. Based on this priority adjustment, it is determined whether the task should be executed immediately or postponed. A list of top-priority items is output.

[0493] Step 5:

[0494] The device provides clear feedback to the user based on priority. The home robot suggests recommended tasks and actions for relaxation to the user. Specific actions include voice suggestions and visual notifications via the display.

[0495] Step 6:

[0496] The robot receives additional instructions or new voice input from the user, and the server generates a new prompt accordingly, adding support measures as needed. This process continuously complements user support. This final interaction is then integrated as the final result of the process.
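The decision in Step 4 between immediate execution and postponement can be sketched as a simple partition of the task list. The rule used here is an assumption made for illustration and is not taken from the specification: when fatigue or stress is detected, only tasks due today stay on the immediate list. The function `triage` and the integer day index are likewise hypothetical.

```python
def triage(tasks, emotion, today):
    """Split tasks into (do_now, postponed) given the detected emotion.

    Each task is a (title, due_day) pair, where due_day is an integer day index.
    """
    # Assumed rule: under fatigue or stress, keep only tasks that are due by today.
    strained = emotion in {"fatigue", "stress"}
    do_now, postponed = [], []
    for title, due_day in sorted(tasks, key=lambda t: t[1]):
        if strained and due_day > today:
            postponed.append(title)
        else:
            do_now.append(title)
    return do_now, postponed


tasks = [("Submit expenses", 3), ("Call client", 0), ("Plan offsite", 5)]
print(triage(tasks, "fatigue", today=0))
# (['Call client'], ['Submit expenses', 'Plan offsite'])
print(triage(tasks, "neutral", today=0))
# (['Call client', 'Submit expenses', 'Plan offsite'], [])
```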

[0497] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0498] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0499] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0500] [Third Embodiment]

[0501] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0502] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0503] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0504] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0505] The microphone 238 receives instructions from the user 20 by capturing the voice of the user 20. The microphone 238 converts the captured voice into audio data and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0506] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0507] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0508] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0509] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0510] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0511] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0512] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0513] In implementing the present invention, a system is provided in which the central component, an AI agent, provides speech recognition, data processing, and user interface functions. Specific embodiments of this system are described below.

[0514] Speech recognition and instruction analysis

[0515] Users can input voice instructions into the system in a conversational format. The server uses speech recognition technology to convert the input voice into text, and a natural language processing engine analyzes the user's instructions and intentions. This analysis automatically extracts the tasks and schedules the user requests.

[0516] Schedule management and synchronization

[0517] Based on the analysis results, the server registers the schedule in the cloud and sets reminders as needed. Devices synchronize this data via the cloud, allowing users to view their latest schedules on various devices such as desktops, tablets, and smartphones. Users can enjoy the convenience of easily managing their own schedules.
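
The cloud-based synchronization described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `CloudCalendar` and `Device` classes, the revision counter, and the pull-style `sync` method are all assumptions introduced here to show how several devices could converge on the latest schedule.

```python
from dataclasses import dataclass, field


@dataclass
class CloudCalendar:
    """Hypothetical cloud-side store; every change bumps a revision counter."""
    revision: int = 0
    events: dict[str, str] = field(default_factory=dict)

    def register(self, event_id: str, title: str) -> None:
        self.events[event_id] = title
        self.revision += 1


@dataclass
class Device:
    """Hypothetical terminal that keeps a local copy of the schedule."""
    name: str
    synced_revision: int = -1
    events: dict[str, str] = field(default_factory=dict)

    def sync(self, cloud: CloudCalendar) -> bool:
        """Pull the cloud state if it is newer; return True when data changed."""
        if self.synced_revision == cloud.revision:
            return False
        self.events = dict(cloud.events)
        self.synced_revision = cloud.revision
        return True


cloud = CloudCalendar()
cloud.register("evt-1", "Project meeting")
phone, tablet = Device("smartphone"), Device("tablet")
phone.sync(cloud)
tablet.sync(cloud)
print(phone.events == tablet.events)  # True
```

A revision counter keeps the sketch short; a real service would more likely rely on change tokens or timestamps supplied by the calendar provider.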

[0518] Support for decision-making and suggestion of priorities

[0519] The server utilizes big data analytics technology to process users' past behavioral patterns and external information. Based on the insights gained, it presents users with objective information and suggestions to support their future decision-making. For example, it makes suggestions to make future choices easier based on the options the user frequently selects.

[0520] Providing an individualized work environment

[0521] This system learns the user's behavioral history and preferences to propose an optimized workflow. This maximizes user productivity and creates a personalized work environment. The server, in turn, can continuously provide a more accurate service.

[0522] Security and multilingual support

[0523] In terms of security, the server encrypts all data and has features to manage access rights, thus protecting user data. Furthermore, this system supports multiple languages, enabling smooth operation even in international business environments.
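
The access rights management mentioned above can be illustrated with a short sketch. The `Permission` flags, the `ScheduleStore` class, and the user names are hypothetical and are not taken from this disclosure; encryption is deliberately omitted here because it would depend on the cryptographic library actually chosen.

```python
from dataclasses import dataclass, field
from enum import Flag, auto


class Permission(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()


@dataclass
class ScheduleStore:
    """Hypothetical store that checks access rights before every operation."""
    acl: dict[str, Permission] = field(default_factory=dict)
    events: list[str] = field(default_factory=list)

    def grant(self, user: str, permission: Permission) -> None:
        self.acl[user] = self.acl.get(user, Permission.NONE) | permission

    def _require(self, user: str, permission: Permission) -> None:
        if permission not in self.acl.get(user, Permission.NONE):
            raise PermissionError(f"{user} lacks {permission.name} access")

    def add_event(self, user: str, title: str) -> None:
        self._require(user, Permission.WRITE)
        self.events.append(title)

    def list_events(self, user: str) -> list[str]:
        self._require(user, Permission.READ)
        return list(self.events)


store = ScheduleStore()
store.grant("owner", Permission.READ | Permission.WRITE)
store.grant("assistant", Permission.READ)
store.add_event("owner", "Business trip")
print(store.list_events("assistant"))  # ['Business trip']
```

Only users who were explicitly granted a right can exercise it; an unknown user falls back to `Permission.NONE` and is rejected.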

[0524] As a concrete example, when a user plans a business trip, they can give a voice command such as "Prepare my business trip schedule for next week." The server then references past business trip data to create a schedule and automatically generates necessary documents and packing lists. This is automatically synchronized across multiple devices, allowing the user to efficiently complete their preparations before departure.

[0525] Thus, by utilizing the present invention, users can automate complex tasks and streamline their entire operations.

[0526] The following describes the processing flow.

[0527] Step 1:

[0528] The user gives instructions to the voice input device. They communicate their schedule by voice, such as "Schedule a project meeting for next Monday."

[0529] Step 2:

[0530] The server sends the received audio to the speech recognition engine. The engine converts the audio data into text data.

[0531] Step 3:

[0532] The server uses natural language processing to analyze text data to determine the date, time, and event details. For example, it extracts the date information "next Monday."

[0533] Step 4:

[0534] Based on the analyzed information, the server registers the event in the cloud-based calendar system.

[0535] Step 5:

[0536] The server sets up a reminder and prepares to notify the user 30 minutes before the meeting.

[0537] Step 6:

[0538] The terminal synchronizes the latest schedule data from the cloud system, ensuring that information is displayed consistently across multiple devices.

[0539] Step 7:

[0540] The server sends a notification to the user's device at the scheduled time according to the set reminder. By checking this notification, the user can ensure they don't forget to participate in the scheduled event.

[0541] Step 8:

[0542] When a user requests a suggestion or decision, the server performs big data analysis and provides the best possible suggestion based on past behavior and external information.

[0543] Step 9:

[0544] The server receives user feedback, updates its machine learning model, and makes future recommendations more accurate.

[0545] Through this series of processes, the system automates user schedule management and supports effective decision-making.
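
Steps 3 and 5 of this flow can be sketched as follows. The sketch makes two assumptions that the text does not state: that "next Monday" means the first Monday strictly after the current day, and that the meeting starts at 10:00. The function names `resolve_next_weekday` and `reminder_time` are hypothetical.

```python
from datetime import date, datetime, time, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_next_weekday(phrase: str, today: date) -> date:
    """Resolve 'next <weekday>' to the first such weekday strictly after today."""
    words = phrase.lower().split()
    if len(words) != 2 or words[0] != "next" or words[1] not in WEEKDAYS:
        raise ValueError(f"unsupported date phrase: {phrase!r}")
    target = WEEKDAYS.index(words[1])
    days_ahead = (target - today.weekday() - 1) % 7 + 1  # always 1..7
    return today + timedelta(days=days_ahead)


def reminder_time(start: datetime, lead_minutes: int = 30) -> datetime:
    """Return the notification time, lead_minutes before the event starts."""
    return start - timedelta(minutes=lead_minutes)


today = date(2024, 12, 3)  # a Tuesday
meeting_day = resolve_next_weekday("next Monday", today)
meeting_start = datetime.combine(meeting_day, time(10, 0))  # assumed start time
print(meeting_day)                   # 2024-12-09
print(reminder_time(meeting_start))  # 2024-12-09 09:30:00
```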

[0546] (Example 1)

[0547] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0548] In today's busy work environment, users demand systems that efficiently manage their schedules and automate tasks. However, conventional systems have suffered from low accuracy in recognizing voice commands and incomplete synchronization across multiple devices. Furthermore, they lacked features to support decision-making, making it difficult to provide a work environment tailored to individual user needs. In addition, insufficient multilingual support and security measures limited support for international business operations.

[0549] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0550] This invention includes a server that receives instructions via speech recognition technology and converts those instructions into text data, a means for analyzing the text data and automatically registering schedule management data and tasks, and a means for synchronizing schedule management across multiple information terminals and providing notification functions. As a result, users can streamline schedule management through highly accurate speech recognition and check the latest information across multiple devices. Furthermore, by utilizing past usage data, users can receive suggestions to support decision-making, thereby optimizing and streamlining operations. In addition, multilingual functionality and data encryption enable support for international business and high security.

[0551] "Speech recognition technology" is a technology that converts a user's voice into a digital signal and generates text data or instructions by analyzing its content.

[0552] "Text data" refers to digital data used for information processing and data communication, based on strings of characters obtained through speech recognition or manual input.

[0553] "Schedule management data" refers to data used to electronically record and manage users' schedules and task information.

[0554] An "information terminal" refers to a device used to manipulate digital data, such as a computer, smartphone, or tablet.

[0555] "Schedule management" refers to activities and methods for organizing and managing users' schedules.

[0556] A "notification function" is a feature that provides users with timely information or warnings that have been set in advance.

[0557] "Multilingual functionality" refers to the ability of a system or service to use and switch between multiple languages simultaneously.

[0558] "Data encryption" is the process of transforming data using cryptographic techniques to protect it from unauthorized access.

[0559] "Access rights management" refers to a management method that controls access rights to specific information or functions, ensuring that only authorized users can access them.

[0560] The present invention aims to provide a system that allows users to efficiently manage their schedules using voice commands. This system operates by integrating voice recognition technology, natural language processing technology, data synchronization technology, big data analysis technology, multilingual support, and security technology.

[0561] The server uses common speech recognition technology to convert the user's voice instructions into text data. Specifically, this can be done using a speech recognition API. The converted text data is then analyzed by a natural language processing engine (e.g., a generative AI model) to understand the user's intent and instructions. This analysis automatically generates schedule management data and tasks.

[0562] The analyzed data is registered on a cloud platform for schedule management. This registration can be done using a cloud service API. The device synchronizes this data via the cloud, allowing users to access it from multiple devices such as PCs, tablets, and smartphones. This makes it possible to check the latest schedule from any device.

[0563] Furthermore, the server can use big data analytics technology to process users' past behavioral data and external information. This processing makes it possible to provide suggestions to support decision-making. For example, based on past business trip history, it can automatically generate suggestions and packing lists for the next business trip.
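
One simple way to derive a packing list from past business trip history is to keep the items that appear in a sufficient share of earlier trips. This is only an illustrative heuristic, not the analysis method of this disclosure; the function name, the 50% threshold, and the sample data are assumptions.

```python
from collections import Counter


def suggest_packing_list(past_trips: list[list[str]],
                         min_share: float = 0.5) -> list[str]:
    """Return items packed on at least min_share of past trips, sorted by name."""
    if not past_trips:
        return []
    # Count each item once per trip, even if it was listed twice.
    counts = Counter(item for trip in past_trips for item in set(trip))
    threshold = min_share * len(past_trips)
    return sorted(item for item, n in counts.items() if n >= threshold)


history = [
    ["passport", "laptop", "charger"],
    ["laptop", "charger", "umbrella"],
    ["laptop", "passport"],
]
print(suggest_packing_list(history))  # ['charger', 'laptop', 'passport']
```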

[0564] Multilingual support enables smooth operation even in international environments. Data encryption technology and access control ensure the secure protection of user data.

[0565] For example, if a user voice-inputs "Prepare my business trip schedule for next week," the server uses speech recognition technology to convert the input into text, analyzes it using natural language processing, and then automatically creates a business trip schedule, generating a list of necessary documents and items to pack. This information is synchronized across multiple devices, allowing the user to prepare efficiently.

[0566] Example prompt: "The user has given the voice command 'Prepare my business trip schedule for next week.' Start the process of using past business trip data to create a schedule, generate a packing list, and sync it to the device."
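
The example prompt could be assembled from the recognized command and stored trip records roughly as follows. The `build_prompt` helper and the format of the trip history lines are hypothetical; only the wording of the instruction is taken from the example above.

```python
def build_prompt(command: str, past_trips: list[str]) -> str:
    """Combine the recognized voice command with past trip records into one prompt."""
    history = "\n".join(f"- {trip}" for trip in past_trips)
    if not history:
        history = "- (no past trips on record)"
    return (
        f"The user has given the voice command '{command}' "
        "Start the process of using past business trip data to create a schedule, "
        "generate a packing list, and sync it to the device.\n"
        f"Past business trips:\n{history}"
    )


prompt = build_prompt(
    "Prepare my business trip schedule for next week.",
    ["2024-10 Osaka, 2 days", "2024-11 Fukuoka, 3 days"],  # illustrative records
)
print(prompt)
```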

[0567] In this way, this system can automate and streamline users' work while simultaneously providing a secure and multilingual system environment.

[0568] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0569] Step 1:

[0570] The user gives instructions to the system via voice input. Specifically, the user might say into the microphone, "Prepare my business trip schedule for next week." This voice input becomes the input data for the system.

[0571] Step 2:

[0572] The server uses speech recognition technology to convert the input speech into text data. It analyzes the speech data and generates a string by mapping each phoneme. The output of this process is the user's instruction in text form. The resulting text data is "Prepare my business trip schedule for next week."

[0573] Step 3:

[0574] The server passes the generated text data to a natural language processing engine to analyze the user's intent. A generative AI model is used to syntactically analyze the text and extract the user's instructions. The output of the analysis consists of specific action items necessary for schedule registration and task creation. Examples include "Create a business trip schedule," "Generate a document list," and "Create a packing checklist."

[0575] Step 4:

[0576] The server operates the schedule management application based on the analyzed data and registers the appointment on the cloud. Specifically, it uses an API to create a new calendar entry and registers related information. As an output, the new business trip appointment is added to the cloud calendar.

[0577] Step 5:

[0578] The device retrieves new schedule information registered in the cloud. By synchronizing data from the cloud, it becomes possible to display the same schedule on multiple devices such as smartphones and PCs. As output, the latest schedule is updated and displayed on the device.

[0579] Step 6:

[0580] The server uses big data analytics to investigate past travel data and related information, generating suggestions to support user decision-making. Prompt messages are used to gain insights from the generative AI model. The output includes an optimal packing list and important points for the user.

[0581] Step 7:

[0582] Users can review updated schedules and suggestions from the server, and make modifications as needed. By utilizing the suggestions provided via smart devices, more efficient schedule management is possible. The output is a finalized, optimized schedule.
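
The action items extracted in Step 3 can be routed to processing routines with a simple lookup table, as sketched below. The handler functions and their return strings are placeholders invented for illustration; in the described system they would correspond to the calendar registration and list generation of Steps 4 to 6.

```python
from typing import Callable


def create_trip_schedule() -> str:
    return "schedule created"


def generate_document_list() -> str:
    return "document list generated"


def create_packing_checklist() -> str:
    return "packing checklist created"


# Maps the action items named in Step 3 to hypothetical handlers.
HANDLERS: dict[str, Callable[[], str]] = {
    "Create a business trip schedule": create_trip_schedule,
    "Generate a document list": generate_document_list,
    "Create a packing checklist": create_packing_checklist,
}


def dispatch(action_items: list[str]) -> list[str]:
    """Run the handler for each action item; report items with no handler."""
    results = []
    for item in action_items:
        handler = HANDLERS.get(item)
        results.append(handler() if handler else f"skipped: {item}")
    return results


print(dispatch(["Create a business trip schedule", "Book a hotel"]))
# ['schedule created', 'skipped: Book a hotel']
```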

[0583] (Application Example 1)

[0584] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0585] In modern life, individual users' schedules and tasks are frequently updated and demanding, making efficient management difficult. Furthermore, while there is a growing need for devices that support personal daily life to provide more advanced assistance, the means to achieve this are not yet sufficiently available. Therefore, a system is needed that enables more sophisticated schedule management and personalized decision-making support for users.

[0586] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0587] In this invention, the server includes means for receiving instructions using speech recognition and converting those instructions into text data, means for analyzing the text data and automatically registering schedules and tasks, and means for efficiently managing tasks for devices that support daily life through voice input. This makes it easier for users to manage complex schedules and to receive advanced daily life support and decision-making support through individualized life support devices.

[0588] "Speech recognition" is a technology that analyzes speech and converts it into text data.

[0589] "Instructions" are requests or commands provided by the user, either verbally or in writing.

[0590] "Text data" refers to text information converted by speech recognition.

[0591] "Analysis" is the process of understanding textual data and extracting necessary information from it.

[0592] "Schedules" refer to scheduling information about future events or tasks.

[0593] A "task" is a unit of work or an activity set by the user.

[0594] "Synchronization" is the process of matching information across multiple devices.

[0595] A "notification" is a warning or message sent to inform a user about schedules and tasks.

[0596] "Information" refers to general data, including past user actions and external data.

[0597] "Advice" refers to suggestions or recommendations offered to support decision-making.

[0598] "Learning" is the process of analyzing trends based on the user's behavior history and deepening the system's understanding of those trends.

[0599] A "suggestion" is the act of providing users with actions or options to facilitate optimization.

[0600] "Equipment" refers to hardware or devices in general that are intended to support users.

[0601] "Daily life support" refers to functions and technologies designed to support users' daily activities.

[0602] The system implementing this invention is based on a program that integrates speech recognition and natural language processing. The server uses the Google Cloud Speech-to-Text API, a leading speech recognition engine, to convert user speech into text. Next, the text data is analyzed through the Google Cloud Natural Language API to understand the user's intent and determine the necessary actions.

[0603] The device uses cloud-based synchronization to update analyzed schedule information and tasks across various devices via the Google Calendar API. This allows users to view the latest appointments and reminders across all their devices.

[0604] Furthermore, the server leverages historical data and external information to generate recommendations that support decision-making. In this process, big data analytics techniques are used to analyze user behavior patterns and provide more personalized advice.

[0605] For example, if a user says, "Schedule a meeting for tomorrow afternoon," the server processes the voice and automatically registers the meeting in Google Calendar. In addition, it sends the user a suitable notification so they can prepare before the meeting. Furthermore, based on this, helpful suggestions are provided for future meetings, making the user's task management even more efficient.

[0606] An example of a prompt message to a generative AI model is, "Please create a forecast schedule for next week and provide an optimized suggestion." This instruction allows the system to automatically present the optimal schedule.

[0607] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0608] Step 1:

[0609] The server receives voice input from the user. This voice data is converted into text data using the Google Cloud Speech-to-Text API. The input is voice data, and the output is text data.

[0610] Step 2:

[0611] The server analyzes the generated text data using the Google Cloud Natural Language API. This analysis extracts the user's intent and recognizes specific plans and tasks. The input to this analysis is text data, and the output is the user's intent and planned information.

[0612] Step 3:

[0613] The server registers schedules and tasks with the Google Calendar API based on the analysis results. At this stage, cloud synchronization is used to allow users to check the latest schedule across various devices. The input is the analyzed schedule information, and the output is the schedule synchronized across devices.

[0614] Step 4:

[0615] The server analyzes historical user data and external information. Using big data analytics techniques, it detects user behavior patterns and generates recommendations to support future decision-making. The input is historical behavior and external data, and the output is personalized recommendations.

[0616] Step 5:

[0617] The server prompts a generative AI model to generate an optimal schedule for the following week. Specifically, it uses the prompt "Draft a predicted schedule for next week and provide an optimized suggestion." The input is the user's past data and the prompt, and the output is an optimized schedule.

[0618] Step 6:

[0619] The device receives notifications from the server and reports necessary information to the user in a timely manner. This includes setting reminders and notifying users of new recommendations. The input is schedule information provided by the server, and the output is notifications to the user.
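
The input and output of Steps 1 to 3 can be chained as in the sketch below. The calls to the Google Cloud services are deliberately not reproduced; each service is represented by an injected function, and the `Intent` type, the function names, and the stub return values are assumptions used only to show the data flow.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Intent:
    """Hypothetical result of the language analysis in Step 2."""
    action: str
    title: str
    when: str


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],  # Step 1: voice data -> text data
    analyze: Callable[[str], Intent],    # Step 2: text data -> intent
    register: Callable[[Intent], str],   # Step 3: intent -> calendar event id
) -> str:
    text = transcribe(audio)
    intent = analyze(text)
    event_id = register(intent)
    return f"Registered '{intent.title}' for {intent.when} (id={event_id})"


# Stand-in functions; real ones would wrap the respective cloud APIs.
def fake_transcribe(audio: bytes) -> str:
    return "Schedule a meeting for tomorrow afternoon"


def fake_analyze(text: str) -> Intent:
    return Intent(action="create_event", title="meeting", when="tomorrow afternoon")


def fake_register(intent: Intent) -> str:
    return "evt-1"


print(run_pipeline(b"\x00\x01", fake_transcribe, fake_analyze, fake_register))
# Registered 'meeting' for tomorrow afternoon (id=evt-1)
```

Passing the services in as functions keeps each step replaceable and lets the flow be exercised without network access.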

[0620] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0621] This invention implements an AI secretary system that integrates an emotion engine to enrich user interaction and effectively support business operations. This system operates by combining speech recognition, natural language processing, emotion analysis, data synchronization, and decision support. Specific embodiments are described below.

[0622] Speech recognition and emotion analysis

[0623] The user gives instructions to the system via voice input. The server uses a speech recognition engine to convert the voice data into text data. At the same time, an emotion engine analyzes the voice data to identify the user's emotional state. For example, if the user speaks with a tired voice, the emotion engine will detect "fatigue."

[0624] Instruction analysis and emotion-based processing

[0625] The server analyzes the converted text data using a natural language processing engine to understand the user's intent. In addition, it dynamically adjusts task priorities based on emotional information identified by the emotion engine. For example, if the user is feeling stressed, it prioritizes suggesting relaxing leisure events.
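
The emotion-based adjustment of task priorities can be sketched as a re-sorting step. The `Task` type, the set of emotion labels treated as stress, and the rule of moving relaxing items to the front are assumptions chosen for illustration, not the method defined by this disclosure.

```python
from dataclasses import dataclass

# Emotion labels assumed to indicate that relaxing items should come first.
STRESS_EMOTIONS = {"stress", "fatigue", "anxiety"}


@dataclass
class Task:
    title: str
    priority: int          # lower number = handled earlier
    relaxing: bool = False


def prioritize(tasks: list[Task], emotion: str) -> list[Task]:
    """Order tasks by priority; when the user is stressed, relaxing tasks go first."""
    stressed = emotion in STRESS_EMOTIONS

    def sort_key(task: Task) -> tuple[int, int]:
        front = 0 if (stressed and task.relaxing) else 1
        return (front, task.priority)

    return sorted(tasks, key=sort_key)


tasks = [
    Task("Write report", 1),
    Task("Take a walk", 3, relaxing=True),
    Task("Reply to email", 2),
]
print([t.title for t in prioritize(tasks, "stress")])
# ['Take a walk', 'Write report', 'Reply to email']
```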

[0626] Schedule management and multi-device synchronization

[0627] Using the analyzed data, the server automatically registers schedules. The data is then managed in the cloud and synchronized across multiple devices via the terminal. This allows users to instantly access information from any device.

[0628] Decision support and dynamic interfaces

[0629] The server analyzes historical data and external information, taking emotional information into consideration, to provide recommendations that support optimal decision-making. Furthermore, the device dynamically changes the interface design and feedback content according to the user's emotions, resulting in a more user-friendly experience.

[0630] Security and Privacy Management

[0631] Data obtained through sentiment analysis is encrypted and used only with the user's consent. This ensures strict protection of user privacy.

[0632] Specific example

[0633] For example, if a user says, "I'm dreading tomorrow's meeting," the server will add meeting preparation tasks to the schedule, while the emotion engine will identify the emotion of "dread." As a result, the server will offer suggestions to help the user relax (such as suggesting delegable tasks or setting a break time after the meeting) to alleviate the user's psychological burden.

[0634] By integrating an emotion engine in this way, this system can provide more personalized and effective business support.

[0635] The following describes the processing flow.

[0636] Step 1:

[0637] The user gives instructions to the system via a voice input device, using phrases such as, "Tell me about this week's project meeting."

[0638] Step 2:

[0639] The server activates the speech recognition engine and converts the user's voice data into text data. At this stage, the audio signal is analyzed.

[0640] Step 3:

[0641] The server simultaneously activates an emotion analysis engine to detect the user's emotions from the tone and tempo of their voice. For example, if it detects anxiety, it records that information.

[0642] Step 4:

[0643] The server analyzes the converted text data using a natural language processing engine to identify user needs. It grasps specific requests such as, "I want to check the schedule and progress of project meetings."

[0644] Step 5:

[0645] Based on the analysis results, the server retrieves relevant meeting information from calendar data in the cloud and sets reminders as needed.

[0646] Step 6:

[0647] The terminal notifies the user of meeting information retrieved from the server. A clean and intuitive interface displays information such as the date, time, location, and participants of the meeting.

[0648] Step 7:

[0649] Based on the emotion analysis results, the server generates suggestions to alleviate the user's stress. If significant anxiety is detected, it will suggest relaxing activities after the meeting.

[0650] Step 8:

[0651] The server encrypts all voice and emotional data and manages it to ensure privacy is respected. Emotional data will not be used for any other purpose without the user's consent.

[0652] These steps allow users to receive comprehensive business support, including sentiment analysis.

[0653] (Example 2)

[0654] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0655] In modern society, the presence of multiple electronic devices and vast amounts of data makes efficient information management and decision-making difficult. Furthermore, scheduling and information provision that disregards emotions can degrade the user experience. Therefore, there is a need for systems that understand emotions and provide optimal support to users.

[0656] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0657] In this invention, the server includes a device that reads acoustic signals and converts those signals into text information, a device that analyzes the text information and automatically records schedules and tasks, and a device that analyzes emotional information and dynamically adjusts the processing priorities and methods. This enables efficient information management and decision-making support that takes into account the user's emotions.

[0658] "Acoustic signals" refer to waveform data and information obtained from speech or sound.

[0659] "Textual information" refers to digital text data obtained as a result of converting acoustic signals.

[0660] "Device" refers to a physical or virtual piece of equipment configured to perform a specific function or role.

[0661] "Analysis" refers to the process of processing specific data or information to understand its meaning and patterns.

[0662] "Schedule" refers to an activity or event that should be carried out at a specific time.

[0663] "Challenges" refer to problems that need to be solved or goals that need to be achieved.

[0664] "Electronic equipment" refers to devices that operate using electrical or electronic components.

[0665] "Time management information" refers to date and time information related to schedules and tasks.

[0666] "Notifications" refer to messages or alerts provided to inform users of important information or events.

[0667] "History" refers to a record of past actions or events.

[0668] "External knowledge" refers to general data and knowledge obtained from sources outside the system.

[0669] "Emotional information" refers to data related to the user's emotional state.

[0670] "Priority" refers to the order of importance or urgency when it comes to processing or dealing with tasks.

[0671] "Method" refers to the means or techniques adopted to achieve a specific goal.

[0672] "User interface" refers to the screens and methods of operation that allow a user to interact with a system.

[0673] This invention is an AI assistant system with an integrated emotion engine, which operates by combining speech recognition, natural language processing, sentiment analysis, data synchronization, and decision support. Specific embodiments are described below.

[0674] First, the user provides voice input through the microphone. The terminal captures this voice input and sends it to the server. The server converts the voice data into text data using speech recognition software (e.g., a speech recognition service). Simultaneously, an emotion engine (e.g., emotion analysis software) analyzes the user's emotions from the voice data and identifies specific emotional states. For example, if the user says, "I'm dreading today's meeting," the emotion engine will detect emotions such as "anxiety" and "stress."

[0675] Next, the server uses a natural language processing engine (e.g., a natural language analysis model) to analyze the text data. This allows the server to understand the user's instructions and requests and determine the appropriate action. Simultaneously, it adjusts processing priorities based on emotional information. For example, for a user experiencing stress, it might first suggest relaxing activities.

[0676] Subsequently, the server manages the schedule based on the analyzed data. This includes using a cloud storage service to synchronize schedule information across multiple devices. This allows users to access the latest information from any device.

[0677] Finally, the server uses historical data and external information to support decision-making. The recommendation engine generates and provides optimal suggestions to the user. The terminal adjusts the interface according to the user's emotions to improve usability.

[0678] For example, if a user asks, "What should I do this weekend?", the generative AI model might suggest, "How about watching a movie or going hiking?" An example of a prompt would be, "Please suggest relaxing activities based on the user's current mood."

[0679] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0680] Step 1:

[0681] The user provides voice input via the microphone. This serves as the initial input and is captured by the device as an audio signal. For example, the user might say, "Tell me what's on the schedule for tomorrow."

[0682] Step 2:

[0683] The terminal sends the audio signal to the server. The server uses a speech recognition engine to convert the audio signal into text information. This converted text information becomes the next input. Specifically, the process generates the text "Tell me what's on the schedule for tomorrow" from the audio data.

[0684] Step 3:

[0685] The server analyzes the text information using an emotion engine to identify the user's emotional state. The input is the converted text information, and the output is the recognized emotion data. In this step, the server determines what emotional meaning the question carries. For example, the server recognizes the emotion "interest."

[0686] Step 4:

[0687] The server uses a natural language processing engine to analyze textual information and understand the user's intent. Input consists of text data and sentiment data, and output is an action based on the user's intent. Specifically, through intent analysis, it understands that the user wants to check their schedule for the next day.

[0688] Step 5:

[0689] The server retrieves information from the schedule database based on the analysis results. The input is filtering conditions based on the user's request, and the output is the relevant schedule information. Specifically, it retrieves appointments such as "tomorrow's meeting" from cloud services.

[0690] Step 6:

[0691] The server sends the acquired schedule information to the terminal, and the terminal synchronizes the information with multiple electronic devices. The input is the acquired schedule information, and the output is the synchronized data. Specifically, the schedule information is updated and displayed on smartphones and tablets.

[0692] Step 7:

[0693] The server uses an AI model to generate value-added recommendations based on sentiment data and relevant information. The input is existing data and sentiment analysis results, and the output is personalized suggestions. A specific example of its operation would be a suggestion like, "I recommend relaxing at a cafe after tomorrow's meeting."

[0694] Step 8:

[0695] The device adjusts the user interface and presents information in a design suited to the user. Input is data related to emotions and user requests, and output is a customized UI display. Specifically, the screen's color scheme and font size are adjusted according to the emotion.
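
As a minimal sketch of how the data could flow through Steps 2 to 8 above, the following Python code wires the stages together as interchangeable functions. The names `run_flow` and `Result`, and every callable parameter, are hypothetical and do not appear in this disclosure; the actual engines (speech recognition, emotion engine, natural language processing, recommendation) are assumed to be supplied by the caller. The device synchronization of Step 6 is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Result:
    text: str               # Step 2: recognized text
    emotion: str            # Step 3: estimated emotion label
    intent: str             # Step 4: interpreted user intent
    schedule: List[str]     # Step 5: retrieved schedule entries
    suggestion: str         # Step 7: personalized suggestion
    ui: Dict[str, object]   # Step 8: display settings


def run_flow(
    audio: bytes,
    recognize: Callable[[bytes], str],
    estimate_emotion: Callable[[str], str],
    parse_intent: Callable[[str, str], str],
    fetch_schedule: Callable[[str], List[str]],
    recommend: Callable[[str, List[str]], str],
    choose_ui: Callable[[str], Dict[str, object]],
) -> Result:
    """Run the stages in order; each stage consumes the previous output."""
    text = recognize(audio)                     # Step 2
    emotion = estimate_emotion(text)            # Step 3
    intent = parse_intent(text, emotion)        # Step 4
    schedule = fetch_schedule(intent)           # Step 5
    suggestion = recommend(emotion, schedule)   # Step 7
    ui = choose_ui(emotion)                     # Step 8
    return Result(text, emotion, intent, schedule, suggestion, ui)
```

Passing the engines in as parameters keeps the sketch independent of any particular speech recognition or emotion analysis product.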

[0696] (Application Example 2)

[0697] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0698] In today's information society, users own multiple devices, making schedule management and task prioritization across these devices increasingly complex. Furthermore, there is a growing demand for personalized support that takes into account the user's emotional state. However, conventional systems lack effective integration of sentiment analysis into decision support, failing to significantly improve the user experience.

[0699] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0700] In this invention, the server includes means for receiving instructions via speech recognition and converting those instructions into text data, means for analyzing the text data and automatically registering schedules and tasks, and means for identifying the user's emotional state using sentiment analysis and dynamically adjusting task priorities according to that state. This enables not only schedule synchronization across multiple devices but also more personalized support based on the user's emotions.

[0701] "Speech recognition" is a technology that receives audio data and converts it into text data.

[0702] "Text data" refers to string information that is recognized and converted from audio data.

[0703] "Analysis" is the process of understanding the information contained in data and extracting specific meanings.

[0704] "Schedules" refer to planned actions or events that are to be carried out in the future.

[0705] "Tasks" refer to activities or processes carried out with a specific purpose.

[0706] A "device" is a machine or instrument that has a specific function.

[0707] "Schedule synchronization" refers to maintaining information consistency by coordinating schedules across multiple devices.

[0708] "Notification" refers to informing the user of specified information via an alert.

[0709] "External information" refers to additional data obtained from outside the user's environment.

[0710] "Decision support" refers to the act of providing advice and recommendations to help make the best possible decisions.

[0711] "Behavioral history" refers to a record of the user's past actions.

[0712] "Operating environment" refers to the physical or virtual environment in which a user performs their work.

[0713] "Emotion analysis" is a technology that reads emotional nuances from voice and text to determine the user's emotional state.

[0714] A "home robot" is an autonomous or semi-autonomous machine designed to support an individual's daily life within the home.

[0715] In implementing this invention, the server utilizes speech recognition functionality to convert voice commands from the user into text data. General-purpose speech recognition software can be used for speech recognition. The converted text data is analyzed using natural language processing techniques to understand the user's intent. Based on the analyzed data, the user's schedule and tasks are automatically registered. This enables the user to manage their schedule efficiently.

[0716] Furthermore, the server utilizes an emotion analysis engine to determine the user's emotions from their voice. Based on the results of the emotion analysis, it dynamically adjusts task priorities and provides support tailored to the user's emotional state. This can be achieved using emotion analysis software such as IBM Watson.

[0717] Furthermore, the home robot serving as the terminal has interactive functions and communicates with the user by voice. The robot provides appropriate feedback and task management based on the emotion analysis results, making the user's life more comfortable. Depending on the user's emotional state, the robot can present an intuitive interface and take appropriate actions.

[0718] For example, if a user tells the robot, "I'm tired today," the server converts the audio into text and uses an emotion analysis engine to detect fatigue. It then reviews the user's schedule and suggests relaxing activities as needed. In this way, flexible support is provided based on the user's emotional state.

[0719] An example of a prompt for a generative AI model would be, "The user says they are tired. How can we reduce their burden?" This allows the AI to devise specific support measures and suggest them to the user.
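
As a minimal sketch, assuming the recognized utterance, the estimated emotion, and the user's schedule are already available as plain values, a prompt of this kind could be assembled as follows. The function name `build_support_prompt` and the layout of the prompt are hypothetical; only the closing question is taken from the example above.

```python
from typing import List


def build_support_prompt(utterance: str, emotion: str, schedule: List[str]) -> str:
    """Assemble a text prompt for a generative AI model (hypothetical layout)."""
    # Fall back to a placeholder line when the schedule is empty.
    items = [f"- {item}" for item in schedule] or ["- (no entries)"]
    lines = [
        f'The user said: "{utterance}"',
        f"Estimated emotional state: {emotion}",
        "Today's remaining schedule:",
        *items,
        "How can we reduce their burden?",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_support_prompt("I'm tired today", "fatigue", ["15:00 design review"]))
```

Including the schedule in the prompt gives the model the context it needs to propose concrete measures, such as which items could be postponed.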

[0720] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0721] Step 1:

[0722] The server receives voice input from the user and sends it to speech recognition software. The speech recognition software converts the voice data into text data. This results in the voice being obtained as text information.

[0723] Step 2:

[0724] The converted text data is passed from the server to a natural language processing engine, where the user's intent is analyzed. This analysis examines the relationships between verbs and nouns, identifying the actions and tasks the user desires. As a result, specific tasks to be performed are output.

[0725] Step 3:

[0726] The server provides voice data to an emotion analysis engine, which then analyzes the user's emotional state. The emotion analysis determines the emotion based on the tone of voice and word choice. For example, the emotional state might be expressed as "fatigue" or "stress."

[0727] Step 4:

[0728] The server associates the analyzed sentiment information with the task content and adjusts the priority. Based on this priority adjustment, it is determined whether the task should be executed immediately or postponed. A list of top-priority items is output.

[0729] Step 5:

[0730] The device provides clear feedback to the user based on priority. The home robot suggests recommended tasks and actions for relaxation to the user. Specific actions include voice suggestions and visual notifications via the display.

[0731] Step 6:

[0732] The robot receives additional instructions or new voice input from the user, and the server generates a new prompt accordingly, adding support measures as needed. This process continuously complements user support. This final interaction is then integrated as the final result of the process.
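
The priority adjustment of Step 4 could be sketched as follows. This is only one possible rule, chosen for illustration: when the detected emotion is in a hypothetical set of strained states, tasks marked as deferrable are lowered in priority and postponed, and the remaining tasks are listed in descending priority. The `Task` type, the `STRAINED` set, and the `penalty` parameter are assumptions and are not specified in this disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Task:
    name: str
    priority: int      # higher means more important
    deferrable: bool   # True if the task can safely be postponed


# Hypothetical emotion labels treated as strained states.
STRAINED = {"fatigue", "stress", "anxiety"}


def adjust_priorities(
    tasks: List[Task], emotion: str, penalty: int = 2
) -> Tuple[List[Task], List[Task]]:
    """Split tasks into (do_now, postponed) according to the emotion."""
    strained = emotion in STRAINED
    do_now: List[Task] = []
    postponed: List[Task] = []
    for task in tasks:
        if strained and task.deferrable:
            # Lower the priority and move the task to the postponed list.
            postponed.append(Task(task.name, task.priority - penalty, True))
        else:
            do_now.append(task)
    do_now.sort(key=lambda t: t.priority, reverse=True)
    postponed.sort(key=lambda t: t.priority, reverse=True)
    return do_now, postponed
```

The first list corresponds to the list of top-priority items output in Step 4, and the second list could be used for the relaxation-oriented suggestions of Step 5.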

[0733] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0734] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0735] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0736] [Fourth Embodiment]

[0737] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0738] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0739] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0740] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0741] The microphone 238 receives instructions from the user 20 by capturing the voice of the user 20. The microphone 238 converts the captured voice into audio data and outputs it to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0742] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0743] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0744] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0745] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0746] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0747] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0748] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0749] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0750] In implementing the present invention, a system is provided in which an AI agent, as the central component, provides speech recognition, data processing, and user interface functions. Specific embodiments of this system are described below.

[0751] Speech recognition and instruction analysis

[0752] Users can input voice instructions into the system in a conversational format. The server uses speech recognition technology to convert the input voice into text, and a natural language processing engine analyzes the user's instructions and intentions. This analysis automatically extracts the tasks and schedules the user requests.

[0753] Schedule management and synchronization

[0754] Based on the analysis results, the server registers the schedule in the cloud and sets reminders as needed. Devices synchronize this data via the cloud, allowing users to view their latest schedules on various devices such as desktops, tablets, and smartphones. Users can enjoy the convenience of easily managing their own schedules.
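
As a minimal sketch of the synchronization described above, assuming each schedule entry carries an identifier and a last-update timestamp, a device could reconcile its local copy with the cloud copy as follows. The "most recent update wins" rule, the `Entry` type, and the function `merge` are assumptions made for illustration; this disclosure does not specify a conflict resolution policy.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass(frozen=True)
class Entry:
    entry_id: str
    title: str
    updated_at: datetime


def merge(local: Dict[str, Entry], cloud: Dict[str, Entry]) -> Dict[str, Entry]:
    """Return the union of both copies, keeping the newer entry on conflict."""
    merged = dict(cloud)
    for entry_id, entry in local.items():
        other = merged.get(entry_id)
        # Keep the local entry if the cloud lacks it or holds an older version.
        if other is None or entry.updated_at > other.updated_at:
            merged[entry_id] = entry
    return merged
```

Each device would apply the same merge and then upload the result, so that all devices converge on the same schedule.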

[0755] Support for decision-making and suggestion of priorities

[0756] The server utilizes big data analytics technology to process users' past behavioral patterns and external information. Based on the insights gained, it presents users with objective information and suggestions to support their future decision-making. For example, it makes suggestions to make future choices easier based on the options the user frequently selects.

[0757] Providing an individualized work environment

[0758] This system learns the user's behavioral history and preferences to propose an optimized workflow. This maximizes user productivity and creates a personalized work environment. The server, in turn, can continuously provide a more accurate service.

[0759] Security and multilingual support

[0760] In terms of security, the server encrypts all data and has features to manage access rights, thus protecting user data. Furthermore, this system supports multiple languages, enabling smooth operation even in international business environments.
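
The access rights management mentioned above could be sketched as follows, assuming permissions are granted per user and per resource. The names `Permission` and `AccessControl` are hypothetical. Encryption itself is not shown here and is assumed to be delegated to a vetted cryptographic library.

```python
from enum import Flag, auto
from typing import Dict, Tuple


class Permission(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()


class AccessControl:
    """In-memory table of (user, resource) -> granted permissions."""

    def __init__(self) -> None:
        self._grants: Dict[Tuple[str, str], Permission] = {}

    def grant(self, user_id: str, resource_id: str, permission: Permission) -> None:
        key = (user_id, resource_id)
        # Add the new permission to whatever was already granted.
        self._grants[key] = self._grants.get(key, Permission.NONE) | permission

    def is_allowed(self, user_id: str, resource_id: str, permission: Permission) -> bool:
        granted = self._grants.get((user_id, resource_id), Permission.NONE)
        # Every requested permission bit must be present.
        return (granted & permission) == permission
```

With this structure, a request from a user with no grant for a resource is rejected by default, which matches the intent that only authorized users can access the data.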

[0761] As a concrete example, when a user plans a business trip, they can give a voice command such as "Prepare my business trip schedule for next week." The server then references past business trip data to create a schedule and automatically generates necessary documents and packing lists. This is automatically synchronized across multiple devices, allowing the user to efficiently complete their preparations before departure.

[0762] Thus, by utilizing the present invention, users can automate complex tasks and streamline their entire operations.

[0763] The following describes the processing flow.

[0764] Step 1:

[0765] The user gives instructions to the voice input device. They communicate their schedule by voice, such as "Schedule a project meeting for next Monday."

[0766] Step 2:

[0767] The server sends the received audio to the speech recognition engine. The engine converts the audio data into text data.

[0768] Step 3:

[0769] The server uses natural language processing to analyze text data to determine the date, time, and event details. For example, it extracts the date information "next Monday."

[0770] Step 4:

[0771] Based on the analyzed information, the server registers the event in the cloud-based calendar system.

[0772] Step 5:

[0773] The server sets up a reminder and prepares to notify the user 30 minutes before the meeting.

[0774] Step 6:

[0775] The terminal synchronizes the latest schedule data from the cloud system, ensuring that information is displayed consistently across multiple devices.

[0776] Step 7:

[0777] The server sends a notification to the user's device at the scheduled time according to the set reminder. By checking this notification, the user can ensure they don't forget to participate in the scheduled event.

[0778] Step 8:

[0779] When a user requests a suggestion or decision, the server performs big data analysis and provides the best possible suggestion based on past behavior and external information.

[0780] Step 9:

[0781] The server receives user feedback, updates its machine learning model, and makes future recommendations more accurate.

[0782] Through this series of processes, the system automates user schedule management and supports effective decision-making.
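
Two of the steps above, resolving the expression "next Monday" in Step 3 and computing the reminder time in Step 5, could be sketched as follows. The interpretation of "next Monday" as the first Monday strictly after the current date is an assumption, since the expression is ambiguous in everyday use, and the function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta


def next_weekday(today: date, weekday: int) -> date:
    """Return the first date strictly after `today` on `weekday` (Monday=0)."""
    days_ahead = (weekday - today.weekday() - 1) % 7 + 1
    return today + timedelta(days=days_ahead)


def reminder_time(start: datetime, minutes_before: int = 30) -> datetime:
    """Return the time at which the reminder should be sent."""
    return start - timedelta(minutes=minutes_before)


if __name__ == "__main__":
    meeting_day = next_weekday(date(2024, 12, 3), 0)   # 2024-12-03 is a Tuesday
    start = datetime.combine(meeting_day, time(10, 0))
    print(meeting_day, reminder_time(start))
```

The meeting time of 10:00 in the usage example is arbitrary; in the flow above it would come from the user's instruction or from a follow-up question.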

[0783] (Example 1)

[0784] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0785] In today's busy work environment, users demand systems that efficiently manage their schedules and automate tasks. However, conventional systems have suffered from low accuracy in recognizing voice commands and incomplete synchronization across multiple devices. Furthermore, they lacked features to support decision-making, making it difficult to provide a work environment tailored to individual user needs. In addition, insufficient multilingual support and security measures limited support for international business operations.

[0786] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0787] This invention includes a server that receives instructions via speech recognition technology and converts those instructions into text data, a means for analyzing the text data and automatically registering schedule management data and tasks, and a means for synchronizing schedule management across multiple information terminals and providing notification functions. As a result, users can streamline schedule management through highly accurate speech recognition and check the latest information across multiple devices. Furthermore, by utilizing past usage data, users can receive suggestions to support decision-making, thereby optimizing and streamlining operations. In addition, multilingual functionality and data encryption enable support for international business and high security.

[0788] "Speech recognition technology" is a technology that converts a user's voice into a digital signal and generates text data or instructions by analyzing its content.

[0789] "Text data" refers to digital data used for information processing and data communication, based on strings of characters obtained through speech recognition or manual input.

[0790] "Schedule management data" refers to data used to electronically record and manage users' schedules and task information.

[0791] An "information terminal" refers to a device used to manipulate digital data, such as a computer, smartphone, or tablet.

[0792] "Schedule management" refers to activities and methods for organizing and managing users' schedules.

[0793] A "notification function" is a feature that provides users with timely information or warnings that have been set in advance.

[0794] "Multilingual functionality" refers to the ability of a system or service to use and switch between multiple languages simultaneously.

[0795] "Data encryption" is the process of transforming data using cryptographic techniques to protect it from unauthorized access.

[0796] "Access rights management" refers to a management method that controls access rights to specific information or functions, ensuring that only authorized users can access them.

[0797] The present invention aims to provide a system that allows users to efficiently manage their schedules using voice commands. This system operates by integrating voice recognition technology, natural language processing technology, data synchronization technology, big data analysis technology, multilingual support, and security technology.

[0798] The server uses common speech recognition technology to convert the user's voice instructions into text data. Specifically, this can be done using a speech recognition API. The converted text data is then analyzed by a natural language processing engine (e.g., a generative AI model) to understand the user's intent and instructions. This analysis automatically generates schedule management data and tasks.

[0799] The analyzed data is registered on a cloud platform for schedule management. This registration can be done using a cloud service API. The device synchronizes this data via the cloud, allowing users to access it from multiple devices such as PCs, tablets, and smartphones. This makes it possible to check the latest schedule from any device.

[0800] Furthermore, the server can use big data analytics technology to process users' past behavioral data and external information. This processing makes it possible to provide suggestions to support decision-making. For example, based on past business trip history, it can automatically generate suggestions and packing lists for the next business trip.
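
As a simplified stand-in for the big data analysis described above, a packing list could be derived from past business trip history by keeping the items that appeared in at least a given share of earlier trips. The function `suggest_packing_list` and the `min_share` threshold are hypothetical, and a frequency count is only one of many possible analyses.

```python
from collections import Counter
from typing import List


def suggest_packing_list(past_trips: List[List[str]], min_share: float = 0.5) -> List[str]:
    """Return items packed on at least `min_share` of past trips, most frequent first."""
    if not past_trips:
        return []
    # Count each item at most once per trip.
    counts = Counter(item for trip in past_trips for item in set(trip))
    threshold = min_share * len(past_trips)
    frequent = [item for item, n in counts.items() if n >= threshold]
    # Sort by descending frequency, then alphabetically for a stable order.
    return sorted(frequent, key=lambda item: (-counts[item], item))


if __name__ == "__main__":
    history = [
        ["passport", "laptop", "charger"],
        ["laptop", "charger"],
        ["laptop", "umbrella"],
    ]
    print(suggest_packing_list(history))
```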

[0801] Multilingual support enables smooth operation even in international environments. Data encryption technology and access control ensure the secure protection of user data.

[0802] For example, if a user voice-inputs "Prepare my business trip schedule for next week," the server uses speech recognition technology to convert the input into text, analyzes it using natural language processing, and then automatically creates a business trip schedule, generating a list of necessary documents and items to pack. This information is synchronized across multiple devices, allowing the user to prepare efficiently.

[0803] Example prompt: "The user has given the voice command 'Prepare my business trip schedule for next week.' Start the process of using past business trip data to create a schedule, generate a packing list, and sync it to the device."

[0804] In this way, this system can automate and streamline users' work while simultaneously providing a secure and multilingual system environment.

[0805] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0806] Step 1:

[0807] The user gives instructions to the system via voice input. Specifically, the user might say into the microphone, "Prepare my business trip schedule for next week." This voice input becomes the input data for the system.

[0808] Step 2:

[0809] The server uses speech recognition technology to convert the input speech into text data. It analyzes the speech data and generates a string by mapping each phoneme to characters. The output of this process is the user's instruction in text form. The resulting text data is "Prepare my business trip schedule for next week."

[0810] Step 3:

[0811] The server passes the generated text data to a natural language processing engine to analyze the user's intent. A generative AI model is used to syntactically analyze the text and extract the user's instructions. The output of the analysis consists of specific action items necessary for schedule registration and task creation. Examples include "Create a business trip schedule," "Generate a document list," and "Create a packing checklist."

[0812] Step 4:

[0813] The server operates the schedule management application based on the analyzed data and registers the appointment in the cloud. Specifically, it uses an API to create a new calendar entry and registers the related information. As an output, the new business trip appointment is added to the cloud calendar.

[0814] Step 5:

[0815] The device retrieves new schedule information registered in the cloud. By synchronizing data from the cloud, it becomes possible to display the same schedule on multiple devices such as smartphones and PCs. As output, the latest schedule is updated and displayed on the device.

[0816] Step 6:

[0817] The server uses big data analytics to investigate past travel data and related information, generating suggestions to support user decision-making. Prompts are used to obtain insights from the generative AI model. The output includes an optimal packing list and important points for the user.

[0818] Step 7:

[0819] Users can review updated schedules and suggestions from the server, and make modifications as needed. By utilizing the suggestions provided via smart devices, more efficient schedule management is possible. The output is a finalized, optimized schedule.

[0820] (Application Example 1)

[0821] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0822] In modern life, individual users' schedules and tasks are frequently updated and demanding, making efficient management difficult. Furthermore, while there is a growing need for devices that support personal daily life to provide more advanced assistance, the means to achieve this are not yet sufficiently available. Therefore, a system is needed that enables more sophisticated schedule management and personalized decision-making support for users.

[0823] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0824] In this invention, the server includes means for receiving instructions using speech recognition and converting those instructions into text data, means for analyzing the text data and automatically registering schedules and tasks, and means for efficiently managing tasks for devices that support daily life through voice input. This makes it easier for users to manage complex schedules and to receive advanced daily life support and decision-making support through individualized life support devices.

[0825] "Speech recognition" is a technology that analyzes speech and converts it into text data.

[0826] "Instructions" are requests or commands provided by the user, either verbally or in writing.

[0827] "Text data" refers to text information converted by speech recognition.

[0828] "Analysis" is the process of understanding textual data and extracting necessary information from it.

[0829] "Schedules" refer to scheduling information about future events or tasks.

[0830] A "task" is a unit of work or an activity set by the user.

[0831] "Synchronization" is the process of matching information across multiple devices.

[0832] A "notification" is a warning or message sent to inform a user about schedules and tasks.

[0833] "Information" refers to general data, including past user actions and external data.

[0834] "Advice" refers to suggestions or recommendations offered to support decision-making.

[0835] "Learning" is the process of analyzing trends based on the user's behavior history and deepening one's understanding of them.

[0836] A "suggestion" is the act of providing users with actions or options to facilitate optimization.

[0837] "Equipment" refers to hardware or devices in general that are intended to support users.

[0838] "Daily life support" refers to functions and technologies designed to support users' daily activities.

[0839] The system implementing this invention is based on a program that integrates speech recognition and natural language processing. The server utilizes the Google Cloud Speech-to-Text API, a leading speech recognition engine, to convert user speech into text using speech recognition technology. Next, the text data is analyzed through the Google Cloud Natural Language API to understand the user's intent and determine the necessary actions.

[0840] The device uses cloud-based synchronization to update analyzed schedule information and tasks across various devices via the Google Calendar API. This allows users to view the latest appointments and reminders across all their devices.

[0841] Furthermore, the server leverages historical data and external information to generate recommendations that support decision-making. In this process, big data analytics techniques are used to analyze user behavior patterns and provide more personalized advice.

[0842] For example, if a user says, "Schedule a meeting for tomorrow afternoon," the server processes the voice and automatically registers the meeting in Google Calendar. In addition, it sends the user a suitable notification so they can prepare before the meeting. Furthermore, based on this, helpful suggestions are provided for future meetings, making the user's task management even more efficient.

[0843] An example of a prompt message to a generative AI model is, "Please create a forecast schedule for next week and provide an optimized suggestion." This instruction allows the system to automatically present the optimal schedule.

[0844] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0845] Step 1:

[0846] The server receives voice input from the user. This voice data is converted into text data using the Google Cloud Speech-to-Text API. The input is voice data, and the output is text data.

[0847] Step 2:

[0848] The server analyzes the generated text data using the Google Cloud Natural Language API. This analysis extracts the user's intent and recognizes specific plans and tasks. The input to this analysis is text data, and the output is the user's intent and planned information.

[0849] Step 3:

[0850] The server registers schedules and tasks with the Google Calendar API based on the analysis results. At this stage, cloud synchronization is used to allow users to check the latest schedule across various devices. The input is the analyzed schedule information, and the output is the schedule synchronized across devices.

[0851] Step 4:

[0852] The server analyzes historical user data and external information. Using big data analytics techniques, it detects user behavior patterns and generates recommendations to support future decision-making. The input is historical behavior and external data, and the output is personalized recommendations.

[0853] Step 5:

[0854] The server prompts a generative AI model to generate an optimal schedule for the following week. Specifically, it uses the prompt "Draft a predicted schedule for next week and provide an optimized suggestion." The input is the user's past data and the prompt, and the output is an optimized schedule.

[0855] Step 6:

[0856] The device receives notifications from the server and reports necessary information to the user in a timely manner. This includes setting reminders and notifying users of new recommendations. The input is schedule information provided by the server, and the output is notifications to the user.
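
Steps 2, 3, and 6 above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the implementation described in this disclosure: the natural language analysis service is replaced by a hypothetical keyword matcher, the calendar is an in-memory list rather than a cloud calendar, and the names `parse_intent`, `register_event`, and `build_reminder`, the 13:00 start assumed for "afternoon", and the 30-minute reminder lead are all chosen only for this example. Speech-to-text conversion (Step 1) is assumed to have already produced the input string.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import List, Optional, Tuple


@dataclass
class Event:
    title: str
    start: datetime


def parse_intent(text: str, now: datetime) -> Optional[Event]:
    """Step 2 stand-in: extract a schedule from already transcribed text.

    A real system would call a natural language analysis service here;
    this keyword matcher is only an illustrative assumption.
    """
    lowered = text.lower()
    if "schedule" not in lowered or "meeting" not in lowered:
        return None
    day = now.date() + timedelta(days=1) if "tomorrow" in lowered else now.date()
    hour = 13 if "afternoon" in lowered else 9  # assumed default hours
    return Event("Meeting", datetime.combine(day, time(hour)))


def register_event(calendar: List[Event], event: Event) -> None:
    """Step 3 stand-in: store the event in an in-memory calendar, sorted by start."""
    calendar.append(event)
    calendar.sort(key=lambda e: e.start)


def build_reminder(event: Event,
                   lead: timedelta = timedelta(minutes=30)) -> Tuple[datetime, str]:
    """Step 6 stand-in: compute when to notify the user and with what message."""
    return event.start - lead, f"Reminder: {event.title} at {event.start:%H:%M}"


calendar: List[Event] = []
now = datetime(2024, 12, 3, 10, 0)
event = parse_intent("Schedule a meeting for tomorrow afternoon", now)
if event is not None:
    register_event(calendar, event)
    notify_at, message = build_reminder(event)
```

Keeping each step as a separate function mirrors the input and output pairs listed for each step, so any stand-in can be swapped for a real service without changing the others.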

[0857] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0858] This invention implements an AI secretary system that integrates an emotion engine to enrich user interaction and effectively support business operations. This system operates by combining speech recognition, natural language processing, emotion analysis, data synchronization, and decision support. Specific embodiments are described below.

[0859] Speech recognition and emotion analysis

[0860] The user gives instructions to the system via voice input. The server uses a speech recognition engine to convert the voice data into text data. At the same time, an emotion engine analyzes the voice data to identify the user's emotional state. For example, if the user speaks with a tired voice, the emotion engine will detect "fatigue."

[0861] Instruction analysis and emotion-based processing

[0862] The server analyzes the converted text data using a natural language processing engine to understand the user's intent. In addition, it dynamically adjusts task priorities based on emotional information identified by the emotion engine. For example, if the user is feeling stressed, it prioritizes suggesting relaxing leisure events.

[0863] Schedule management and multi-device synchronization

[0864] Using the analyzed data, the server automatically registers schedules. The data is then managed in the cloud and synchronized across multiple devices via the terminal. This allows users to instantly access information from any device.
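
One simple way to picture the multi-device synchronization described above is a cloud-side copy with a version counter that each device pulls from when it is behind. The sketch below is an assumption made for illustration; the classes `CloudSchedule` and `DeviceReplica` and the version-counter scheme do not appear in this disclosure, which does not specify a synchronization protocol.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CloudSchedule:
    """Hypothetical cloud-side copy of the schedule with a version counter."""
    version: int = 0
    entries: List[str] = field(default_factory=list)

    def add(self, entry: str) -> None:
        self.entries.append(entry)
        self.version += 1  # every change bumps the version


@dataclass
class DeviceReplica:
    """Hypothetical per-device copy that pulls when it is behind the cloud."""
    name: str
    version: int = 0
    entries: List[str] = field(default_factory=list)

    def pull(self, cloud: CloudSchedule) -> bool:
        if self.version == cloud.version:
            return False  # already up to date, nothing copied
        self.entries = list(cloud.entries)
        self.version = cloud.version
        return True


cloud = CloudSchedule()
phone, tablet = DeviceReplica("phone"), DeviceReplica("tablet")
cloud.add("Project meeting, 13:00")
updated = [d.name for d in (phone, tablet) if d.pull(cloud)]
```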

[0865] Decision support and dynamic interfaces

[0866] The server analyzes historical data and external information, taking emotional information into consideration, to provide recommendations that support optimal decision-making. Furthermore, the device dynamically changes the interface design and feedback content according to the user's emotions, resulting in a more user-friendly experience.

[0867] Security and Privacy Management

[0868] Data obtained through sentiment analysis is encrypted and used only with the user's consent. This ensures strict protection of user privacy.

[0869] Specific example

[0870] For example, if a user says, "I'm dreading tomorrow's meeting," the server will add meeting preparation tasks to the schedule, while the emotion engine will identify the emotion of "dread." As a result, the server will offer suggestions to help the user relax (such as suggesting delegable tasks or setting a break time after the meeting) to alleviate the user's psychological burden.

[0871] By integrating an emotion engine in this way, this system can provide more personalized and effective business support.

[0872] The following describes the processing flow.

[0873] Step 1:

[0874] The user gives instructions to the system via a voice input device, using phrases such as, "Tell me about this week's project meeting."

[0875] Step 2:

[0876] The server activates the speech recognition engine and converts the user's voice data into text data. At this stage, the audio signal is analyzed.

[0877] Step 3:

[0878] The server simultaneously activates an emotion analysis engine to detect the user's emotions from the tone and tempo of their voice. For example, if it detects anxiety, it records that information.

[0879] Step 4:

[0880] The server analyzes the converted text data using a natural language processing engine to identify user needs. It grasps specific requests such as, "I want to check the schedule and progress of project meetings."

[0881] Step 5:

[0882] Based on the analysis results, the server retrieves relevant meeting information from calendar data in the cloud and sets reminders as needed.

[0883] Step 6:

[0884] The terminal notifies the user of meeting information retrieved from the server. A clean and intuitive interface displays information such as the date, time, location, and participants of the meeting.

[0885] Step 7:

[0886] Based on the emotion analysis results, the server generates suggestions to alleviate the user's stress. If significant anxiety is detected, it will suggest relaxing activities after the meeting.

[0887] Step 8:

[0888] The server encrypts all voice and emotional data and manages it to ensure privacy is respected. Emotional data will not be used for any other purpose without the user's consent.

[0889] These steps allow users to receive comprehensive business support, including sentiment analysis.
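
Steps 7 and 8 above combine two rules: suggest a relaxing activity only when significant anxiety is detected, and use emotional data only for purposes the user has consented to. The sketch below illustrates both rules under stated assumptions. The 0.0 to 1.0 intensity scale, the 0.7 threshold, the purpose labels, and the function names are hypothetical, and encryption of the stored data is deliberately left out of the example.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class EmotionReading:
    label: str        # e.g. "anxiety"
    intensity: float  # assumed scale from 0.0 to 1.0


def suggest_relief(reading: EmotionReading, threshold: float = 0.7) -> Optional[str]:
    """Step 7 stand-in: suggest a relaxing activity only for strong anxiety."""
    if reading.label == "anxiety" and reading.intensity >= threshold:
        return "Take a short break after the meeting."
    return None


def emotion_for_purpose(reading: EmotionReading, purpose: str,
                        consented: Set[str]) -> Optional[EmotionReading]:
    """Step 8 stand-in: release emotion data only for consented purposes."""
    return reading if purpose in consented else None


reading = EmotionReading("anxiety", 0.8)
consented_purposes = {"stress_relief"}

visible = emotion_for_purpose(reading, "stress_relief", consented_purposes)
suggestion = suggest_relief(visible) if visible is not None else None
blocked = emotion_for_purpose(reading, "analytics", consented_purposes)
```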

[0890] (Example 2)

[0891] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0892] In modern society, the presence of multiple electronic devices and vast amounts of data makes efficient information management and decision-making difficult. Furthermore, scheduling and information provision that disregards emotions can degrade the user experience. Therefore, there is a need for systems that understand emotions and provide optimal support to users.

[0893] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0894] In this invention, the server includes a device that reads acoustic signals and converts those signals into text information, a device that analyzes the text information and automatically records schedules and tasks, and a device that analyzes emotional information and dynamically adjusts the processing priorities and methods. This enables efficient information management and decision-making support that takes into account the user's emotions.

[0895] "Acoustic signals" refer to waveform data and information obtained from speech or sound.

[0896] "Textual information" refers to digital text data obtained as a result of converting acoustic signals.

[0897] "Device" refers to a physical or virtual piece of equipment configured to perform a specific function or role.

[0898] "Analysis" refers to the process of processing specific data or information to understand its meaning and patterns.

[0899] "Schedule" refers to an activity or event that should be carried out at a specific time.

[0900] "Challenges" refer to problems that need to be solved or goals that need to be achieved.

[0901] "Electronic equipment" refers to devices that operate using electrical or electronic components.

[0902] "Time management information" refers to date and time information related to schedules and tasks.

[0903] "Notifications" refer to messages or alerts provided to inform users of important information or events.

[0904] "History" refers to a record of past actions or events.

[0905] "External knowledge" refers to general data and knowledge obtained from sources outside the system.

[0906] "Emotional information" refers to data related to the user's emotional state.

[0907] "Priority" refers to the order of importance or urgency when it comes to processing or dealing with tasks.

[0908] "Method" refers to the means or techniques adopted to achieve a specific goal.

[0909] "User interface" refers to the screens and methods of operation that allow a user to interact with a system.

[0910] This invention is an AI assistant system with an integrated emotion engine, which operates by combining speech recognition, natural language processing, sentiment analysis, data synchronization, and decision support. Specific embodiments are described below.

[0911] First, the user provides voice input through the microphone. The terminal captures this voice input and sends it to the server. The server converts the voice data into text data using speech recognition software (e.g., a speech recognition service). Simultaneously, an emotion engine (e.g., emotion analysis software) analyzes the user's emotions from the voice data and identifies specific emotional states. For example, if the user says, "I'm dreading today's meeting," the emotion engine will detect emotions such as "anxiety" and "stress."

[0912] Next, the server uses a natural language processing engine (e.g., a natural language analysis model) to analyze the text data. This allows the server to understand the user's instructions and requests and determine the appropriate action. Simultaneously, it adjusts processing priorities based on emotional information. For example, for a user experiencing stress, it might first suggest relaxing activities.

[0913] Subsequently, the server manages the schedule based on the analyzed data. This includes using a cloud storage service to synchronize schedule information across multiple devices. This allows users to access the latest information from any device.

[0914] Finally, the server uses historical data and external information to support decision-making. The recommendation engine generates and provides optimal suggestions to the user. The terminal adjusts the interface according to the user's emotions to improve usability.

[0915] For example, if a user asks, "What should I do this weekend?", the generative AI model might suggest, "How about watching a movie or going hiking?" An example of a prompt would be, "Please suggest relaxing activities based on the user's current mood."

[0916] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0917] Step 1:

[0918] The user provides voice input via the microphone. This serves as the initial input and is captured by the device as an audio signal. For example, the user might say, "Tell me what's on the schedule for tomorrow."

[0919] Step 2:

[0920] The terminal sends an audio signal to the server. The server uses a speech recognition engine to convert the audio signal into text information. This converted text information becomes the next input. Specifically, the process involves generating the text "Tell me what my schedule is for tomorrow" from the audio data.

[0921] Step 3:

[0922] The server analyzes the text information using an emotion engine to identify the user's emotional state. The input is the converted text information, and the output is the perceived emotion data. In this step, the server determines what emotional meaning the user's question carries. For example, the server might perceive the emotion "interest."

[0923] Step 4:

[0924] The server uses a natural language processing engine to analyze textual information and understand the user's intent. Input consists of text data and sentiment data, and output is an action based on the user's intent. Specifically, through intent analysis, it understands that the user wants to check their schedule for the next day.

[0925] Step 5:

[0926] The server retrieves information from the schedule database based on the analysis results. The input is filtering conditions based on the user's request, and the output is the relevant schedule information. Specifically, it retrieves appointments such as "tomorrow's meeting" from cloud services.

[0927] Step 6:

[0928] The server sends the acquired schedule information to the terminal, and the terminal synchronizes the information with multiple electronic devices. The input is the acquired schedule information, and the output is the synchronized data. Specifically, the schedule information is updated and displayed on smartphones and tablets.

[0929] Step 7:

[0930] The server uses an AI model to generate value-added recommendations based on sentiment data and relevant information. The input is existing data and sentiment analysis results, and the output is personalized suggestions. A specific example of its operation would be a suggestion like, "I recommend relaxing at a cafe after tomorrow's meeting."

[0931] Step 8:

[0932] The device adjusts the user interface and presents information in a design suited to the user. Input is data related to emotions and user requests, and output is a customized UI display. Specifically, the screen's color scheme and font size are adjusted according to the emotion.
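
The interface adjustment in Step 8 can be pictured as a lookup from the detected emotion to display settings. The table below is a hypothetical example; this disclosure states only that the color scheme and font size are adjusted according to the emotion and does not give concrete values, so the theme names, point sizes, and the `theme_for` helper are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Theme:
    color_scheme: str
    font_size_pt: int


DEFAULT_THEME = Theme("neutral", 14)

# Hypothetical emotion-to-theme table; the values are illustrative only.
THEMES = {
    "stress": Theme("calm-blue", 16),
    "fatigue": Theme("low-contrast", 18),
    "interest": Theme("vivid", 14),
}


def theme_for(emotion: str) -> Theme:
    """Return the display settings for an emotion, or the default if unknown."""
    return THEMES.get(emotion, DEFAULT_THEME)


stressed_theme = theme_for("stress")
unknown_theme = theme_for("surprise")
```

Falling back to a default theme keeps the interface usable when the emotion engine reports a label that the table does not cover.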

[0933] (Application Example 2)

[0934] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0935] In today's information society, users own multiple devices, making schedule management and task prioritization across these devices increasingly complex. Furthermore, there is a growing demand for personalized support that takes into account the user's emotional state. However, conventional systems lack effective integration of sentiment analysis into decision support, failing to significantly improve the user experience.

[0936] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0937] In this invention, the server includes means for receiving instructions via speech recognition and converting those instructions into text data, means for analyzing the text data and automatically registering schedules and tasks, and means for identifying the user's emotional state using sentiment analysis and dynamically adjusting task priorities according to that state. This enables not only schedule synchronization across multiple devices but also more personalized support based on the user's emotions.

[0938] "Speech recognition" is a technology that receives audio data and converts it into text data.

[0939] "Text data" refers to string information that is recognized and converted from audio data.

[0940] "Analysis" is the process of understanding the information contained in data and extracting specific meanings.

[0941] "Plans" refer to scheduled actions or events that are to be carried out in the future.

[0942] "Work" refers to an activity or process carried out with a specific purpose.

[0943] A "device" is a machine or instrument that has a specific function.

[0944] "Schedule synchronization" refers to maintaining information consistency by coordinating schedules across multiple devices.

[0945] "Notification" refers to informing the user of specified information via an alert.

[0946] "External information" refers to additional data obtained from outside the user's environment.

[0947] "Decision support" refers to the act of providing advice and recommendations to help make the best possible decisions.

[0948] "Behavioral history" refers to a record of the user's past actions.

[0949] "Operating environment" refers to the physical or virtual environment in which a user performs their work.

[0950] "Emotional analysis" is a technology that reads emotional nuances from voice and text to determine the user's emotional state.

[0951] A "household robot" is an autonomous or semi-autonomous machine designed to support individual daily life within the home.

[0952] In implementing this invention, the server utilizes speech recognition functionality to convert voice commands from the user into text data. General-purpose speech recognition software can be used for speech recognition. The converted text data is analyzed using natural language processing techniques to understand the user's intent. Based on the analyzed data, the user's schedule and tasks are automatically registered. This enables the user to manage their schedule efficiently.

[0953] Furthermore, the server utilizes an emotion analysis engine to determine the user's emotions from their voice. Based on the results of the emotion analysis, it dynamically adjusts task priorities and provides support tailored to the user's emotional state. This can be achieved using emotion analysis software such as IBM Watson.

[0954] Furthermore, the household robot serving as the terminal has interactive functions and communicates with the user by voice. The robot provides appropriate feedback and task management based on the emotion analysis results, making the user's life more comfortable. Depending on the user's emotional state, the robot can present an intuitive interface and take appropriate actions.

[0955] For example, if a user tells the robot, "I'm tired today," the server converts the audio into text and uses an emotion analysis engine to detect fatigue. It then reviews the user's schedule and suggests relaxing activities as needed. In this way, flexible support is provided based on the user's emotional state.

[0956] An example of a prompt for a generative AI model would be, "The user says they are tired. How can we reduce their burden?" This allows the AI to devise specific support measures and suggest them to the user.
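
A prompt like the one above could be assembled from the transcribed utterance, the detected emotion, and the user's schedule. The following sketch is illustrative only: the function name `build_prompt`, the line layout, and the sample schedule entry are assumptions, and no call to an actual generative AI model is made.

```python
from typing import List


def build_prompt(utterance: str, emotion: str, schedule: List[str]) -> str:
    """Assemble a text prompt from the utterance, emotion, and schedule."""
    schedule_lines = [f"- {item}" for item in schedule] or ["- (nothing scheduled)"]
    return "\n".join([
        f'The user said: "{utterance}"',
        f"Detected emotion: {emotion}",
        "Remaining schedule for today:",
        *schedule_lines,
        "How can we reduce their burden?",
    ])


prompt = build_prompt("I'm tired today", "fatigue", ["15:00 design review"])
empty_day_prompt = build_prompt("I'm tired today", "fatigue", [])
```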

[0957] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0958] Step 1:

[0959] The server receives voice input from the user and sends it to speech recognition software. The speech recognition software converts the voice data into text data. This results in the voice being obtained as text information.

[0960] Step 2:

[0961] The converted text data is passed from the server to a natural language processing engine, where the user's intent is analyzed. This analysis examines the relationships between verbs and nouns, identifying the actions and tasks the user desires. As a result, specific tasks to be performed are output.

[0962] Step 3:

[0963] The server provides voice data to an emotion analysis engine, which then analyzes the user's emotional state. The emotion analysis determines the emotion based on the tone of voice and word choice. For example, the emotional state might be expressed as "fatigue" or "stress."

[0964] Step 4:

[0965] The server associates the analyzed sentiment information with the task content and adjusts the priority. Based on this priority adjustment, it is determined whether the task should be executed immediately or postponed. A list of top-priority items is output.

[0966] Step 5:

[0967] The device provides clear feedback to the user based on priority. The home robot suggests recommended tasks and actions for relaxation to the user. Specific actions include voice suggestions and visual notifications via the display.

[0968] Step 6:

[0969] The robot receives additional instructions or new voice input from the user, and the server generates a new prompt accordingly, adding support measures as needed. This process continuously complements user support. This final interaction is then integrated as the final result of the process.
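
The priority adjustment in Step 4 can be sketched as a rule that, when fatigue or stress is detected, postpones low-priority tasks that can be deferred and keeps everything else for immediate execution. The rule, the `deferrable` flag, the priority scale, and the cut-off value of 3 are assumptions made for this example; this disclosure does not specify how priorities are computed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    name: str
    priority: int     # higher means more important (assumed scale)
    deferrable: bool  # True if the task may be postponed

# Emotions that trigger postponement in this sketch (an assumption).
LOW_ENERGY_EMOTIONS = {"fatigue", "stress"}


def adjust_for_emotion(tasks: List[Task], emotion: str,
                       defer_below: int = 3) -> Tuple[List[Task], List[Task]]:
    """Split tasks into (do_now, postponed), each ordered by descending priority."""
    do_now: List[Task] = []
    postponed: List[Task] = []
    for task in sorted(tasks, key=lambda t: t.priority, reverse=True):
        tired = emotion in LOW_ENERGY_EMOTIONS
        if tired and task.deferrable and task.priority < defer_below:
            postponed.append(task)
        else:
            do_now.append(task)
    return do_now, postponed


tasks = [
    Task("Submit report", 5, False),
    Task("Tidy inbox", 1, True),
    Task("Plan offsite", 2, True),
    Task("Call client", 4, True),
]
do_now, postponed = adjust_for_emotion(tasks, "fatigue")
```

The `do_now` list corresponds to the list of top-priority items output in Step 4, and the `postponed` list to the tasks judged suitable for postponement.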

[0970] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0971] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0972] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0973] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0974] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0975] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0976] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the Emotion Map 400 an emotion lies, the more visible it becomes (that is, the more it is expressed in actions).

[0977] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.

[0978] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0979] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
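
The relationship between emotion values and positions on the emotion map can be illustrated with a small lookup. The sketch below does not implement the neural network; it assumes the network has already produced one value per emotion, picks the highest one, and lists the emotions that lie close to it on the map. The coordinates, the emotion values, and the function names are all hypothetical and chosen only so that "reassured," "calm," and "confident" sit near one another, as in the example above.

```python
import math
from typing import Dict, List

# Hypothetical (x, y) positions on the emotion map. Positive y is the
# "pleasure" side, negative y the "displeasure" side; positive x is the
# "situation" side, negative x the "response" side. Values are made up.
EMOTION_POSITIONS = {
    "reassured": (0.6, 0.3),
    "calm": (0.7, 0.2),
    "confident": (0.6, 0.4),
    "anxiety": (0.7, -0.3),
    "desire": (-0.5, 0.4),
}


def dominant_emotion(values: Dict[str, float]) -> str:
    """Return the emotion with the highest emotion value."""
    return max(values, key=values.get)


def neighbors(emotion: str, radius: float) -> List[str]:
    """Return the other emotions within `radius` of `emotion` on the map."""
    cx, cy = EMOTION_POSITIONS[emotion]
    return sorted(
        name
        for name, (x, y) in EMOTION_POSITIONS.items()
        if name != emotion and math.hypot(x - cx, y - cy) <= radius
    )


# Assumed network output: nearby emotions receive similar values.
values = {"reassured": 0.82, "calm": 0.78, "confident": 0.80,
          "anxiety": 0.05, "desire": 0.10}
top = dominant_emotion(values)
close = neighbors(top, radius=0.2)
```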

[0980] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0981] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0982] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.

[0983] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0984] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0985] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0986] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0987] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0988] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0989] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0990] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0991] The following is further disclosed regarding the embodiments described above.

[0992] (Claim 1)

[0993] A means of receiving instructions via speech recognition and converting those instructions into text data,

[0994] A means of analyzing text data and automatically registering schedules and tasks,

[0995] A means of syncing schedules and providing reminders across multiple devices,

[0996] A means of making recommendations to support decision-making by analyzing past data and external information,

[0997] A means of learning the user's behavioral history and proposing the optimal work environment,

[0998] A system that includes this.
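As an illustration only, the text-analysis and registration means recited above could be sketched as follows. The instruction format ("<title> on YYYY-MM-DD at HH:MM"), the names `ScheduleEntry`, `parse_instruction`, and `register`, and the in-memory list used as a calendar are all assumptions made for this sketch; the disclosure does not specify them, and a real system would presumably rely on natural language processing rather than a fixed pattern.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class ScheduleEntry:
    title: str
    start: datetime


# Hypothetical instruction format: "<title> on YYYY-MM-DD at HH:MM".
_PATTERN = re.compile(
    r"^(?P<title>.+?) on (?P<date>\d{4}-\d{2}-\d{2}) at (?P<time>\d{2}:\d{2})$"
)


def parse_instruction(text: str) -> Optional[ScheduleEntry]:
    """Turn recognized text into a schedule entry, or None if it does not fit."""
    match = _PATTERN.match(text.strip())
    if match is None:
        return None
    try:
        start = datetime.strptime(
            f"{match['date']} {match['time']}", "%Y-%m-%d %H:%M"
        )
    except ValueError:
        # Well-formed digits but not a real date or time (e.g. month 13).
        return None
    return ScheduleEntry(title=match["title"], start=start)


def register(calendar: List[ScheduleEntry], text: str) -> bool:
    """Register the entry automatically; report whether anything was added."""
    entry = parse_instruction(text)
    if entry is None:
        return False
    calendar.append(entry)
    calendar.sort(key=lambda e: e.start)  # keep the calendar in time order
    return True
```

In this sketch, text that does not match the assumed format is rejected rather than guessed at, so the calendar only ever holds entries with a valid start time.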

[0999] (Claim 2)

[1000] The system according to claim 1, which supports international business by providing multilingual support.

[1001] (Claim 3)

[1002] The system according to claim 1, which enhances the security of user data through encryption and access rights management.
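The access rights management part of this note could, for example, be sketched as a per-user, per-resource permission table. The names `Permission` and `AccessControlList` and the read/write permission model are assumptions for illustration and do not come from the disclosure. Encryption is deliberately not shown here, since it would normally be delegated to an established cryptographic library.

```python
from enum import Flag, auto


class Permission(Flag):
    READ = auto()
    WRITE = auto()


class AccessControlList:
    """Hypothetical table mapping (user, resource) to granted permissions."""

    def __init__(self):
        self._grants = {}

    def grant(self, user_id, resource_id, permission):
        key = (user_id, resource_id)
        self._grants[key] = self._grants.get(key, Permission(0)) | permission

    def revoke(self, user_id, resource_id, permission):
        key = (user_id, resource_id)
        self._grants[key] = self._grants.get(key, Permission(0)) & ~permission

    def is_allowed(self, user_id, resource_id, permission):
        # Unknown users or resources hold no permissions at all.
        held = self._grants.get((user_id, resource_id), Permission(0))
        return (held & permission) == permission
```

Denying by default, as `is_allowed` does for any pair that was never granted, is one common design choice for protecting user data.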

[1003] "Example 1"

[1004] (Claim 1)

[1005] A means of receiving instructions using speech recognition technology and converting those instructions into text data,

[1006] A means of analyzing text data and automatically registering schedule management data and tasks,

[1007] A means of synchronizing schedule management across multiple information terminals and providing notification functions,

[1008] A means of processing past usage data and external information to provide suggestions that support decision-making,

[1009] A means of learning the user's activity history and suggesting the optimal work environment,

[1010] A system that includes this.
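One possible reading of the synchronization and notification means recited above is sketched below. It assumes a last-write-wins merge keyed on an `updated_at` timestamp and a fixed reminder lead time. These choices, and the names `Item`, `merge`, and `due_reminders`, are hypothetical and are not stated in the disclosure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Item:
    item_id: str
    title: str
    start: datetime
    updated_at: datetime  # assumed conflict-resolution key


def merge(*device_copies):
    """Merge schedule copies from several terminals; the newest edit wins."""
    merged = {}
    for copy in device_copies:
        for item in copy:
            current = merged.get(item.item_id)
            if current is None or item.updated_at > current.updated_at:
                merged[item.item_id] = item
    return merged


def due_reminders(merged, now, lead=timedelta(minutes=15)):
    """Return items starting within `lead` of `now`, earliest first."""
    upcoming = (i for i in merged.values() if now <= i.start <= now + lead)
    return sorted(upcoming, key=lambda i: i.start)
```

Every terminal that applies the same `merge` to the same set of copies arrives at the same schedule, which is what allows a reminder to be issued consistently on each device.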

[1011] (Claim 2)

[1012] The system according to claim 1, which adds multilingual functionality to assist in international business.

[1013] (Claim 3)

[1014] The system according to claim 1, which enhances the security of user information through data encryption and access rights management.

[1015] "Application Example 1"

[1016] (Claim 1)

[1017] A means of receiving instructions using speech recognition and converting those instructions into text data,

[1018] A means of analyzing text data and automatically registering schedules and tasks,

[1019] A means for synchronizing schedules and providing notifications across multiple devices,

[1020] A means of providing advice to support decision-making by analyzing past information and external information,

[1021] A means of learning the user's behavior history and proposing an optimized work environment,

[1022] A means of efficiently managing tasks using voice input for devices that support daily life,

[1023] A system that includes this.

[1024] (Claim 2)

[1025] The system according to claim 1, which supports international work by providing multilingual support.

[1026] (Claim 3)

[1027] The system according to claim 1, which enhances the security of user information through encryption and access control.

[1028] "Example 2 of combining an emotion engine"

[1029] (Claim 1)

[1030] A device that reads an acoustic signal and converts that signal into text information,

[1031] A device that analyzes textual information and automatically records schedules and tasks,

[1032] A device that shares time management information and provides notifications among multiple electronic devices,

[1033] A device that analyzes past history and external knowledge to provide suggestions that support decision-making,

[1034] A device that learns the user's behavioral history and proposes the optimal work environment,

[1035] A device that analyzes emotional information and dynamically adjusts the processing priorities and methods,

[1036] A device that changes the user interface of a device according to emotions,

[1037] A system that includes this.

[1038] (Claim 2)

[1039] The system according to claim 1, which performs multilingual processing and supports international work.

[1040] (Claim 3)

[1041] The system according to claim 1, which enhances the security of user information through the encryption of information and the management of access control.

[1042] "Application Example 2 of combining an emotion engine"

[1043] (Claim 1)

[1044] A means of receiving instructions via speech recognition and converting those instructions into text data,

[1045] A means of analyzing text data and automatically registering schedules and tasks,

[1046] A means for synchronizing schedules and providing notifications across multiple devices,

[1047] A means of providing recommendations to support decision-making by analyzing past and external information,

[1048] A means of learning the user's behavioral history and proposing the optimal operating environment,

[1049] A means for identifying the user's emotional state using emotion analysis and dynamically adjusting task priorities according to that state,

[1050] A means of using interactive home robots to provide daily support,

[1051] A system that includes this.
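The dynamic adjustment of task priorities according to the user's emotional state, as recited above, might be sketched as follows. The emotion labels, the effort scale, the penalty values, and the names `Task`, `adjusted_priority`, and `reorder` are all assumptions for illustration. The disclosure does not state how the emotional state is scored or how it affects priority.

```python
from dataclasses import dataclass


@dataclass
class Task:
    title: str
    base_priority: int  # higher means more urgent
    effort: int         # hypothetical scale: 1 (light) to 5 (demanding)


# Hypothetical penalty applied per effort point for each emotional state.
EFFORT_PENALTY = {"calm": 0.0, "tired": 1.0, "stressed": 2.0}


def adjusted_priority(task, emotion):
    """Lower the priority of demanding tasks when the user is under strain."""
    penalty = EFFORT_PENALTY.get(emotion, 0.0)  # unknown states change nothing
    return task.base_priority - penalty * task.effort


def reorder(tasks, emotion):
    """Return tasks ordered by emotion-adjusted priority, highest first."""
    return sorted(tasks, key=lambda t: adjusted_priority(t, emotion), reverse=True)
```

With these assumed values, a demanding task that ranks first for a calm user drops below a light task once the identified state is "stressed".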

[1052] (Claim 2)

[1053] The system according to claim 1, which supports international business by providing multilingual support.

[1054] (Claim 3)

[1055] The system according to claim 1, which enhances the security of user information through encryption and access rights management.

[Explanation of symbols]

[1056] 10, 210, 310, 410 Data Processing Systems
12 Data Processing Devices
14 Smart Devices
214 Smart Glasses
314 Headset-type terminal
414 Robots

Claims

1. A system that includes:
a means of receiving instructions via speech recognition and converting those instructions into text data;
a means of analyzing text data and automatically registering schedules and tasks;
a means of syncing schedules and providing reminders across multiple devices;
a means of making recommendations to support decision-making by analyzing past data and external information; and
a means of learning the user's behavioral history and proposing the optimal work environment.

2. The system according to claim 1, which supports international business by providing multilingual support.

3. The system according to claim 1, which enhances the security of user data through encryption and access rights management.