System
The system addresses three problems in conversational AI systems: retrieving past conversations is difficult, time-limited tasks are easily forgotten, and user feedback is underused. It solves them by collecting and classifying conversation data, setting reminders, and using feedback to improve the system.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-12
- Publication Date
- 2026-06-24
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] Conventional interactive artificial intelligence systems have several problems. First, it is difficult for users to search past conversation histories and reconfirm specific information. Second, because there are no notifications for time-limited tasks, users are likely to forget important tasks. Third, feedback from users is insufficiently utilized, which makes it difficult to continuously improve the system.
Means for Solving the Problems
[0005] This invention provides a system that allows users to easily find past conversations by collecting communication data exchanged between users and conversational artificial intelligence and classifying the data into predefined categories using natural language processing technology. The system also helps users remember important tasks by analyzing communication data that includes deadlines and by setting reminders and notifications. In addition, it achieves continuous functional improvement by collecting user feedback and using it to improve the system.
[0006] A "user" is an individual or legal entity that utilizes an interactive artificial intelligence system.
[0007] "Conversational artificial intelligence" refers to an artificial intelligence system that can communicate with users using natural language.
[0008] "Communication data" is a general term for the voice or text information exchanged between a user and conversational artificial intelligence.
[0009] "Natural language processing technology" refers to technologies that enable computers to understand natural language, and includes technologies such as semantic analysis of text and keyword extraction.
[0010] A "category" refers to a specific type or group established for classifying communication data.
[0011] A "user interface" is the display area on a screen or device that a user uses to operate a system.
[0012] A "reminder" is a message or signal used to notify a user of a specific task or event.
[0013] "Feedback" refers to opinions and requests that users provide based on their experience using the system.
Brief Description of the Drawings
[0014] [Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map to which multiple emotions are mapped.
[Figure 10] An emotion map to which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.
Modes for Carrying Out the Invention
[0015] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.
[0016] First, the terms used in the following description will be explained.
[0017] In the following embodiments, a numbered processor (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.
[0018] In the following embodiments, a numbered RAM (Random Access Memory) is a memory in which information is temporarily stored and is used as a work memory by the processor.
[0019] In the following embodiments, a numbered storage is one or more non-volatile storage devices that store various programs and various parameters, etc. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes, and the like.
[0020] In the following embodiments, a numbered communication interface (I/F) is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark).
[0021] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more things are linked by "and/or."
[0022] [First Embodiment]
[0023] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0024] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0025] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0026] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0027] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0028] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0029] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0030] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0031] As shown in Figure 2, in the data processing device 12, specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes it on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0032] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0033] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0034] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0035] This invention is a system for efficiently managing data from conversations between a user and an interactive artificial intelligence. This system mainly consists of a server, a terminal, and the user.
[0036] Starting a conversation
[0037] First, the user initiates communication with the conversational artificial intelligence via their device. The device records the conversation, which takes the form of text or audio, in real time and sends the recorded data to the server at regular intervals. All conversation data is thus aggregated on the server, where it can be processed in the next step.
[0038] Data analysis and classification
[0039] The server analyzes the received conversation data using natural language processing techniques. Specifically, it interprets the context of the conversation and classifies the conversation into predefined categories. For example, if a user asks a question about shopping, the conversation is placed in the "Shopping" category.
[0040] User interface management
[0041] The analyzed and categorized data is made accessible through the user interface. Users can view conversation history for each category, making it easy to find and reuse past communications. The dashboard is designed so that users who want to view details for a specific category can easily find the corresponding history.
[0042] Setting reminders and notifications
[0043] Furthermore, this system automatically recognizes conversations that include time-sensitive tasks, and the server generates reminders for the user. Based on this, as the deadline approaches, a notification is sent to the user via their device, preventing tasks from being forgotten. This reminder function improves user productivity and helps them manage important appointments in a timely manner.
[0044] System improvements
[0045] User feedback is collected on the server and used to improve the natural language processing technology and the user interface. This feedback allows the system to be continuously optimized to meet user needs.
[0046] These elements are designed to allow users to effectively utilize the system and efficiently manage their daily tasks and search their communication history.
[0047] The following describes the processing flow.
[0048] Step 1:
[0049] The user initiates a conversation with the conversational artificial intelligence using their device. The device records and collects the text or audio data generated during the conversation in real time.
[0050] Step 2:
[0051] The terminal sends the collected communication data to the server at regular intervals. This keeps the server synchronized with low latency, though not in real time. The transmitted data is aggregated on the server and prepared for the next processing step.
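The periodic transmission in Step 2 can be pictured as a terminal-side buffer. The buffer accumulates utterances and, once a fixed interval has elapsed, flushes them to the server as a single batch. This is a minimal Python sketch under stated assumptions, not the disclosed implementation. The `Utterance` and `UploadBuffer` names, the 30-second interval, and the `send` callback that stands in for the network transport are all hypothetical. Timestamps are assumed to be seconds since the session started.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Utterance:
    speaker: str      # "user" or "ai"
    text: str
    timestamp: float  # assumed: seconds since the session started


@dataclass
class UploadBuffer:
    """Terminal-side buffer that sends utterances to the server in batches."""
    send: Callable[[List[Utterance]], None]  # hypothetical transport callback
    interval_s: float = 30.0                 # assumed flush interval
    pending: List[Utterance] = field(default_factory=list)
    last_flush: float = 0.0

    def record(self, utt: Utterance) -> None:
        """Store an utterance and flush if the interval has elapsed."""
        self.pending.append(utt)
        if utt.timestamp - self.last_flush >= self.interval_s:
            self.flush(utt.timestamp)

    def flush(self, now: float) -> None:
        """Send everything pending as one batch and reset the timer."""
        if self.pending:
            self.send(list(self.pending))  # copy so clearing does not affect the batch
            self.pending.clear()
        self.last_flush = now
```

In this sketch, a flush happens only when a new utterance arrives. A real terminal would also call `flush` from a timer, so that the last utterances are not held back after the user stops talking.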
[0052] Step 3:
[0053] The server inputs the received data into a natural language processing engine to analyze the content of the conversation. This analysis extracts context and keywords, and classifies the data into appropriate categories. For example, a conversation about shopping would be classified into the "shopping" category.
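The classification in Step 3 can be illustrated with a keyword-scoring classifier that stands in for the natural language processing engine. This is a deliberately simple sketch, not the engine the text describes. The category names and keyword lists in `CATEGORY_KEYWORDS` are hypothetical, and a real system would more likely use a trained model or the generative AI model mentioned in Example 1.

```python
import re
from typing import Dict, List

# Hypothetical keyword table; real category definitions are not given in the text.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "shopping": ["buy", "order", "price", "shopping", "store"],
    "schedule": ["meeting", "appointment", "tomorrow", "next", "reminder"],
    "meals": ["lunch", "dinner", "breakfast", "recipe", "restaurant"],
}


def classify(text: str,
             table: Dict[str, List[str]] = CATEGORY_KEYWORDS,
             default: str = "other") -> str:
    """Return the category whose keywords occur most often, or `default`."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = {cat: sum(words.count(k) for k in kws) for cat, kws in table.items()}
    best = max(scores, key=lambda cat: scores[cat])  # ties: first category listed
    return best if scores[best] > 0 else default
```

If no keyword matches, the conversation falls into the `default` category. The dashboard can then list it separately rather than forcing it into a wrong category.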
[0054] Step 4:
[0055] The analyzed data is stored in a database on the server in a format the user can access, so that it is ready for easy viewing on the dashboard.
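Step 4 states only that the analyzed data is stored in a server-side database. One way to picture this is a single table keyed by user and category, sketched here with Python's built-in `sqlite3` module. The schema, the table name `conversations`, and the helper functions are assumptions made for illustration.

```python
import sqlite3
from typing import Dict, List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the conversation store; ":memory:" is for testing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS conversations (
               id       INTEGER PRIMARY KEY,
               user_id  TEXT NOT NULL,
               ts       REAL NOT NULL,
               category TEXT NOT NULL,
               text     TEXT NOT NULL)"""
    )
    # Index supports the dashboard's "history for one category" lookups.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_user_category "
                 "ON conversations (user_id, category)")
    return conn


def save(conn: sqlite3.Connection, user_id: str, ts: float,
         category: str, text: str) -> None:
    """Insert one classified utterance."""
    conn.execute("INSERT INTO conversations (user_id, ts, category, text) "
                 "VALUES (?, ?, ?, ?)", (user_id, ts, category, text))
    conn.commit()


def history_by_category(conn: sqlite3.Connection, user_id: str,
                        category: str) -> List[Tuple[float, str]]:
    """Return (timestamp, text) rows for one category, oldest first."""
    cur = conn.execute("SELECT ts, text FROM conversations "
                       "WHERE user_id = ? AND category = ? ORDER BY ts",
                       (user_id, category))
    return cur.fetchall()


def category_counts(conn: sqlite3.Connection, user_id: str) -> Dict[str, int]:
    """Number of stored utterances per category, e.g. for dashboard tabs."""
    cur = conn.execute("SELECT category, COUNT(*) FROM conversations "
                       "WHERE user_id = ? GROUP BY category", (user_id,))
    return dict(cur.fetchall())
```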
[0056] Step 5:
[0057] Users can access the dashboard through their device and view conversation history organized by category. The dashboard offers an intuitive interface that allows users to select categories of interest and view details.
[0058] Step 6:
[0059] The server detects time-sensitive tasks and events from the communication data. Based on this information, it generates reminders on the user interface and sends notifications to the user's terminal at the specified time.
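Detecting a time-sensitive task in Step 6 requires turning relative expressions into concrete dates. The sketch below handles only "today", "tomorrow", and "next <weekday>", and it uses the library example from Example 1 as a test case. The rules are hypothetical. The sketch also assumes that "next Tuesday" means the first Tuesday strictly after today, which is only one of several readings people use.

```python
import re
from datetime import date, timedelta
from typing import Optional

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]          # date.weekday() order
_NEXT_DAY_RE = re.compile(r"\bnext (" + "|".join(WEEKDAYS) + r")\b")


def extract_due_date(text: str, today: date) -> Optional[date]:
    """Return a due date for simple relative expressions, else None."""
    t = text.lower()
    if re.search(r"\btomorrow\b", t):
        return today + timedelta(days=1)
    if re.search(r"\btoday\b", t):
        return today
    m = _NEXT_DAY_RE.search(t)
    if m:
        target = WEEKDAYS.index(m.group(1))
        days_ahead = (target - today.weekday()) % 7 or 7   # never "today"
        return today + timedelta(days=days_ahead)
    return None
```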
[0060] Step 7:
[0061] Users can provide feedback to the server regarding the system's usability and the effectiveness of reminders. The server analyzes this feedback and uses it to improve natural language processing techniques and the interface. Through this cycle, the system continues to evolve to meet user needs.
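The feedback cycle in Step 7 is described only in general terms. In one hypothetical illustration, feedback takes the form of category corrections. Words that recur across corrected conversations are then added to the classifier's keyword table. The update rule, the `min_count` threshold, and the stop-word list below are assumptions, not part of the disclosure.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical stop words that should never become category keywords.
STOPWORDS = {"a", "an", "the", "to", "of", "and", "for", "please",
             "i", "me", "my", "is", "on", "in", "did"}


def apply_corrections(table: Dict[str, Set[str]],
                      corrections: List[Tuple[str, str]],
                      min_count: int = 2) -> Dict[str, Set[str]]:
    """Return a new keyword table extended with words that appear in at least
    `min_count` user-corrected examples of the same category."""
    counts: Dict[str, Counter] = {}
    for text, correct_category in corrections:
        words = set(re.findall(r"[a-z']+", text.lower()))  # count once per example
        counts.setdefault(correct_category, Counter()).update(
            w for w in words if w not in STOPWORDS)
    updated = {cat: set(kws) for cat, kws in table.items()}  # leave input untouched
    for cat, counter in counts.items():
        updated.setdefault(cat, set()).update(
            w for w, c in counter.items() if c >= min_count)
    return updated
```

The `min_count` threshold keeps a single unusual correction from polluting a category. The function also returns a new table, so the previous version can be kept for comparison or rollback.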
[0062] (Example 1)
[0063] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0064] In modern society, the amount of information users exchange with conversational artificial intelligence is increasing, making it difficult to efficiently manage this information and obtain necessary information in a timely manner. Furthermore, there is a need for appropriate reminder functions to prevent users from forgetting tasks, including those with time constraints, but existing systems lack sufficient accuracy. In addition, there is a lack of automated methods to effectively utilize user feedback and continuously improve the system.
[0065] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0066] In this invention, the server includes means for recording information exchanged between the user and the conversational artificial intelligence, means for classifying the information into pre-set categories using a natural language processing system, and means for understanding the context of the information with high accuracy and improving classification accuracy using a generative AI model. As a result, the user can efficiently manage conversational data and easily obtain the necessary information. Furthermore, for tasks that include time specifications, reminders are set and notifications are sent in a timely manner, improving the accuracy and convenience of information management.
[0067] "User" refers to the entity that operates this system and shares information with the interactive artificial intelligence.
[0068] "Conversational artificial intelligence" refers to an intelligent system that exchanges information through communication with users and uses natural language processing technology to understand and respond to that information.
[0069] "Information" refers to language-based data exchanged between a user and an interactive artificial intelligence, or records generated from that data.
[0070] "Means of recording" refers to a function or process for saving information exchanged between conversational artificial intelligence and users as digital data.
[0071] A "natural language processing system" refers to the technologies and algorithms used to analyze input text or speech and understand its grammar, context, and meaning.
[0072] A "generative AI model" refers to an artificial intelligence model built on deep learning and machine learning techniques to analyze and generate information.
[0073] An "information display screen" refers to a display or interface used to visually present analyzed and classified information to the user.
[0074] A "reminder" refers to a mechanism or function that notifies users of the due date of a set task or event.
[0075] "Means of notification" refers to methods, protocols, or devices for informing users of information, reminders, or other important messages.
[0076] To implement this invention, a terminal is used for the user to exchange information with an interactive artificial intelligence. The terminal functions as an input device for recording the content of the conversation in text or voice format, and speech recognition software is used for converting voice data to text. Furthermore, it is recommended to use a secure communication protocol such as HTTPS when transmitting data acquired in real time to a server.
[0077] The server implements a natural language processing system to process received information, utilizing generative AI models to analyze the context of the information with high accuracy. This process involves tokenization, part-of-speech tagging, and sentence structure analysis to classify the information. The classified data is then delivered by the server to an information display screen accessible to the user. This allows the user to easily refer to and utilize their past conversation history.
[0078] Furthermore, the server automatically detects time-sensitive tasks based on user interactions, sets reminders, and sends notifications to the user via their device as the specified date and time approach. This notification function utilizes the device's built-in notification system. By leveraging this feature, users can manage important appointments without missing any.
[0079] For example, when a user says, "Please add a trip to the library next Tuesday," the system analyzes this statement and categorizes it under "Schedule." As the deadline approaches, a reminder notification is sent to the user. Similarly, given a prompt such as "Show all of the user's past conversations related to 'meals'," the system can display the conversation data belonging to the specified category on the information display screen. This invention allows users to efficiently manage their conversations and use them flexibly according to their needs.
[0080] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0081] Step 1:
[0082] The user initiates a conversation with the conversational artificial intelligence via the terminal. The user's statements are input to the terminal as either text or voice. The terminal's voice recognition software converts the voice to text and temporarily stores the acquired data in digital format. The input data is either text or voice, and the output is text data.
[0083] Step 2:
[0084] The terminal sends the collected text data to the server at regular intervals. Secure protocols such as HTTPS are used to protect the data during transmission. The transmitted input data is stored on the server. The output is the text data stored on the server.
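The HTTPS upload in Step 2 can be sketched with the Python standard library. The endpoint URL and the JSON field names are hypothetical. The function only builds the request and refuses non-HTTPS URLs. Sending the request, for example with `urllib.request.urlopen(req, timeout=10)`, is left out so that the sketch runs offline.

```python
import json
import urllib.request
from typing import List


def build_upload_request(url: str, user_id: str,
                         texts: List[str]) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a batch of utterances."""
    if not url.startswith("https://"):
        # Refuse plaintext transport for conversation data.
        raise ValueError("conversation data must only be sent over HTTPS")
    body = json.dumps({"user_id": user_id, "utterances": texts}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )
```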
[0085] Step 3:
[0086] The server analyzes the received text data using a natural language processing system. Specifically, it performs data tokenization, part-of-speech tagging, and sentence structure analysis, and uses a generative AI model to understand the context of the information. The input data is text data stored on the server, and the output is the analyzed information.
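Step 3 lists tokenization, part-of-speech tagging, and sentence structure analysis. Tagging and parsing require a trained model, for example from a library such as spaCy. The sketch below therefore covers only the first stage: regular-expression tokenization followed by frequency-based keyword extraction. The stop-word list is a small hypothetical sample.

```python
import re
from collections import Counter
from typing import List

# Small illustrative stop-word list; real systems use much larger ones.
STOPWORDS = frozenset({"a", "an", "the", "and", "or", "to", "of", "in", "on",
                       "at", "is", "i", "me", "my", "please", "then"})


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_keywords(text: str, k: int = 3) -> List[str]:
    """Return the k most frequent non-stop-word tokens."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]
```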
[0087] Step 4:
[0088] The server classifies the text data into predetermined categories based on the analysis results. These categories might include, for example, "schedule" or "shopping." This organizes the information, making it easy to search later. The input data is the analyzed information, and the output is the classified information.
[0089] Step 5:
[0090] The server delivers the categorized information to an information display screen accessible to the user. The user can use the terminal interface to view this information by category and perform searches and filtering as needed. The input data is the categorized information, and the output is the displayed information.
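The category view with search and filtering in Step 5 can be sketched as an in-memory filter. The `Record` type and the newest-first ordering are assumptions. A server would normally push this filtering down into its database query.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    ts: float       # assumed: Unix time of the utterance
    category: str
    text: str


def search(records: Iterable[Record],
           category: Optional[str] = None,
           query: Optional[str] = None) -> List[Record]:
    """Filter by category and/or case-insensitive substring, newest first."""
    q = query.lower() if query else None
    hits = [r for r in records
            if (category is None or r.category == category)
            and (q is None or q in r.text.lower())]
    return sorted(hits, key=lambda r: r.ts, reverse=True)
```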
[0091] Step 6:
[0092] The server detects time-sensitive tasks from text data and sets reminders. As the deadline approaches, it sends notifications to the user via their device. This allows users to manage important appointments without missing any. The input data is parsed task information, and the output is reminder notifications.
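For the reminder in Step 6, "as the deadline approaches" needs a concrete rule. The sketch assumes a fixed lead time, 30 minutes by default. Pending reminders are kept in a min-heap ordered by notification time, so each polling call returns exactly the reminders that have become due. The class names and the lead time are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass(order=True)
class Reminder:
    notify_at: datetime                  # only field used for ordering
    task: str = field(compare=False)
    due: datetime = field(compare=False)


class ReminderQueue:
    """Min-heap of reminders ordered by the time they should fire."""

    def __init__(self, lead: timedelta = timedelta(minutes=30)) -> None:
        self.lead = lead                 # assumed lead time before the deadline
        self._heap: List[Reminder] = []

    def add(self, task: str, due: datetime) -> None:
        heapq.heappush(self._heap, Reminder(due - self.lead, task, due))

    def pop_due(self, now: datetime) -> List[Reminder]:
        """Remove and return every reminder whose notification time has come."""
        ready = []
        while self._heap and self._heap[0].notify_at <= now:
            ready.append(heapq.heappop(self._heap))
        return ready
```

The heap keeps the earliest notification at the front. Each poll therefore touches only reminders that are actually due, even when many future reminders are queued.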
[0093] (Application Example 1)
[0094] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0095] With current home automation devices, users must manage tasks individually. As a result, they frequently forget tasks and find it difficult to manage multiple pieces of information efficiently. Furthermore, traditional reminder systems are limited to display and notification functions and lack the flexibility to adapt to individual household conditions and daily life situations.
[0096] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0097] In this invention, the server includes a device for collecting information exchanged between the user and an interactive artificial intelligence; a device for organizing the information into predefined classifications using natural language processing technology; a device for displaying the classified information on a display device so that the user can easily find past conversations; a device for analyzing information including deadlines, generating and sending notifications; and a device for supporting the user's lifestyle management in a home automation device. This enables task management and reminder notifications tailored to the user's lifestyle, thereby smoothly supporting the individual's life.
[0098] A "user" is an individual who provides information to an interactive artificial intelligence to manage and streamline their daily life.
[0099] "Conversational artificial intelligence" is a program that has the ability to receive information from users, understand it using natural language processing technology, and respond.
[0100] "Information" refers to all text and audio data exchanged between the user and the conversational artificial intelligence.
[0101] A "collection device" is a combination of hardware and software used to collect information arising from the interaction between a user and an interactive artificial intelligence.
[0102] "Natural language processing technology" is a technology that processes human language and converts its meaning into a form that computers can understand.
[0103] A "device for organizing information" is a device for classifying and managing collected information based on defined criteria.
[0104] A "display device" refers to a display or related equipment that allows users to visually confirm information and easily find their conversation history.
[0105] A "notification generation and transmission device" is a device that creates alerts to inform users of time-sensitive tasks or important information, and notifies them via display or sound.
[0106] A "home automation device" is a device that integrates and operates various home appliances and management systems based on the user's living situation.
[0107] One embodiment of this invention requires a terminal installed in the home. This terminal is equipped with a voice input function and collects information through interaction with the user. The information obtained is converted into text data using speech recognition technology. A Raspberry Pi can be used as the hardware for this purpose.
[0108] Text data acquired by the terminal is transmitted wirelessly to a server. This server incorporates natural language processing technology and uses Google Cloud's AI services to analyze and classify the information. The classified information is displayed on a home display device whose user interface is designed to let users easily search and refer to their conversation history. This interface can be implemented, for example, as a web application built with the Vue.js framework.
[0109] Furthermore, the server analyzes time-sensitive data, generates notifications, and sends them to the device. This allows home automation devices to provide users with timely reminders. For example, if a user instructs the device to "set a reminder to take my medicine at 3 p.m.," the device can send a notification at the specified time.
[0110] Another application example is the use of generative AI models with prompts. An example prompt is "We're running low on consumables, please add them to the shopping list." Prompts of this kind allow user instructions to be reflected more intuitively and quickly.
[0111] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0112] Step 1:
[0113] The user gives instructions to the device using their voice.
[0114] An input system that receives user voice commands collects voice data. This voice data becomes input and is converted into text data using speech recognition technology. This conversion process makes it possible to process information from voice data.
[0115] Step 2:
[0116] The device converts the audio data into text format.
[0117] The device converts the collected audio data into text using a speech recognition library. This output text data is then ready to be sent to the server. For example, the instruction "Set a reminder to take medicine at 3 PM" would be output as text data at this stage.
[0118] Step 3:
[0119] The terminal sends text data to the server.
[0120] The information, converted to text format, is transmitted to the server via wireless communication. To secure the data transfer, the terminal uses an encryption protocol.
[0121] Step 4:
[0122] The server analyzes the text data it receives.
[0123] The server uses Google Cloud's natural language processing API to analyze and interpret the text data. Based on the analysis, it classifies the intent conveyed by the text and organizes it into specific categories. The output of this step is information that is organized and ready for use.
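Step 4 relies on Google Cloud's natural language API, and its calls are not reproduced here. To show only the shape of the step (text in, intent label out), the sketch substitutes ordered regular-expression rules. The intent labels and patterns are hypothetical and cover the two example utterances from this application example.

```python
import re
from typing import Tuple

# Hypothetical (intent, pattern) rules, checked in order; first match wins.
INTENT_RULES: Tuple[Tuple[str, str], ...] = (
    ("reminder", r"\bremind(er)?\b"),
    ("shopping_list", r"\bshopping list\b|\brunning low\b"),
    ("schedule", r"\b(appointment|meeting|schedule)\b"),
)


def classify_intent(text: str) -> str:
    """Map a transcribed command to an intent label ("chat" if none match)."""
    lowered = text.lower()
    for intent, pattern in INTENT_RULES:
        if re.search(pattern, lowered):
            return intent
    return "chat"
```

Because the rules are checked in order, a command that mentions both a reminder and a shopping list is treated as a reminder. A statistical classifier would instead resolve such overlaps with scores.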
[0124] Step 5:
[0125] The server sends the classified data to the user interface.
[0126] The analyzed information is sent to the user interface of a web application using Vue.js, based on the classification results. There, it is output as a visual representation that allows the user to review and explore their past conversation history.
[0127] Step 6:
[0128] The server analyzes time-sensitive data and generates notifications.
[0129] If a user's instructions include a deadline, the server extracts that deadline information and generates a notification. For example, "Take medicine at 3pm" is recognized as a task with a deadline, and a reminder is generated.
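Recognizing "3 PM" as a deadline in Step 6 can be sketched as a regular expression. It accepts forms such as "3 PM", "3pm", "3 p.m." and "7:30 a.m." and converts them to the next matching datetime. This rule set is an assumption. It handles only 12-hour clock times, and if the time has already passed today, it rolls over to tomorrow.

```python
import re
from datetime import datetime, time, timedelta
from typing import Optional

# Matches "3 PM", "3pm", "3 p.m.", "7:30 a.m." (12-hour clock only).
_TIME_RE = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\s*([ap])\.?m\.?(?![a-z])",
                      re.IGNORECASE)


def next_occurrence(text: str, now: datetime) -> Optional[datetime]:
    """Return the next datetime matching a clock time in `text`, or None."""
    m = _TIME_RE.search(text)
    if not m:
        return None
    hour, minute = int(m.group(1)), int(m.group(2) or 0)
    if not 1 <= hour <= 12 or minute > 59:
        return None
    hour %= 12                       # 12 AM -> 0; 12 PM becomes 12 below
    if m.group(3).lower() == "p":
        hour += 12
    candidate = datetime.combine(now.date(), time(hour, minute))
    if candidate <= now:             # already passed today: use tomorrow
        candidate += timedelta(days=1)
    return candidate
```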
[0130] Step 7:
[0131] The device sends a notification to the user.
[0132] As the deadline approaches, the device notifies the user of the generated reminder. Notifications are delivered via voice or display, providing a system to support the user's life management.
[0133] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0134] This invention provides an advanced system for managing user communication using conversational artificial intelligence, and includes functions for collecting, classifying, and displaying communication data, as well as recognizing user emotions. This system operates based on interactions between a server, a terminal, and the user.
[0135] Basic system configuration
[0136] First, the user converses with the conversational artificial intelligence via their device. The communication data generated during this conversation is recorded by the device as text or audio data. The recorded data is sent to a server at regular intervals and stored in a database. The server analyzes this data using natural language processing technology and classifies the conversation content into appropriate categories. This classification information is displayed on the user interface, allowing the user to easily find past conversations.
[0137] Emotion engine processing
[0138] A key feature of this invention is that the server is equipped with an emotion engine that can recognize the user's emotions from communication data. The emotion engine goes beyond simple language analysis, extracting emotions from voice tone and text context to evaluate the user's current and past emotional state. This information is stored on the server and used for tracking emotional fluctuations.
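Emotional fluctuations could be tracked over time in many ways. One simple possibility, assumed here rather than taken from the text, reduces each reading to a valence score in [-1, 1]. It then keeps an exponential moving average as the user's baseline, so that a sudden departure from that baseline stands out. The `alpha` smoothing factor and the class name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class EmotionTracker:
    """Keeps a smoothed emotional baseline for one user."""
    alpha: float = 0.3                   # assumed smoothing factor in (0, 1]
    baseline: Optional[float] = None
    history: List[Tuple[float, float]] = field(default_factory=list)

    def update(self, ts: float, valence: float) -> float:
        """Record a valence reading in [-1, 1] and return the new baseline."""
        self.history.append((ts, valence))
        if self.baseline is None:
            self.baseline = valence
        else:
            self.baseline = self.alpha * valence + (1 - self.alpha) * self.baseline
        return self.baseline

    def deviation(self, valence: float) -> float:
        """How far a new reading departs from the current baseline."""
        return 0.0 if self.baseline is None else valence - self.baseline
```

A larger `alpha` makes the baseline follow recent moods quickly. A smaller one preserves a longer-term picture, so short swings show up as large deviations.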
[0139] Response adjustment and feedback
[0140] Recognized emotions are reflected in the conversational artificial intelligence's responses and suggestions. For example, if a user expresses dissatisfaction, the system is programmed to provide a more considerate response. Furthermore, users can provide feedback through the interface, and this data is collected on a server and used to improve the entire system.
[0141] Explanation of specific examples
[0142] If a user is feeling stressed, the system uses the emotion engine to detect this state. Based on the emotional analysis, the AI offers suggestions on how to relax. In this way, the system goes beyond simply providing information and can respond flexibly to the user's emotions.
[0143] This system goes a step beyond conventional conversational artificial intelligence, aiming for deeper mutual understanding with the user. By taking user emotions into account during interaction, it can provide a personalized experience for each individual user.
[0144] The following describes the processing flow.
[0145] Step 1:
[0146] The user speaks to the conversational artificial intelligence through the device. The user's statements are recorded in real time by the device as audio or text data. This data is temporarily stored on the device.
[0147] Step 2:
[0148] The device sends collected communication data to the server at regular intervals. Through this transmission process, the server receives the latest user data.
[0149] Step 3:
[0150] The server analyzes the received data using a natural language processing engine. This analysis extracts the context and keywords of the conversation, and the data is classified into predefined categories. For example, a conversation might be classified into categories such as "shopping" or "task management."
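The classification in Step 3 can be illustrated with a minimal Python sketch. It assumes a hypothetical keyword list for each category and simply selects the category with the most keyword hits; an actual natural language processing engine would also use the surrounding context, so the sketch only shows the shape of the input (an utterance) and the output (a category label).

```python
import re
from collections import Counter

# Hypothetical keyword lists. The source names "shopping" and "task management"
# only as example categories; the keywords themselves are assumptions.
CATEGORY_KEYWORDS = {
    "shopping": {"buy", "order", "price", "store", "cart"},
    "task management": {"deadline", "task", "todo", "remind", "finish"},
}

def tokenize(text: str) -> list[str]:
    """Lowercase the utterance and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def classify(text: str, default: str = "other") -> str:
    """Return the category whose keywords occur most often, or `default`."""
    counts = Counter(tokenize(text))
    scores = {
        category: sum(counts[word] for word in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best_category, best_score = max(scores.items(), key=lambda item: item[1])
    return best_category if best_score > 0 else default
```

Under these assumed keywords, `classify("I need to buy milk at the store")` returns `"shopping"`, and an utterance with no keyword hits falls back to `"other"`.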
[0151] Step 4:
[0152] The server then uses an emotion engine to analyze the user's emotions from the communication data. The user's emotional state is evaluated based on context, word choice, tone of voice, and other factors.
[0153] Step 5:
[0154] The server adjusts the conversational artificial intelligence's responses based on the analyzed emotional data. For example, if the user is feeling stressed, the AI will adjust its responses to offer suggestions to help them relax.
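The response adjustment in Step 5 can be sketched as a lookup from the detected emotion to an extra sentence placed in front of the normal answer. The emotion labels and template wording below are hypothetical; only the idea of offering relaxation suggestions to a stressed user comes from the text.

```python
# Hypothetical response templates keyed by detected emotion label. The stress
# wording follows the relaxation suggestion described in the text.
ADJUSTMENTS = {
    "stress": "It sounds like you have a lot on your plate. Shall we find some ways to relax?",
    "dissatisfaction": "I'm sorry that was not helpful. Let me try a different approach.",
}

def adjust_response(base_response: str, emotion: str) -> str:
    """Prefix an emotion-aware sentence when the detected emotion calls for it;
    otherwise return the base response unchanged."""
    prefix = ADJUSTMENTS.get(emotion)
    return f"{prefix} {base_response}" if prefix else base_response
```

Keeping the adjustment separate from the base response means the same answer can be delivered unchanged when no adjustment is needed.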
[0155] Step 6:
[0156] The categorized conversation history and sentiment data are displayed in the user interface, namely the dashboard. Users can browse the dashboard and examine the history by specific categories or sentiment states.
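The dashboard view in Step 6 amounts to filtering stored records by category and/or emotional state. A minimal sketch, assuming a hypothetical `ConversationRecord` structure that holds the outputs of the classification step and the emotion engine:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class ConversationRecord:
    timestamp: datetime
    text: str
    category: str  # output of the classification step
    emotion: str   # output of the emotion engine

def filter_history(
    records: list[ConversationRecord],
    category: Optional[str] = None,
    emotion: Optional[str] = None,
) -> list[ConversationRecord]:
    """Return records matching the requested category and/or emotion, newest first.
    A filter left as None matches every record."""
    selected = [
        r for r in records
        if (category is None or r.category == category)
        and (emotion is None or r.emotion == emotion)
    ]
    return sorted(selected, key=lambda r: r.timestamp, reverse=True)
```

Sorting newest first is an assumed display choice; the source only states that history can be examined by category or sentiment.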
[0157] Step 7:
[0158] The server collects user feedback and uses it to improve sentiment analysis and natural language processing technologies. This feedback cycle allows the system to evolve and provide more appropriate interactions for users.
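One possible form of the feedback cycle in Step 7 is a user correcting a misclassified conversation. The sketch below assumes that kind of feedback and a keyword-list classifier (both hypothetical): the content words of the corrected utterance are added to the category the user chose, so similar utterances are classified correctly next time.

```python
import re

# Hypothetical stop-word list so that function words are not learned as keywords.
STOPWORDS = frozenset({"a", "an", "the", "to", "i", "my", "for", "of", "and", "is"})

def apply_feedback(keywords: dict[str, set[str]], text: str, correct_category: str) -> set[str]:
    """Add the content words of a misclassified utterance to the category the
    user says is correct, and return the words that were newly learned."""
    tokens = set(re.findall(r"[a-z']+", text.lower())) - STOPWORDS
    learned = tokens - keywords.setdefault(correct_category, set())
    keywords[correct_category] |= learned
    return learned
```

Returning the newly learned words makes it easy to log or review what each piece of feedback changed.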
[0159] (Example 2)
[0160] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0161] In modern society, the amount of information data generated through interactive algorithms between users and computers is increasing, and this data needs to be managed efficiently and effectively. Furthermore, there is demand for more natural and personalized interactions in which responses take the user's emotions into consideration. Conventional technologies have faced challenges in adjusting responses using emotional information, categorizing information effectively, and configuring notifications. Example 2 aims to solve these problems.
[0162] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0163] In this invention, the server includes means for collecting information data exchanged between the user and a computer-based interactive algorithm, means for classifying the information data into predefined categories using language analysis technology, and means for determining emotions using an analysis device and adjusting the response based on this determination. This enables effective management of information data and flexible responses based on user emotions.
[0164] A "user" is the entity that utilizes the interactive algorithm and is the provider of information data.
[0165] A "computer" refers to any device that performs information processing, and is a device within a system that is responsible for collecting, analyzing, and displaying data.
[0166] "Information data" refers to all communication content exchanged between a user and a computer, including voice and text data.
[0167] An "interactive algorithm" refers to a series of computational processes that generate responses through interaction with the user, and it incorporates learning capabilities and sentiment analysis technology.
[0168] "Language analysis technology" refers to techniques for understanding and categorizing information data through natural language processing, and aims at structural analysis and semantic understanding of text.
[0169] An "analysis device" refers to a device or software that extracts specific information, such as emotions, from information data and provides analysis results.
[0170] "Means for determining emotions and adjusting responses" refers to technologies or processes for evaluating emotions contained within information data and changing responses based on the results.
[0171] This invention is designed to optimize the interaction between the user and the computer by utilizing information technology.
[0172] System Configuration
[0173] The user first initiates a conversation with an interactive algorithm using a terminal. The terminal is equipped with a microphone and keyboard, and records information data in voice or text format. The information data recorded on the terminal is then transmitted to a server via the internet.
[0174] Data analysis and classification
[0175] The server processes information data using programming language environments such as Python and Java (registered trademark). The server utilizes natural language processing libraries (e.g., NLTK and spaCy) to analyze the syntax and meaning of the data. Based on the results of this analysis, the data is categorized into predefined categories.
[0176] Judging and adjusting responses to emotions
[0177] The server also includes modules for analyzing emotions. For example, computational algorithms (such as machine learning techniques) are implemented to extract the user's emotions from the tone of voice data and the context of text. This makes it possible to generate responses that correspond to the user's current emotional state.
[0178] Examples of specific cases and prompt statements
[0179] When a user says to the device, "I haven't been sleeping well lately," the system categorizes this information data into the "health" category and performs emotion analysis on the server. If the system determines that the user is experiencing anxiety, the AI suggests, "Shall we find some ways to relax?" Another example of a possible prompt is, "I'm very busy and stressed right now."
[0180] This system aims to provide personalized responses to users through advanced data analysis and emotional response.
[0181] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0182] Step 1:
[0183] The user speaks to an interactive algorithm via the terminal. The terminal recognizes and records this input as voice or text data. Specifically, the terminal does this using a microphone or text input device and temporarily stores the data. The recorded voice or text information is generated as output data.
[0184] Step 2:
[0185] The terminal transmits the collected information data to the server at regular intervals. The terminal uses a network protocol to transfer the data. The input data is communication information recorded by the terminal, while the output data is the raw information data received by the server.
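The "regular intervals" transmission in Step 2 can be sketched as a terminal-side buffer that sends accumulated utterances as one batch once an interval has elapsed. The interval value, the `send` callback, and the policy of checking the interval whenever a new utterance is recorded are all assumptions; a real terminal would typically also flush on a timer.

```python
import time
from typing import Callable

class UploadBuffer:
    """Collects utterances on the terminal and flushes them as one batch once
    `interval_s` seconds have passed since the previous flush."""

    def __init__(self, send: Callable[[list[str]], None], interval_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._send = send          # hypothetical transport callback
        self._interval = interval_s
        self._clock = clock        # injectable clock, e.g. for testing
        self._pending: list[str] = []
        self._last_flush = clock()

    def record(self, utterance: str) -> None:
        """Store an utterance and flush if the interval has elapsed."""
        self._pending.append(utterance)
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self) -> None:
        """Send all pending utterances (if any) and reset the interval."""
        if self._pending:
            self._send(self._pending)
            self._pending = []     # new list, so the sent batch is not mutated
        self._last_flush = self._clock()
```

Injecting the clock keeps the batching logic independent of wall-clock time, which makes the interval behavior easy to check.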
[0186] Step 3:
[0187] The server stores the received information data in a database. This is done using a data management system (e.g., an SQL database). The input is raw information data transmitted over the network, and the output is data that has been organized and stored in the database. Following storage, the server prepares to perform structural analysis of the data.
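Step 3 can be illustrated with Python's built-in `sqlite3` module standing in for the SQL database. The table name and columns are hypothetical; the `category` column is left empty at storage time so that the analysis in Step 4 can find messages that still need classification.

```python
import sqlite3

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the message store; the schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id          INTEGER PRIMARY KEY AUTOINCREMENT,
               user_id     TEXT NOT NULL,
               received_at TEXT NOT NULL,
               body        TEXT NOT NULL,
               category    TEXT
           )"""
    )
    return conn

def store_messages(conn: sqlite3.Connection, user_id: str, received_at: str,
                   bodies: list[str]) -> None:
    """Insert one received batch of raw messages in a single transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.executemany(
            "INSERT INTO messages (user_id, received_at, body) VALUES (?, ?, ?)",
            [(user_id, received_at, body) for body in bodies],
        )

def pending_analysis(conn: sqlite3.Connection) -> list[tuple[int, str]]:
    """Return stored messages that have not been classified yet."""
    return conn.execute(
        "SELECT id, body FROM messages WHERE category IS NULL ORDER BY id"
    ).fetchall()
```

Parameterized queries (`?` placeholders) keep message text from being interpreted as SQL.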
[0188] Step 4:
[0189] The server analyzes the received data using a natural language processing library. It applies syntactic and semantic analysis to the data and classifies it into predefined categories. The input is the data stored in the database, and the output is the classification result for each category. Specifically, it uses language analysis techniques to extract topics from the text.
[0190] Step 5:
[0191] The server uses an emotion analysis module to extract the user's emotions from the data. Voice tone and word choice are analyzed to identify the emotional state. The input is analyzed text data, and emotional information is generated as output. The analysis device uses machine learning algorithms to evaluate the emotions.
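The word-choice part of Step 5 can be sketched with a small emotion lexicon. This is a swapped-in, much simpler technique than the machine learning algorithms named in the text, and it ignores voice tone; the lexicon entries are hypothetical. The function returns the emotion with the most hits and its share of all hits as a rough confidence.

```python
import re

# Hypothetical emotion lexicon; the text describes a machine learning model instead.
EMOTION_LEXICON = {
    "stress": {"busy", "stressed", "overwhelmed", "pressure"},
    "fatigue": {"tired", "exhausted", "sleepy"},
    "anxiety": {"worried", "anxious", "nervous"},
    "joy": {"great", "happy", "glad", "excited"},
}

def detect_emotion(text: str) -> tuple[str, float]:
    """Return (emotion, share of lexicon hits) or ("neutral", 0.0) if nothing matches."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = {
        emotion: sum(token in words for token in tokens)
        for emotion, words in EMOTION_LEXICON.items()
    }
    total = sum(hits.values())
    if total == 0:
        return "neutral", 0.0
    emotion = max(hits, key=hits.get)
    return emotion, hits[emotion] / total
```

For the example prompt "I'm very busy and stressed right now", both matched words belong to the assumed "stress" entry, so the result is `("stress", 1.0)`.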
[0192] Step 6:
[0193] The server generates and adjusts the response of the interactive algorithm based on the acquired sentiment information. Specifically, it uses a response generation algorithm to create an appropriate response consistent with the information data. The input is the sentiment information and the user's past history data, and the output is a personalized response to the user.
[0194] Step 7:
[0195] The server sends the response back to the terminal. The terminal presents this response to the user via screen or audio. The input is the response data sent from the server, and the output is the visual or audio response provided to the user. This allows the user to initiate the next interaction based on the presented information.
[0196] (Application Example 2)
[0197] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0198] In modern living environments, there is a growing demand for personalized experiences that respond to users' emotions. Especially for robots and conversational artificial intelligence used in the home, it is desirable not only to provide information but also to understand the user's emotions and use that understanding to control appliances and adjust environmental settings. However, conventional systems have struggled to accurately recognize user emotions and automatically provide appropriate responses. Solving these challenges is therefore crucial.
[0199] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0200] In this invention, the server includes means for collecting information exchanged between the user and conversational artificial intelligence, means for classifying the information into predefined categories using natural language processing technology, and means for recognizing the user's emotions from voice data and text data and controlling home appliances based on those emotions. This makes it possible to provide a personalized experience that corresponds to the user's emotional state.
[0201] A "user" is an individual who communicates with conversational artificial intelligence or an entity that utilizes it.
[0202] "Conversational artificial intelligence" is an artificial system that has the ability to provide information and carry out instructions through dialogue with the user.
[0203] "Information" refers to the collective text or audio data exchanged between the user and the conversational artificial intelligence.
[0204] "Natural language processing technology" is a technology that enables computers to understand and analyze natural human language.
[0205] A "category" refers to a predefined group or type used to classify information.
[0206] "Voice data" refers to digital data representing sound information generated by the user's voice.
[0207] "Text data" refers to digitized textual information obtained through user input or speech recognition.
[0208] "Emotion recognition" is a technology that analyzes and identifies emotions from a user's voice or text.
[0209] "Home appliance control" refers to a function that operates or sets electrical appliances according to user instructions or status.
[0210] A "personalized experience" refers to the provision of services that are customized according to the individual user's circumstances and preferences.
[0211] In the system implementing this invention, a server, a user terminal, and multiple devices within the home work together. The user first initiates a conversation with an interactive artificial intelligence via the terminal. The information from this conversation is recorded as voice or text data and sent to the server at regular intervals. The server converts the voice information into text using a speech recognition module built on TensorFlow®, and further analyzes the text data using spaCy for natural language processing. In addition, OpenAI®'s emotion analysis API is used for emotion recognition, accurately analyzing the user's emotions from the voice and text data.
[0212] Based on the analysis results, the user's emotional state is determined. For example, if the user expresses fatigue, the server controls lighting and music devices via the smart home platform, providing settings that promote relaxation. This system uses the emotional data stored on the server to generate personalized feedback and displays it in the user interface.
[0213] For example, if a user says to the conversational AI, "I'm very busy and tired today," the server will immediately adjust the lighting to a warmer color and play calming music. This process allows the user to enjoy a more personalized and comfortable experience.
[0214] Example prompt: "When the user wants to relax, have the robot suggest the optimal lighting and music combination."
[0215] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0216] Step 1:
[0217] The user speaks to the conversational artificial intelligence using a device. The input is voice data, which is recorded on the user's device. The device converts the voice signal into a digital format and sends the information to the server.
[0218] Step 2:
[0219] The server uses TensorFlow to perform speech recognition on the received audio data and outputs the result as text data. This text data forms the basis for natural language processing. The data processing in this step is the conversion of the audio waveform into a character string.
[0220] Step 3:
[0221] The server processes the text data through spaCy, a natural language processing tool, to analyze its grammar and structure. Based on the analysis, categories related to the context and subject are determined. This analysis process involves understanding the structure of the input text and assigning categories.
[0222] Step 4:
[0223] The server uses OpenAI's sentiment analysis API to extract the user's emotions from text data. The input is the text data obtained in step 2, and emotion information based on it is output. Different response plans are selected based on the type of emotion (e.g., joy, sadness, stress).
[0224] Step 5:
[0225] Based on the analysis of emotions, the server generates commands to control home appliances via the smart home platform. For example, if relaxation is deemed necessary, it will adjust the lighting and play music. The input is emotion data, and the output is device control commands.
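Step 5 can be sketched as a function that turns an emotion label into device-agnostic commands, which a smart home platform adapter would then execute. The device names, action names, and brightness value are hypothetical, and no real platform API is called; the roughly 2700 K setting reflects the "warmer color" described for the relaxation example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeviceCommand:
    """A device-agnostic command to be handed to a smart home platform adapter."""
    device: str   # hypothetical device identifier
    action: str   # hypothetical action name
    value: object

# Assumed set of emotions for which a relaxation scene is applied.
RELAX_EMOTIONS = frozenset({"fatigue", "stress", "sadness"})

def plan_commands(emotion: str) -> list[DeviceCommand]:
    """Translate a detected emotion into a list of device commands."""
    if emotion in RELAX_EMOTIONS:
        return [
            # About 2700 K is a commonly used "warm white" color temperature.
            DeviceCommand("living_room_light", "set_color_temperature_k", 2700),
            DeviceCommand("living_room_light", "set_brightness_pct", 40),
            DeviceCommand("speaker", "play_playlist", "calm"),
        ]
    return []  # no environment change for other emotions
```

Separating command planning from command delivery keeps the emotion logic independent of any particular smart home platform.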
[0226] Step 6:
[0227] The server generates feedback information for the user based on all the above processing and displays it in the user interface. This feedback reports to the user how the system responded and stores the information for future interactions. The input is the result of past processing, and the output is the feedback message.
[0228] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0229] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58 together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
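Combining an instruction with inference data, as described for the data generation model 58, can be sketched for the classification case. The prompt wording and both helper functions are hypothetical, and the call to the model itself is deliberately left out rather than tied to any particular service's API.

```python
def build_classification_prompt(utterance: str, categories: list[str]) -> str:
    """Compose an instruction (what to do) followed by inference data (the utterance)."""
    category_list = ", ".join(f'"{c}"' for c in categories)
    return (
        "Classify the user's utterance into exactly one of the following categories: "
        f"{category_list}. Reply with the category name only.\n\n"
        f"Utterance: {utterance}"
    )

def parse_category(model_output: str, categories: list[str], default: str = "other") -> str:
    """Map the model's free-text reply back onto a known category name."""
    reply = model_output.strip().strip('"').lower()
    for category in categories:
        if reply == category.lower():
            return category
    return default  # replies outside the predefined categories are not trusted
```

Validating the reply against the predefined list guards against the model answering with a label the system does not know.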
[0230] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0231] [Second Embodiment]
[0232] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0233] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0234] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0235] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0236] The microphone 238 receives instructions from the user 20 in the form of voice. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0237] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0238] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0239] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0240] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0241] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0242] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0243] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0244] This invention is a system for efficiently managing data from conversations between a user and an interactive artificial intelligence. This system mainly consists of a server, a terminal, and the user.
[0245] Starting a conversation with the user
[0246] First, the user initiates communication with the conversational artificial intelligence via their device. The conversation is generated as text or audio and recorded in real time by the device. This recorded data is sent to a server at regular intervals. This aggregates all the conversation data on the server, enabling processing in the next step.
[0247] Data analysis and classification
[0248] The server analyzes the received conversation data using natural language processing techniques. Specifically, it understands the context of the conversation and classifies it into predefined categories. Through this classification process, for example, if a user is asking a question about shopping, the conversation will be organized into the "Shopping" category.
[0249] User interface management
[0250] The analyzed and categorized data becomes accessible through the user interface. Users can view the conversation history for each category, making it easy to find and utilize past communications. The dashboard is designed so that users who want to view details for a specific category can easily find the corresponding history.
[0251] Setting reminders and notifications
[0252] Furthermore, this system automatically recognizes conversations that include time-sensitive tasks, and the server generates reminders for the user. Based on this, as the deadline approaches, a notification is sent to the user via their device, preventing tasks from being forgotten. This reminder function improves user productivity and helps them manage important appointments in a timely manner.
[0253] System improvements
[0254] User feedback is collected on the server and used to improve natural language processing technology and the user interface. This feedback allows the system to continuously be optimized to meet user needs.
[0255] These elements are designed to allow users to effectively utilize the system and efficiently manage their daily tasks and search their communication history.
[0256] The following describes the processing flow.
[0257] Step 1:
[0258] The user initiates a conversation with conversational artificial intelligence using their device. The device then works to record and collect text or audio data generated during the conversation in real time.
[0259] Step 2:
[0260] The terminal sends the collected communication data to the server at regular intervals. Although this transmission is not real-time, it keeps the data synchronized with low latency. The transmitted data is aggregated on the server and prepared for the next processing step.
[0261] Step 3:
[0262] The server inputs the received data into a natural language processing engine to analyze the content of the conversation. This analysis extracts context and keywords, and classifies the data into appropriate categories. For example, a conversation about shopping would be classified into the "shopping" category.
[0263] Step 4:
[0264] The analyzed data is stored in a database on the server and converted into a format the user can access. This prepares the data for easy viewing on the dashboard.
[0265] Step 5:
[0266] Users can access the dashboard through their device and view conversation history organized by category. The dashboard offers an intuitive interface that allows users to select categories of interest and view details.
[0267] Step 6:
[0268] The server detects time-sensitive tasks and events from the communication data. Based on this information, it generates reminders on the user interface and sends notifications to the user's terminal at the specified time.
[0269] Step 7:
[0270] Users can provide feedback to the server regarding the system's usability and the effectiveness of reminders. The server analyzes this feedback and uses it to improve natural language processing techniques and the interface. Through this cycle, the system continues to evolve to meet user needs.
[0271] (Example 1)
[0272] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0273] In modern society, the amount of information users exchange with conversational artificial intelligence is increasing, making it difficult to efficiently manage this information and obtain necessary information in a timely manner. Furthermore, there is a need for appropriate reminder functions to prevent users from forgetting tasks, including those with time constraints, but existing systems lack sufficient accuracy. In addition, there is a lack of automated methods to effectively utilize user feedback and continuously improve the system.
[0274] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0275] In this invention, the server includes means for recording information exchanged between the user and the conversational artificial intelligence, means for classifying the information into pre-set categories using a natural language processing system, and means for understanding the context of the information with high accuracy and improving classification accuracy using a generative AI model. As a result, the user can efficiently manage conversational data and easily obtain the necessary information. Furthermore, for tasks that include time specifications, reminders are set and notifications are sent in a timely manner, improving the accuracy and convenience of information management.
[0276] "User" refers to the entity that operates this system and shares information with the interactive artificial intelligence.
[0277] "Conversational artificial intelligence" refers to an intelligent system that exchanges information through communication with users and uses natural language processing technology to understand and respond to that information.
[0278] "Information" refers to language-based data exchanged between a user and an interactive artificial intelligence, or records generated from that data.
[0279] "Means of recording" refers to a function or process for saving information exchanged between conversational artificial intelligence and users as digital data.
[0280] A "natural language processing system" refers to the technologies and algorithms used to analyze input text or speech and understand its grammar, context, and meaning.
[0281] A "generative AI model" refers to an artificial intelligence model built on deep learning and machine learning techniques to analyze and generate information.
[0282] The "information display screen" refers to a display or interface for visually presenting analyzed and classified information to the user.
[0283] The "reminder" refers to a mechanism or function for notifying the user of the due dates of set tasks or events.
[0284] The "means of notification" refers to a method, protocol, or device for notifying the user of information, reminders, or other important messages.
[0285] To implement this invention, a terminal for the user to exchange information with the interactive artificial intelligence is used. The terminal functions as an input device for recording the content of the dialogue in text or voice format, and voice recognition software is used for text conversion of voice data. Also, when transmitting the data acquired in real time to the server, it is recommended to use a secure communication protocol such as HTTPS.
[0286] The server implements a natural language processing system to process the received information, and utilizes a generative AI model to accurately analyze the context of the information. In this process, tokenization, part-of-speech tagging, and syntactic analysis are performed to classify the information. The classified data is distributed by the server to an information display screen accessible to the user. As a result, the user can easily refer to and utilize the past dialogue history.
[0287] In addition, the server has a function of automatically detecting tasks with due dates from the dialogue with the user, setting reminders, and sending notifications to the user through the terminal as the specified date and time approaches. This notification function uses the terminal's built-in notification system. With this function, the user can manage important schedules without missing them.
[0288] For example, when a user says, "Please add a trip to the library next Tuesday," the system analyzes this statement and categorizes it under "Schedule." As the deadline approaches, a reminder notification is sent to the user. Similarly, given a prompt such as "Show all of the user's past conversations related to 'meals'," the system can display the conversation data belonging to the specified category on the information display screen. This invention allows users to manage their conversations efficiently and use them flexibly according to their needs.
[0289] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0290] Step 1:
[0291] The user initiates a conversation with the conversational artificial intelligence via the terminal. The user's statements are input to the terminal as either text or voice. The terminal's voice recognition software converts the voice to text and temporarily stores the acquired data in digital format. The input data is either text or voice, and the output is text data.
[0292] Step 2:
[0293] The device sends collected text data to the server at regular intervals. Secure protocols such as HTTPS are used to protect the data during transmission. The transmitted input data is stored on the server. The output is the text data stored on the server.
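The HTTPS transmission in Step 2 can be sketched with Python's standard `urllib.request`. The endpoint URL and the JSON field names are hypothetical; the function refuses any non-HTTPS URL so that conversation data is never sent in plain text.

```python
import json
import urllib.request
from urllib.parse import urlparse

def build_upload_request(endpoint: str, user_id: str, texts: list[str]) -> urllib.request.Request:
    """Build a JSON POST request for a batch of text data, HTTPS only."""
    if urlparse(endpoint).scheme != "https":
        raise ValueError("refusing to send conversation data over a non-HTTPS URL")
    payload = json.dumps({"user_id": user_id, "texts": texts}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request would then be sent with `urllib.request.urlopen(req, timeout=10)`, which in current Python versions verifies the server certificate by default.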
[0294] Step 3:
[0295] The server analyzes the received text data using a natural language processing system. Specifically, it performs data tokenization, part-of-speech tagging, and sentence structure analysis, and uses a generative AI model to understand the context of the information. The input data is text data stored on the server, and the output is the analyzed information.
[0296] Step 4:
[0297] The server classifies the text data into predetermined categories based on the analysis results. These categories might include, for example, "schedule" or "shopping." This organizes the information, making it easy to search later. The input data is the analyzed information, and the output is the classified information.
[0298] Step 5:
[0299] The server delivers the categorized information to an information display screen accessible to the user. The user can use the terminal interface to view this information by category and perform searches and filtering as needed. The input data is the categorized information, and the output is the displayed information.
[0300] Step 6:
[0301] The server detects time-sensitive tasks from text data and sets reminders. As the deadline approaches, it sends notifications to the user via their device. This allows users to manage important appointments without missing any. The input data is parsed task information, and the output is reminder notifications.
[0302] (Application Example 1)
[0303] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0304] With current home automation devices, users must manage each device individually, which makes them prone to forgetting tasks and makes it difficult to manage multiple pieces of information efficiently. Furthermore, traditional reminder systems are limited to display and notification functions and lack the flexibility to adapt to the conditions of individual households and to daily life situations.
[0305] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0306] In this invention, the server includes: a device that collects information exchanged between the user and the interactive artificial intelligence; a device that uses natural language processing technology to organize the information into predefined classifications; a device that enables the user to easily search for past conversations by displaying the classified information on a display device; a device that analyzes information including deadlines, generates notifications, and transmits them; and a device that supports the user's life management in a home automation device. Thereby, task management and reminder notifications according to the user's living habits become possible, and it becomes possible to smoothly support an individual's life.
[0307] A "user" is an individual who provides information to an interactive artificial intelligence and aims to manage daily life and improve efficiency.
[0308] An "interactive artificial intelligence" is a program that has the function of receiving information from a user, understanding it using natural language processing technology, and being able to respond.
[0309] "Information" refers to all text and voice data exchanged between the user and the interactive artificial intelligence.
[0310] A "collecting device" is a combination of hardware and software for collecting information generated from the interaction between the user and the interactive artificial intelligence.
[0311] "Natural language processing technology" is a technology for processing human language and converting its meaning into a form that can be understood by a computer.
[0312] An "organizing device" is a device for classifying and managing the collected information based on predefined criteria.
[0313] A "display device" refers to a display and its related devices for the user to visually confirm information and easily search for the conversation history.
[0314] A "notification generation and transmission device" is a device that creates alerts to inform users of time-sensitive tasks or important information, and notifies them via display or sound.
[0315] A "home automation device" is a device that integrates and operates various home appliances and management systems based on the user's living situation.
[0316] One embodiment of this invention requires a terminal installed in the home. This terminal is equipped with a voice input function and collects information through interaction with the user. The information obtained is converted into text data using speech recognition technology. A Raspberry Pi can be used as the hardware for this purpose.
[0317] Text data acquired by the terminal is transmitted wirelessly to a server. This server incorporates natural language processing technology and uses Google Cloud's AI services to analyze and classify the information. The classified information is displayed on a home display device with a user interface, which is designed to let users easily search and refer to their conversation history. A web application built with the Vue.js framework can be used for this interface.
[0318] Furthermore, the server analyzes time-sensitive data, generates notifications, and sends them to the device. This allows home automation devices to provide users with timely reminders. For example, if a user instructs the device to "set a reminder to take my medicine at 3 p.m.," the device can send a notification at the specified time.
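The deadline-detection step can be illustrated with a minimal Python sketch. It assumes English input and a regular expression for clock times such as "3 p.m." or "3pm"; the source leaves this extraction to the server's natural language processing, so the pattern and the function name `extract_deadline` are hypothetical. Relative dates such as "next Tuesday" are not handled.

```python
import re
from datetime import datetime, time

# Hypothetical pattern for clock times such as "3pm", "3 PM", "3 p.m." or "10:30 am".
TIME_PATTERN = re.compile(
    r"\b(\d{1,2})(?::(\d{2}))?\s*([ap])\.?\s*m\.?(?![a-z])", re.IGNORECASE
)


def extract_deadline(text: str, today: datetime):
    """Return a datetime on today's date for the first clock time found, else None."""
    m = TIME_PATTERN.search(text)
    if m is None:
        return None  # no clock time: not treated as a time-sensitive task
    hour = int(m.group(1)) % 12          # "12 am" -> 0, "12 pm" -> 12 below
    minute = int(m.group(2) or 0)
    if m.group(3).lower() == "p":
        hour += 12
    return datetime.combine(today.date(), time(hour, minute))
```

For instance, `extract_deadline("Set a reminder to take my medicine at 3 p.m.", now)` yields 15:00 on the current day, which the server could then hand to its reminder scheduling.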
[0319] Another application example is using prompts with a generative AI model. An example prompt is: "We're running low on consumables, please add them to the shopping list." Prompts of this kind allow user instructions to be reflected more intuitively and quickly.
[0320] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0321] Step 1:
[0322] The user gives instructions to the device using their voice.
[0323] An input system that receives user voice commands collects voice data. This voice data becomes input and is converted into text data using speech recognition technology. This conversion process makes it possible to process information from voice data.
[0324] Step 2:
[0325] The device converts the audio data into text format.
[0326] The device converts the collected audio data into text using a speech recognition library. This output text data is then ready to be sent to the server. For example, the instruction "Set a reminder to take medicine at 3 PM" would be output as text data at this stage.
[0327] Step 3:
[0328] The terminal sends text data to the server.
[0329] The information, converted to text format, is transmitted to the server via wireless communication. To protect the data in transit, the terminal uses an encryption protocol.
[0330] Step 4:
[0331] The server analyzes the text data it receives.
[0332] The server uses Google Cloud's natural language processing API to analyze and interpret the text data. Based on this analysis, it classifies the intent conveyed by the text and organizes it into specific categories. The output of this step is information that has been organized and is ready for use.
[0333] Step 5:
[0334] The server sends the classified data to the user interface.
[0335] The analyzed information is sent to the user interface of a web application using Vue.js, based on the classification results. There, it is output as a visual representation that allows the user to review and explore their past conversation history.
[0336] Step 6:
[0337] The server analyzes time-sensitive data and generates notifications.
[0338] If a user's instructions include a deadline, the server extracts that deadline information and generates a notification. For example, "Take medicine at 3pm" is recognized as a task with a deadline, and a reminder is generated.
[0339] Step 7:
[0340] The device sends a notification to the user.
[0341] As the deadline approaches, the device notifies the user of the generated reminder. Notifications are delivered via voice or display, providing a system to support the user's life management.
[0342] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0343] This invention provides an advanced system for managing user communication using conversational artificial intelligence, and includes functions for collecting, classifying, and displaying communication data, as well as recognizing user emotions. This system operates based on interactions between a server, a terminal, and the user.
[0344] Basic system configuration
[0345] First, the user converses with the conversational artificial intelligence via their device. The communication data generated during this conversation is recorded by the device as text or audio data. The recorded data is sent to a server at regular intervals and stored in a database. The server analyzes this data using natural language processing technology and classifies the conversation content into appropriate categories. This classification information is displayed on the user interface, allowing the user to easily find past conversations.
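A minimal sketch of the storage step, assuming SQLite (Python's built-in `sqlite3` module) as the database; the table name, columns, and helper functions are hypothetical. The `category` column stays empty until the classification step fills it in.

```python
import sqlite3
from datetime import datetime

# Hypothetical schema for raw communication data received from the terminal.
SCHEMA = """
CREATE TABLE IF NOT EXISTS information_data (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id     TEXT NOT NULL,
    modality    TEXT NOT NULL CHECK (modality IN ('voice', 'text')),
    content     TEXT NOT NULL,
    received_at TEXT NOT NULL,
    category    TEXT  -- filled in later by the classification step
)
"""


def store(conn, user_id: str, modality: str, content: str, received_at: datetime) -> int:
    """Insert one received item and return its row id."""
    cur = conn.execute(
        "INSERT INTO information_data (user_id, modality, content, received_at) "
        "VALUES (?, ?, ?, ?)",
        (user_id, modality, content, received_at.isoformat()),
    )
    conn.commit()
    return cur.lastrowid


def unclassified(conn):
    """Rows still waiting for natural language analysis and classification."""
    return conn.execute(
        "SELECT id, content FROM information_data WHERE category IS NULL ORDER BY id"
    ).fetchall()
```

Usage: open `sqlite3.connect(":memory:")` (or a file path), run `conn.execute(SCHEMA)` once, then call `store` for each batch item received from the terminal.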
[0346] Emotional Engine Processing
[0347] A key feature of this invention is that the server is equipped with an emotion engine that can recognize the user's emotions from communication data. The emotion engine goes beyond simple language analysis, extracting emotions from voice tone and text context to evaluate the user's current and past emotional state. This information is stored on the server and used for tracking emotional fluctuations.
[0348] Response adjustment and feedback
[0349] Recognized emotions are reflected in the conversational artificial intelligence's responses and suggestions. For example, if a user expresses dissatisfaction, the system is programmed to provide a more considerate response. Furthermore, users can provide feedback through the interface, and this data is collected on a server and used to improve the entire system.
[0350] Explanation of specific examples
[0351] If a user is feeling stressed, the system uses an emotion engine to detect this state. Based on the emotional analysis, the AI offers suggestions on how to relax. In this way, the system goes beyond simply providing information and can respond flexibly to the user's emotions.
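The emotion-driven adjustment can be sketched as follows. This is a deliberately simple stand-in that assumes a small word lexicon plus one normalized voice-tone feature (`voice_pitch_z`, hypothetical) in place of the learned emotion engine; the lexicon, threshold, and function names are illustrative only.

```python
import re

# Hypothetical lexicon standing in for the emotion engine's learned model.
EMOTION_LEXICON = {
    "stress": {"stressed", "busy", "tired", "overwhelmed", "exhausted"},
    "dissatisfaction": {"wrong", "annoying", "useless", "disappointed"},
    "joy": {"great", "happy", "thanks", "wonderful"},
}


def estimate_emotion(text: str, voice_pitch_z: float = 0.0) -> str:
    """Pick the emotion with the most lexicon hits in the text.

    voice_pitch_z is an assumed, already-normalized tone feature; a raised
    pitch only tips a lexically neutral utterance toward "stress".
    """
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    scores = {emo: len(tokens & words) for emo, words in EMOTION_LEXICON.items()}
    best = max(scores, key=scores.get)
    if scores[best] > 0:
        return best
    return "stress" if voice_pitch_z > 1.5 else "neutral"


def adjust_response(base_reply: str, emotion: str) -> str:
    """Adapt the conversational AI's reply to the estimated emotion."""
    if emotion == "stress":
        return base_reply + " Shall we find some ways to relax?"
    if emotion == "dissatisfaction":
        return "Sorry about that. " + base_reply  # more considerate phrasing
    return base_reply
```

Keeping estimation and adjustment as separate functions mirrors the separate analysis and response-adjustment steps, so either half could be replaced by a trained model without touching the other.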
[0352] This system is a step up from conventional conversational artificial intelligence, aiming for a deeper mutual understanding with the user. By taking user emotions into account during interaction, it can provide a personalized experience for each individual user.
[0353] The following describes the processing flow.
[0354] Step 1:
[0355] The user speaks to the conversational artificial intelligence through the device. The user's statements are recorded in real time by the device as audio or text data. This data is temporarily stored on the device.
[0356] Step 2:
[0357] The device sends collected communication data to the server at regular intervals. Through this transmission process, the server receives the latest user data.
[0358] Step 3:
[0359] The server analyzes the received data using a natural language processing engine. This analysis extracts the context and keywords of the conversation and classifies the data into predefined categories, such as "shopping" or "task management."
[0360] Step 4:
[0361] The server then uses an emotion engine to analyze the user's emotions from the communication data. The user's emotional state is evaluated based on context, word choice, tone of voice, and other factors.
[0362] Step 5:
[0363] The server adjusts the conversational artificial intelligence's responses based on the analyzed emotional data. For example, if the user is feeling stressed, the AI will adjust its responses to offer suggestions to help them relax.
[0364] Step 6:
[0365] The categorized conversation history and sentiment data are displayed in the user interface, namely the dashboard. Users can browse the dashboard and examine history based on specific categories or sentiment states.
[0366] Step 7:
[0367] The server collects user feedback and uses it to improve sentiment analysis and natural language processing technologies. This feedback cycle allows the system to evolve and provide more appropriate interactions for users.
[0368] (Example 2)
[0369] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0370] In modern society, the amount of information data generated through interactive algorithms between users and computers is increasing, and this data needs to be managed efficiently and effectively. There is also demand for more natural, personalized interactions in which responses take the user's emotions into account. Conventional technologies have struggled to adjust responses using emotional information, to categorize information effectively, and to configure notifications. This invention aims to solve these problems.
[0371] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0372] In this invention, the server includes means for collecting information data exchanged between the user and a computer-based interactive algorithm, means for classifying the information data into predefined categories using language analysis technology, and means for determining emotions using an analysis device and adjusting the response based on this determination. This enables effective management of information data and flexible responses based on user emotions.
[0373] A "user" is the entity that utilizes the interactive algorithm and is the provider of information data.
[0374] A "computer" refers to any device that performs information processing, and is a device within a system that is responsible for collecting, analyzing, and displaying data.
[0375] "Information data" refers to all communication content exchanged between a user and a computer, including voice and text data.
[0376] An "interactive algorithm" refers to a series of computational processes for generating responses through interaction with the user, and it incorporates learning capabilities and sentiment analysis technology.
[0377] "Language analysis technology" refers to techniques for understanding and categorizing information data through natural language processing, with the aim of analyzing the structure and understanding the meaning of text.
[0378] An "analysis device" refers to a device or software that extracts specific information, such as emotions, from information data and provides analysis results.
[0379] "Means for determining emotions and adjusting responses" refers to technologies or processes for evaluating the emotions contained in information data and changing responses based on the results.
[0380] This invention is designed to optimize the interaction between the user and the computer by utilizing information technology.
[0381] System Configuration
[0382] The user first initiates a conversation with an interactive algorithm using a terminal. The terminal is equipped with a microphone and keyboard, and records information data in voice or text format. The information data recorded on the terminal is then transmitted to a server via the internet.
[0383] Data analysis and classification
[0384] The server processes information data using programming language environments such as Python and Java. The server utilizes natural language processing libraries (e.g., NLTK and SpaCy) to analyze the syntax and semantics of the data. Based on these analysis results, the data is categorized into predefined categories.
[0385] Emotion determination and response adjustment
[0386] The server also includes modules for analyzing emotions. For example, computational algorithms (such as machine learning techniques) are implemented to extract the user's emotions from the tone of voice data and the context of text. This makes it possible to generate responses that correspond to the user's current emotional state.
[0387] Examples of specific cases and prompt statements
[0388] When a user speaks to the device saying, "I haven't been sleeping well lately," the system categorizes this information data into the "health" category and performs an emotional analysis on the server. If the system determines that the user is experiencing anxiety, the AI will suggest, "Shall we find some ways to relax?" For example, a possible prompt might be something like, "I'm very busy and stressed right now."
[0389] This system aims to provide personalized responses to users through advanced data analysis and emotional response.
[0390] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0391] Step 1:
[0392] The user speaks to an interactive algorithm via the terminal. The terminal recognizes and records this input as voice or text data. Specifically, the terminal does this using a microphone or text input device and temporarily stores the data. The recorded voice or text information is generated as output data.
[0393] Step 2:
[0394] The terminal transmits the collected information data to the server at regular intervals. The terminal uses a network protocol to transfer the data. The input data is communication information recorded by the terminal, while the output data is the raw information data received by the server.
[0395] Step 3:
[0396] The server stores the received information data in a database. This is done using a data management system (e.g., an SQL database). The input is raw information data transmitted over the network, and the output is data that has been organized and stored in the database. Following storage, the server prepares to perform structural analysis of the data.
[0397] Step 4:
[0398] The server analyzes the received data using a natural language processing library. It applies syntactic and semantic analysis to the data and classifies it into predefined categories; specifically, it uses language analysis techniques to extract topics from the text. The input is the data stored in the database, and the output is the classification result for each category.
[0399] Step 5:
[0400] The server uses an emotion analysis module to extract the user's emotions from the data. Voice tone and word choice are analyzed to identify the emotional state. The input is analyzed text data, and emotional information is generated as output. The analysis device uses machine learning algorithms to evaluate the emotions.
[0401] Step 6:
[0402] The server generates and adjusts the response of the interactive algorithm based on the acquired sentiment information. Specifically, it uses a response generation algorithm to create an appropriate response that aligns with the informational data. The input is sentiment information and the user's past history data, and the output is a personalized response to the user.
[0403] Step 7:
[0404] The server sends the response back to the terminal. The terminal presents this response to the user via screen or audio. The input is the response data sent from the server, and the output is the visual or audio response provided to the user. This allows the user to initiate the next interaction based on the presented information.
[0405] (Application Example 2)
[0406] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0407] In modern living environments, there is a growing demand for personalized experiences that respond to users' emotions. Especially for robots and conversational artificial intelligence used in the home, it is desirable not only to provide information but also to understand the user's emotions and use that understanding to control appliances and adjust environmental settings. However, conventional systems have struggled to accurately recognize user emotions and automatically provide appropriate responses. Solving these challenges is therefore crucial.
[0408] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0409] In this invention, the server includes means for collecting information exchanged between the user and conversational artificial intelligence, means for classifying the information into predefined categories using natural language processing technology, and means for recognizing the user's emotions from voice data and text data and controlling home appliances based on those emotions. This makes it possible to provide a personalized experience that corresponds to the user's emotional state.
[0410] A "user" is an individual who communicates with conversational artificial intelligence or an entity that utilizes it.
[0411] "Conversational artificial intelligence" is an artificial system that has the ability to provide information and carry out instructions through dialogue with the user.
[0412] "Information" refers to the collective text or audio data exchanged between the user and the conversational artificial intelligence.
[0413] "Natural language processing technology" is a technology that enables computers to understand and analyze natural human language.
[0414] A "category" refers to a predefined group or type used to classify information.
[0415] "Voice data" refers to digital data representing sound information generated by the user's voice.
[0416] "Text data" refers to digitized textual information obtained through user input or speech recognition.
[0417] "Emotion recognition" is a technology that analyzes and identifies emotions from a user's voice or text.
[0418] "Home appliance control" refers to a function that operates or sets electrical appliances according to user instructions or status.
[0419] A "personalized experience" refers to the provision of services that are customized according to the individual user's circumstances and preferences.
[0420] In the system implementing this invention, a server, a user terminal, and multiple devices within the home work together. The user first initiates a conversation with an interactive artificial intelligence via the terminal. The information from this conversation is recorded as voice or text data and sent to the server at regular intervals. The server converts the voice information into text using a speech recognition module based on TensorFlow, and then uses SpaCy to analyze the text data for natural language processing. Furthermore, OpenAI's emotion analysis API is used for emotion recognition, accurately analyzing the user's emotions from the voice and text data.
[0421] Based on the analysis results, the user's emotional state is determined. For example, if the user expresses fatigue, the server controls lighting and music devices via the smart home platform, providing settings that promote relaxation. This system uses the emotional data stored on the server to generate personalized feedback and displays it in the user interface.
[0422] For example, if a user says to the conversational AI, "I'm very busy and tired today," the server will immediately adjust the lighting to a warmer color and play calming music. This process allows the user to enjoy a more personalized and comfortable experience.
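The step from an estimated emotion to home appliance control can be sketched as a lookup from emotion to a list of device commands. Only the "stress/fatigue: warmer lighting and calming music" plan comes from the description; the device names, colour temperatures, brightness and volume values, and the sadness and joy plans are hypothetical, and no real smart home platform API is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceCommand:
    device: str   # hypothetical device identifier on the home platform
    action: str
    params: dict = field(default_factory=dict)


# Hypothetical response plans keyed by estimated emotion.
RESPONSE_PLANS = {
    "stress": [
        DeviceCommand("living_room_light", "set", {"color_temp_k": 2700, "brightness": 40}),
        DeviceCommand("speaker", "play", {"playlist": "calm", "volume": 20}),
    ],
    "sadness": [
        DeviceCommand("living_room_light", "set", {"color_temp_k": 3000, "brightness": 60}),
        DeviceCommand("speaker", "play", {"playlist": "gentle", "volume": 25}),
    ],
    "joy": [
        DeviceCommand("speaker", "play", {"playlist": "upbeat", "volume": 35}),
    ],
}


def plan_commands(emotion: str) -> list:
    """Return device commands for an emotion; unknown emotions change nothing."""
    return list(RESPONSE_PLANS.get(emotion, []))
```

Returning plain command objects keeps the emotion-to-action mapping separate from whichever smart home platform actually executes the commands.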
[0423] Example prompt: "When the user wants to relax, have the robot suggest the optimal lighting and music combination."
[0424] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0425] Step 1:
[0426] The user speaks to the conversational artificial intelligence using a device. The input is voice data, which is recorded on the user's device. The device converts the voice signal into a digital format and sends the information to the server.
[0427] Step 2:
[0428] The server uses TensorFlow to perform speech recognition on the received audio data and outputs the result as text data, which forms the basis for natural language processing. The data processing in this step is therefore the conversion from an audio waveform to a character string.
[0429] Step 3:
[0430] The server processes the text data through SpaCy, a natural language processing tool, to analyze its grammar and structure. Based on the analysis, categories related to the context and subject are determined. This analysis process involves understanding the structure of the input text and assigning categories.
[0431] Step 4:
[0432] The server uses OpenAI's sentiment analysis API to extract the user's emotions from text data. The input is the text data obtained in step 2, and emotion information based on it is output. Different response plans are selected based on the type of emotion (e.g., joy, sadness, stress).
[0433] Step 5:
[0434] Based on the analysis of emotions, the server generates commands to control home appliances via the smart home platform. For example, if relaxation is deemed necessary, it will adjust the lighting and instruct music playback. The input is emotion data, and the output is device control commands.
[0435] Step 6:
[0436] The server generates feedback information for the user based on all the above processing and displays it in the user interface. This feedback reports to the user how the system responded and stores the information for future interactions. The input is the result of past processing, and the output is the feedback message.
[0437] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0438] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58 together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input data according to the instructions in the prompts and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0439] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0440] [Third Embodiment]
[0441] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0442] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0443] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0444] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0445] The microphone 238 accepts instructions from the user 20 by capturing the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0446] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0447] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0448] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0449] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0450] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0451] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0452] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0453] This invention is a system for efficiently managing data from conversations between a user and an interactive artificial intelligence. This system mainly consists of a server, a terminal, and the user.
[0454] Conversation initiation
[0455] First, the user initiates communication with the conversational artificial intelligence via their device. The conversation is generated as text or audio and recorded in real time by the device. This recorded data is sent to a server at regular intervals. This aggregates all the conversation data on the server, enabling processing in the next step.
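A minimal sketch of the terminal-side batching, assuming the recognized text is queued locally and released as one JSON payload per interval. The class name and the 60-second default are hypothetical, since the source says only "at regular intervals"; the actual upload (for example an HTTPS POST) is left out.

```python
import json
from dataclasses import dataclass, field


@dataclass
class UtteranceBuffer:
    """Collects recognized text on the terminal and releases it in batches."""

    interval_s: float = 60.0  # hypothetical sending period
    _pending: list = field(default_factory=list, init=False)
    _last_sent: float = field(default=0.0, init=False)

    def add(self, text: str, timestamp: float) -> None:
        # Keep each utterance with the time it was captured on the terminal.
        self._pending.append({"text": text, "ts": timestamp})

    def flush_if_due(self, now: float):
        """Return a JSON payload once interval_s has elapsed, otherwise None."""
        if not self._pending or now - self._last_sent < self.interval_s:
            return None
        payload = json.dumps({"utterances": self._pending})
        self._pending = []
        self._last_sent = now
        return payload
```

Passing `now` in, rather than reading the clock inside the class, keeps the batching rule easy to test and lets the terminal drive it from whatever timer it already has.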
[0456] Data analysis and classification
[0457] The server analyzes the received conversation data using natural language processing techniques. Specifically, it understands the context of the conversation and classifies it into predefined categories. Through this classification process, for example, if a user is asking a question about shopping, the conversation will be organized into the "Shopping" category.
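For illustration, the classification step could be approximated by a keyword-overlap classifier. This swaps in a much simpler technique than the natural language processing the source describes; the keyword lists are hypothetical, and the category names are taken from the examples in this description.

```python
import re

# Hypothetical keyword lists for the example categories.
CATEGORY_KEYWORDS = {
    "schedule": {"tuesday", "tomorrow", "appointment", "meeting", "trip", "reminder"},
    "shopping": {"buy", "shopping", "list", "order", "consumables"},
    "meals": {"lunch", "dinner", "breakfast", "recipe", "meal"},
}


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def classify(text: str, keywords=CATEGORY_KEYWORDS) -> str:
    """Return the category sharing the most keywords with the text, or 'other'."""
    tokens = set(tokenize(text))
    scores = {cat: len(tokens & words) for cat, words in keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"
```

Ties go to the category listed first; a production system would replace this scoring with the model-based analysis described above.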
[0458] User interface management
[0459] The analyzed and categorized data becomes accessible through the user interface. Users can view the conversation history for each category, making it easy to find and reuse past communications. The dashboard is designed so that users who want to view details for a specific category can easily find the corresponding history.
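Browsing by category and searching the history can be sketched as a filter over classified records. The `Record` type and `search` function are hypothetical; a real dashboard would typically run the equivalent query against the server's database.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    text: str
    category: str
    timestamp: datetime


def search(records, category=None, keyword=None, since=None):
    """Filter classified records; any criterion left as None is ignored."""
    results = []
    for r in records:
        if category is not None and r.category != category:
            continue
        if keyword is not None and keyword.lower() not in r.text.lower():
            continue
        if since is not None and r.timestamp < since:
            continue
        results.append(r)
    # Newest first, as a history view would usually present them.
    return sorted(results, key=lambda r: r.timestamp, reverse=True)
```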
[0460] Setting reminders and notifications
[0461] Furthermore, this system automatically recognizes conversations that include time-sensitive tasks, and the server generates reminders for the user. Based on this, as the deadline approaches, a notification is sent to the user via their device, preventing tasks from being forgotten. This reminder function improves user productivity and helps them manage important appointments in a timely manner.
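As a non-limiting illustration, the reminder mechanism can be sketched as a priority queue ordered by notification time. The class `ReminderQueue` and the 30-minute lead time are assumptions, because the disclosure does not state how far ahead of a deadline a notification is sent.

```python
import heapq
from datetime import datetime, timedelta
from typing import List, Tuple


class ReminderQueue:
    """Min-heap of (notify_at, task) pairs, earliest notification first."""

    def __init__(self, lead_time: timedelta = timedelta(minutes=30)):
        self.lead_time = lead_time  # assumed advance warning before each deadline
        self._heap: List[Tuple[datetime, str]] = []

    def add(self, task: str, deadline: datetime) -> None:
        heapq.heappush(self._heap, (deadline - self.lead_time, task))

    def due(self, now: datetime) -> List[str]:
        """Pop and return every task whose notification time has been reached."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[1])
        return ready


queue = ReminderQueue()
queue.add("Trip to the library", datetime(2025, 1, 7, 10, 0))
print(queue.due(datetime(2025, 1, 7, 9, 0)))   # []
print(queue.due(datetime(2025, 1, 7, 9, 45)))  # ['Trip to the library']
```

In this sketch, a server loop would call `due()` periodically and pass each returned task to the terminal's notification system.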
[0462] System improvements
[0463] User feedback is collected on the server and used to improve the natural language processing technology and the user interface. This feedback allows the system to be continuously optimized to meet user needs.
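One simple way to turn collected feedback into improvement targets is to average the ratings for each feature and flag the weakest ones. In this hypothetical sketch, the 1-to-5 rating scale, the threshold, and `features_needing_improvement` are assumptions and are not part of the disclosure.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def features_needing_improvement(
    feedback: Iterable[Tuple[str, int]], threshold: float = 3.0
) -> List[str]:
    """Return features whose mean rating (1-5) is below threshold, worst first."""
    ratings: Dict[str, List[int]] = defaultdict(list)
    for feature, rating in feedback:
        ratings[feature].append(rating)
    averages = {feature: mean(values) for feature, values in ratings.items()}
    low = [feature for feature, avg in averages.items() if avg < threshold]
    return sorted(low, key=averages.__getitem__)


feedback = [("classification", 2), ("classification", 3), ("reminders", 5), ("dashboard", 2)]
print(features_needing_improvement(feedback))  # ['dashboard', 'classification']
```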
[0464] These elements are designed to allow users to effectively utilize the system and efficiently manage their daily tasks and search their communication history.
[0465] The following describes the processing flow.
[0466] Step 1:
[0467] The user initiates a conversation with the conversational artificial intelligence using their device. The device records and collects the text or audio data generated during the conversation in real time.
[0468] Step 2:
[0469] The terminal sends the collected communication data to the server at regular intervals. This keeps the data synchronized with low latency, although not strictly in real time. The transmitted data is aggregated on the server and prepared for the next processing step.
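As a minimal sketch, the terminal side of Step 2 can be modeled as a buffer that accumulates utterances and flushes them as one batch once the interval has elapsed. The class `UploadBuffer`, the JSON record layout, and the injected `send` callable (for example, a wrapper around an HTTPS POST) are hypothetical. The clock is injectable so that the behavior can be tested without waiting.

```python
import json
import time
from typing import Callable, Dict, List


class UploadBuffer:
    """Collects utterances on the terminal and sends them as one JSON batch per interval."""

    def __init__(
        self,
        interval_s: float,
        send: Callable[[str], None],
        clock: Callable[[], float] = time.monotonic,
    ):
        self.interval_s = interval_s
        self.send = send    # hypothetical transport, e.g. a wrapper around an HTTPS POST
        self.clock = clock  # injectable so tests do not need to sleep
        self._pending: List[Dict[str, str]] = []
        self._last_flush = clock()

    def record(self, speaker: str, text: str) -> None:
        self._pending.append({"speaker": speaker, "text": text})
        if self.clock() - self._last_flush >= self.interval_s:
            self.flush()

    def flush(self) -> None:
        """Send any pending records and restart the interval."""
        if self._pending:
            self.send(json.dumps(self._pending))
            self._pending = []
        self._last_flush = self.clock()


sent: List[str] = []
fake_now = [0.0]
buffer = UploadBuffer(10.0, sent.append, clock=lambda: fake_now[0])
buffer.record("user", "What is on my schedule today?")
fake_now[0] = 12.0
buffer.record("ai", "You have a meeting at 10 AM.")
print(len(sent), json.loads(sent[0])[1]["speaker"])  # 1 ai
```

This sketch flushes only when a new utterance arrives. A real terminal would also call `flush()` from a timer so that the last batch is not held indefinitely.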
[0470] Step 3:
[0471] The server inputs the received data into a natural language processing engine to analyze the content of the conversation. This analysis extracts context and keywords, and classifies the data into appropriate categories. For example, a conversation about shopping would be classified into the "shopping" category.
[0472] Step 4:
[0473] The analyzed data is stored in a database on the server and converted into a user-accessible format so that it can be viewed easily on the dashboard.
[0474] Step 5:
[0475] Users can access the dashboard through their device and view conversation history organized by category. The dashboard offers an intuitive interface that allows users to select categories of interest and view details.
[0476] Step 6:
[0477] The server detects time-sensitive tasks and events from the communication data. Based on this information, it generates reminders on the user interface and sends notifications to the user's terminal at the specified time.
[0478] Step 7:
[0479] Users can provide feedback to the server regarding the system's usability and the effectiveness of reminders. The server analyzes this feedback and uses it to improve natural language processing techniques and the interface. Through this cycle, the system continues to evolve to meet user needs.
[0480] (Example 1)
[0481] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0482] In modern society, the amount of information users exchange with conversational artificial intelligence is increasing, making it difficult to efficiently manage this information and obtain necessary information in a timely manner. Furthermore, there is a need for appropriate reminder functions to prevent users from forgetting tasks, including those with time constraints, but existing systems lack sufficient accuracy. In addition, there is a lack of automated methods to effectively utilize user feedback and continuously improve the system.
[0483] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0484] In this invention, the server includes means for recording information exchanged between the user and the conversational artificial intelligence, means for classifying the information into pre-set categories using a natural language processing system, and means for understanding the context of the information with high accuracy and improving classification accuracy using a generative AI model. As a result, the user can efficiently manage conversational data and easily obtain the necessary information. Furthermore, for tasks that include time specifications, reminders are set and notifications are sent in a timely manner, improving the accuracy and convenience of information management.
[0485] "User" refers to the entity that operates this system and shares information with the interactive artificial intelligence.
[0486] "Conversational artificial intelligence" refers to an intelligent system that exchanges information through communication with users and uses natural language processing technology to understand and respond to that information.
[0487] "Information" refers to language-based data exchanged between a user and an interactive artificial intelligence, or records generated from that data.
[0488] "Means of recording" refers to a function or process for saving information exchanged between conversational artificial intelligence and users as digital data.
[0489] A "natural language processing system" refers to the technologies and algorithms used to analyze input text or speech and understand its grammar, context, and meaning.
[0490] A "generative AI model" refers to an artificial intelligence model built on deep learning and machine learning techniques to analyze and generate information.
[0491] An "information display screen" refers to a display or interface used to visually present analyzed and classified information to the user.
[0492] A "reminder" refers to a mechanism or function that notifies users of the due date of a set task or event.
[0493] "Means of notification" refers to methods, protocols, or devices for informing users of information, reminders, or other important messages.
[0494] To implement this invention, a terminal is used for the user to exchange information with an interactive artificial intelligence. The terminal functions as an input device for recording the content of the conversation in text or voice format, and speech recognition software is used for converting voice data to text. Furthermore, it is recommended to use a secure communication protocol such as HTTPS when transmitting data acquired in real time to a server.
[0495] The server implements a natural language processing system to process received information, utilizing generative AI models to analyze the context of the information with high accuracy. This process involves tokenization, part-of-speech tagging, and sentence structure analysis to classify the information. The classified data is then delivered by the server to an information display screen accessible to the user. This allows the user to easily refer to and utilize their past conversation history.
[0496] Furthermore, the server automatically detects time-sensitive tasks based on user interactions, sets reminders, and sends notifications to the user via their device as the specified date and time approach. This notification function utilizes the device's built-in notification system. By leveraging this feature, users can manage important appointments without missing any.
[0497] For example, when a user says, "Please add a trip to the library next Tuesday," the system analyzes this statement and categorizes it under "Schedule." As the deadline approaches, a reminder notification is sent to the user. Similarly, when the user enters a prompt such as "Show all of the user's past conversations related to 'meals'," the system can display the conversation data belonging to the specified category on the information display screen. This invention thus allows users to manage their conversations efficiently and use them flexibly according to their needs.
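Before a reminder can be set, a relative expression such as "next Tuesday" must be turned into a concrete date. The sketch below assumes that "next <weekday>" means the first such weekday strictly after today. This interpretation and the function `resolve_next_weekday` are assumptions, because the disclosure does not define how relative dates are resolved.

```python
import re
from datetime import date, timedelta
from typing import Optional

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
NEXT_DAY_PATTERN = re.compile(r"\bnext (" + "|".join(WEEKDAYS) + r")\b")


def resolve_next_weekday(utterance: str, today: date) -> Optional[date]:
    """Return the date meant by 'next <weekday>', or None if the phrase is absent."""
    match = NEXT_DAY_PATTERN.search(utterance.lower())
    if match is None:
        return None
    target = WEEKDAYS.index(match.group(1))  # Monday == 0, matching date.weekday()
    days_ahead = (target - today.weekday()) % 7 or 7  # same weekday -> one week later
    return today + timedelta(days=days_ahead)


print(resolve_next_weekday("Please add a trip to the library next Tuesday", date(2025, 1, 1)))
# 2025-01-07
```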
[0498] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0499] Step 1:
[0500] The user initiates a conversation with the conversational artificial intelligence via the terminal. The user's statements are input to the terminal as either text or voice. The terminal's voice recognition software converts the voice to text and temporarily stores the acquired data in digital format. The input data is either text or voice, and the output is text data.
[0501] Step 2:
[0502] The device sends collected text data to the server at regular intervals. Secure protocols such as HTTPS are used to protect the data during transmission. The transmitted input data is stored on the server. The output is the text data stored on the server.
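As a minimal sketch, preparing a batch for transmission over HTTPS with the Python standard library could look like the following. The endpoint URL, the bearer-token header, the `{"records": [...]}` payload shape, and `build_upload_request` are hypothetical. The disclosure states only that a secure protocol such as HTTPS is used.

```python
import json
import urllib.request
from typing import Dict, List


def build_upload_request(url: str, records: List[Dict[str, str]], token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the buffered text data."""
    if not url.startswith("https://"):
        # Refuse plaintext transport for conversation data.
        raise ValueError("conversation data must be sent over HTTPS")
    body = json.dumps({"records": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
    )


request = build_upload_request(
    "https://example.com/api/conversations", [{"speaker": "user", "text": "Hello"}], "TOKEN"
)
# urllib.request.urlopen(request, timeout=10) would transmit it over TLS.
print(request.get_method())  # POST
```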
[0503] Step 3:
[0504] The server analyzes the received text data using a natural language processing system. Specifically, it performs data tokenization, part-of-speech tagging, and sentence structure analysis, and uses a generative AI model to understand the context of the information. The input data is text data stored on the server, and the output is the analyzed information.
[0505] Step 4:
[0506] The server classifies the text data into predetermined categories based on the analysis results. These categories might include, for example, "schedule" or "shopping." This organizes the information, making it easy to search later. The input data is the analyzed information, and the output is the classified information.
[0507] Step 5:
[0508] The server delivers the categorized information to an information display screen accessible to the user. The user can use the terminal interface to view this information by category and perform searches and filtering as needed. The input data is the categorized information, and the output is the displayed information.
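As a non-limiting sketch, the category view and the search-and-filter operations of Step 5 can be shown over in-memory records. `Record` and `search` are hypothetical names. Matching is a plain case-insensitive substring test rather than the full-text search that a production server might use.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class Record:
    timestamp: datetime
    category: str
    text: str


def search(
    records: Iterable[Record], category: Optional[str] = None, keyword: Optional[str] = None
) -> List[Record]:
    """Filter by category and/or keyword, newest first; None means 'no filter'."""
    needle = keyword.lower() if keyword is not None else None
    hits = [
        r for r in records
        if (category is None or r.category == category)
        and (needle is None or needle in r.text.lower())
    ]
    return sorted(hits, key=lambda r: r.timestamp, reverse=True)


history = [
    Record(datetime(2025, 1, 2, 18, 0), "Meals", "Suggest a recipe for dinner"),
    Record(datetime(2025, 1, 3, 12, 0), "Meals", "What should I eat for lunch?"),
    Record(datetime(2025, 1, 3, 9, 0), "Schedule", "Add a dentist appointment on Friday"),
]
for r in search(history, category="Meals"):
    print(r.timestamp.date(), r.text)
```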
[0509] Step 6:
[0510] The server detects time-sensitive tasks from text data and sets reminders. As the deadline approaches, it sends notifications to the user via their device. This allows users to manage important appointments without missing any. The input data is parsed task information, and the output is reminder notifications.
[0511] (Application Example 1)
[0512] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0513] Current home automation devices must be managed individually by the user, so users often forget tasks and find it difficult to manage multiple pieces of information efficiently. Furthermore, traditional reminder systems are limited to display and notification functions and lack the flexibility to adapt to individual household conditions and daily life situations.
[0514] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0515] In this invention, the server includes a device for collecting information exchanged between the user and an interactive artificial intelligence; a device for organizing the information into predefined classifications using natural language processing technology; a device for displaying the classified information on a display device so that the user can easily find past conversations; a device for analyzing information including deadlines, generating and sending notifications; and a device for supporting the user's lifestyle management in a home automation device. This enables task management and reminder notifications tailored to the user's lifestyle, thereby smoothly supporting the individual's life.
[0516] A "user" is an individual who provides information to an interactive artificial intelligence to manage and streamline their daily life.
[0517] "Conversational artificial intelligence" is a program that has the ability to receive information from users, understand it using natural language processing technology, and respond.
[0518] "Information" refers to all text and audio data exchanged between the user and the conversational artificial intelligence.
[0519] A "collection device" is a combination of hardware and software used to collect information arising from the interaction between a user and an interactive artificial intelligence.
[0520] "Natural language processing technology" is a technology that processes human language and converts its meaning into a form that computers can understand.
[0521] A "system for organizing information" is a device for classifying and managing collected information based on defined criteria.
[0522] A "display device" refers to a display or related equipment that allows users to visually confirm information and easily find their conversation history.
[0523] A "notification generation and transmission device" is a device that creates alerts to inform users of time-sensitive tasks or important information, and notifies them via display or sound.
[0524] A "home automation device" is a device that integrates and operates various home appliances and management systems based on the user's living situation.
[0525] One embodiment of this invention requires a terminal installed in the home. This terminal is equipped with a voice input function and collects information through interaction with the user. The information obtained is converted into text data using speech recognition technology. A Raspberry Pi can be used as the hardware for this purpose.
[0526] Text data acquired by the terminal is transmitted wirelessly to a server. This server incorporates natural language processing technology and uses Google Cloud's AI services to analyze and classify the information. The classified information is displayed on a home display device with a user interface. This interface is designed to allow users to easily search and refer to their conversation history. A web application using the Vue.js framework is applicable.
[0527] Furthermore, the server analyzes time-sensitive data, generates notifications, and sends them to the device. This allows home automation devices to provide users with timely reminders. For example, if a user instructs the device to "set a reminder to take my medicine at 3 p.m.," the device can send a notification at the specified time.
[0528] In addition, the use of a generative AI model to create prompts is given as an application example. An example prompt is, "We're running low on consumables, please add them to the shopping list." Such prompts allow user instructions to be reflected more intuitively and quickly.
[0529] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0530] Step 1:
[0531] The user gives instructions to the device using their voice.
[0532] An input system that receives the user's voice commands collects voice data. This voice data serves as the input and is converted into text data using speech recognition technology, which makes the information in the voice data available for processing.
[0533] Step 2:
[0534] The device converts the audio data into text format.
[0535] The device converts the collected audio data into text using a speech recognition library. This output text data is then ready to be sent to the server. For example, the instruction "Set a reminder to take medicine at 3 PM" would be output as text data at this stage.
[0536] Step 3:
[0537] The terminal sends text data to the server.
[0538] The information, converted to text format, is transmitted to the server via wireless communication. To ensure the reliability of the data transfer, the terminal uses an encryption protocol.
[0539] Step 4:
[0540] The server analyzes the text data it receives.
[0541] The server uses Google Cloud's natural language processing API to analyze and interpret the text data. Based on the analysis, it classifies the intent conveyed by the text and organizes it into specific categories. The output of this step is information that has been organized and prepared for use.
[0542] Step 5:
[0543] The server sends the classified data to the user interface.
[0544] The analyzed information is sent to the user interface of a web application using Vue.js, based on the classification results. There, it is output as a visual representation that allows the user to review and explore their past conversation history.
[0545] Step 6:
[0546] The server analyzes time-sensitive data and generates notifications.
[0547] If a user's instructions include a deadline, the server extracts that deadline information and generates a notification. For example, "Take medicine at 3pm" is recognized as a task with a deadline, and a reminder is generated.
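As a minimal sketch, the deadline in an instruction such as "Take medicine at 3pm" can be extracted with a regular expression for 12-hour clock times. The pattern, the function `extract_deadline`, and the rule that a time already past today refers to tomorrow are assumptions. In the disclosure, the natural language processing API performs this step.

```python
import re
from datetime import datetime, time, timedelta
from typing import Optional

# Matches "3pm", "3 PM", "3 p.m.", "9:30 am" (case-insensitive).
TIME_PATTERN = re.compile(r"\b(\d{1,2})(?::([0-5]\d))?\s*([ap])\.?\s*m\b\.?", re.IGNORECASE)


def extract_deadline(text: str, now: datetime) -> Optional[datetime]:
    """Return the next occurrence of the clock time mentioned in text, or None."""
    match = TIME_PATTERN.search(text)
    if match is None:
        return None
    hour = int(match.group(1)) % 12  # "12 am" -> 0, "12 pm" -> 12 after the shift below
    if match.group(3).lower() == "p":
        hour += 12
    minute = int(match.group(2) or 0)
    candidate = datetime.combine(now.date(), time(hour, minute))
    # Assumed rule: a time that has already passed today means tomorrow.
    return candidate if candidate > now else candidate + timedelta(days=1)


print(extract_deadline("Set a reminder to take medicine at 3 PM", datetime(2025, 1, 7, 9, 0)))
# 2025-01-07 15:00:00
```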
[0548] Step 7:
[0549] The device sends a notification to the user.
[0550] As the deadline approaches, the device notifies the user of the generated reminder by voice or on the display, thereby supporting the user's life management.
[0551] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0552] This invention provides an advanced system for managing user communication using conversational artificial intelligence, and includes functions for collecting, classifying, and displaying communication data, as well as recognizing user emotions. This system operates based on interactions between a server, a terminal, and the user.
[0553] Basic system configuration
[0554] First, the user converses with the conversational artificial intelligence via their device. The communication data generated during this conversation is recorded by the device as text or audio data. The recorded data is sent to a server at regular intervals and stored in a database. The server analyzes this data using natural language processing technology and classifies the conversation content into appropriate categories. This classification information is displayed on the user interface, allowing the user to easily find past conversations.
[0555] Emotional Engine Processing
[0556] A key feature of this invention is that the server is equipped with an emotion engine that can recognize the user's emotions from communication data. The emotion engine goes beyond simple language analysis, extracting emotions from voice tone and text context to evaluate the user's current and past emotional state. This information is stored on the server and used for tracking emotional fluctuations.
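As a non-limiting sketch, emotional fluctuations can be tracked by smoothing per-utterance scores into a running baseline and flagging readings that fall well below it. The valence scale of -1 to 1, the smoothing factor, the margin, and the class `EmotionTracker` are hypothetical. The disclosure does not say how emotional states are quantified.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple


@dataclass
class EmotionTracker:
    alpha: float = 0.3   # weight of the newest reading in the moving average (assumed)
    margin: float = 0.5  # how far below baseline counts as a notable drop (assumed)
    baseline: float = 0.0
    history: List[Tuple[datetime, float]] = field(default_factory=list)

    def update(self, when: datetime, valence: float) -> bool:
        """Store a valence reading in [-1, 1]; return True if it is a notable drop."""
        dropped = bool(self.history) and valence < self.baseline - self.margin
        self.history.append((when, valence))
        # Exponential moving average keeps a slowly changing emotional baseline.
        self.baseline = self.alpha * valence + (1 - self.alpha) * self.baseline
        return dropped


tracker = EmotionTracker()
for day, valence in [(1, 0.6), (2, 0.6), (3, -0.8)]:
    print(day, tracker.update(datetime(2025, 1, day), valence))
```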
[0557] Response adjustment and feedback
[0558] Recognized emotions are reflected in the conversational artificial intelligence's responses and suggestions. For example, if a user expresses dissatisfaction, the system is programmed to provide a more considerate response. Furthermore, users can provide feedback through the interface, and this data is collected on a server and used to improve the entire system.
[0559] Explanation of specific examples
[0560] If a user is feeling stressed, the system uses an emotion engine to detect this state. Based on the emotional analysis, the AI offers suggestions on how to relax. In this way, the system goes beyond simply providing information and can respond flexibly to the user's emotions.
[0561] This system is a step up from conventional conversational artificial intelligence, aiming for a deeper mutual understanding with the user. By taking user emotions into account during interaction, it can provide a personalized experience for each individual user.
[0562] The following describes the processing flow.
[0563] Step 1:
[0564] The user speaks to the conversational artificial intelligence through the device. The user's statements are recorded in real time by the device as audio or text data. This data is temporarily stored on the device.
[0565] Step 2:
[0566] The device sends collected communication data to the server at regular intervals. Through this transmission process, the server receives the latest user data.
[0567] Step 3:
[0568] The server analyzes the received data using a natural language processing engine. This analysis extracts the context and keywords of the conversation, and the data is classified into existing categories. For example, it might be classified into categories such as "shopping" or "task management."
[0569] Step 4:
[0570] The server then uses an emotion engine to analyze the user's emotions from the communication data. The user's emotional state is evaluated based on context, word choice, tone of voice, and other factors.
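The word-choice part of Step 4 can be illustrated with a very small emotion lexicon. This is a deliberately simplified, hypothetical stand-in for the emotion engine. The lexicon, the labels, and `estimate_emotion` are assumptions, and the sketch ignores voice tone entirely.

```python
import re
from collections import Counter

# Hypothetical miniature lexicon mapping words to emotion labels.
EMOTION_LEXICON = {
    "stressed": "stress", "busy": "stress", "overwhelmed": "stress",
    "tired": "fatigue", "exhausted": "fatigue",
    "happy": "joy", "great": "joy",
    "sad": "sadness", "lonely": "sadness",
    "annoyed": "dissatisfaction", "useless": "dissatisfaction",
}


def estimate_emotion(text: str, default: str = "neutral") -> str:
    """Return the emotion label with the most lexicon hits; ties go to the first seen."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)
    return counts.most_common(1)[0][0] if counts else default


print(estimate_emotion("I'm very busy and stressed right now."))  # stress
```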
[0571] Step 5:
[0572] The server adjusts the conversational artificial intelligence's responses based on the analyzed emotional data. For example, if the user is feeling stressed, the AI will adjust its responses to offer suggestions to help them relax.
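As a minimal sketch, Step 5 can be modeled as selecting an opening line by emotion label and placing it before the factual answer. The emotion labels ("stress", "dissatisfaction", "sadness"), the template wording, and `adjust_response` are invented for illustration. A deployed system would more likely condition a language model on the estimated emotion.

```python
# Hypothetical emotion-specific openers; unknown emotions leave the answer unchanged.
RESPONSE_OPENERS = {
    "stress": "That sounds like a lot to handle. Would a few relaxation ideas help?",
    "dissatisfaction": "Sorry that was not what you needed. Let me try another way.",
    "sadness": "I'm sorry you're feeling down.",
}


def adjust_response(base_answer: str, emotion: str) -> str:
    """Prefix the answer with an opener chosen by the estimated emotion."""
    opener = RESPONSE_OPENERS.get(emotion)
    return f"{opener} {base_answer}" if opener else base_answer


print(adjust_response("Your next meeting is at 2 PM.", "stress"))
```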
[0573] Step 6:
[0574] The categorized conversation history and sentiment data are displayed in the user interface, namely the dashboard. Users can browse the dashboard and examine history based on specific categories or sentiment states.
[0575] Step 7:
[0576] The server collects user feedback and uses it to improve sentiment analysis and natural language processing technologies. This feedback cycle allows the system to evolve and provide more appropriate interactions for users.
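One way feedback could refine classification is to learn extra keywords from the categories that users correct. In this hypothetical sketch, `FeedbackLearner`, the stop-word list, and the two-vote threshold are assumptions. The disclosure does not describe the update rule.

```python
import re
from collections import Counter, defaultdict
from typing import DefaultDict, Set

STOP_WORDS = {"a", "an", "the", "my", "i", "is", "to", "and", "of", "it"}


class FeedbackLearner:
    """Learns candidate category keywords from user corrections."""

    def __init__(self, min_votes: int = 2):
        self.min_votes = min_votes  # corrections needed before a word is trusted
        self._votes: DefaultDict[str, Counter] = defaultdict(Counter)

    def record_correction(self, utterance: str, correct_category: str) -> None:
        """Count each distinct non-stop word once per corrected utterance."""
        words = set(re.findall(r"[a-z']+", utterance.lower())) - STOP_WORDS
        self._votes[correct_category].update(words)

    def learned_keywords(self, category: str) -> Set[str]:
        counts = self._votes.get(category, Counter())
        return {word for word, n in counts.items() if n >= self.min_votes}


learner = FeedbackLearner()
learner.record_correction("My knee hurts", "Health")
learner.record_correction("My knee is swollen", "Health")
print(learner.learned_keywords("Health"))  # {'knee'}
```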
[0577] (Example 2)
[0578] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0579] In modern society, the amount of information data generated through interaction between users and computer-based interactive algorithms is increasing, and this data must be managed efficiently and effectively. There is also demand for more natural and personalized interactions through responses that take user emotions into account. Conventional technologies have struggled to adjust responses using emotional information, to categorize information effectively, and to configure notifications. The present invention aims to solve these problems.
[0580] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0581] In this invention, the server includes means for collecting information data exchanged between the user and a computer-based interactive algorithm, means for classifying the information data into predefined categories using language analysis technology, and means for determining emotions using an analysis device and adjusting the response based on this determination. This enables effective management of information data and flexible responses based on user emotions.
[0582] A "user" is the entity that utilizes the interactive algorithm and is the provider of information data.
[0583] A "computer" refers to any device that performs information processing, and is a device within a system that is responsible for collecting, analyzing, and displaying data.
[0584] "Information data" refers to all communication content exchanged between a user and a computer, including voice and text data.
[0585] An "interactive algorithm" refers to a series of computational processes for generating responses through interaction with the user, and it incorporates learning capabilities and sentiment analysis technology.
[0586] "Language analysis technology" refers to techniques for understanding and categorizing information data through natural language processing, with the aim of analyzing the structure and understanding the meaning of text.
[0587] An "analysis device" refers to a device or software that extracts specific information, such as emotions, from information data and provides analysis results.
[0588] "Means for determining and adjusting emotions" refers to technologies or processes for evaluating emotions contained within information data and changing responses based on the results.
[0589] This invention is designed to optimize the interaction between the user and the computer by utilizing information technology.
[0590] System Configuration
[0591] The user first initiates a conversation with an interactive algorithm using a terminal. The terminal is equipped with a microphone and keyboard, and records information data in voice or text format. The information data recorded on the terminal is then transmitted to a server via the internet.
[0592] Data analysis and classification
[0593] The server processes the information data in programming language environments such as Python and Java. It uses natural language processing libraries (e.g., NLTK and spaCy) to analyze the syntax and semantics of the data. Based on these analysis results, the data is classified into predefined categories.
[0594] Judging and adjusting responses to emotions
[0595] The server also includes modules for analyzing emotions. For example, computational algorithms (such as machine learning techniques) are implemented to extract the user's emotions from the tone of voice data and the context of text. This makes it possible to generate responses that correspond to the user's current emotional state.
[0596] Examples of specific cases and prompt statements
[0597] When a user says to the device, "I haven't been sleeping well lately," the system classifies this information data into the "health" category and performs emotion analysis on the server. If the system determines that the user is experiencing anxiety, the AI suggests, "Shall we find some ways to relax?" Another possible prompt is, "I'm very busy and stressed right now."
[0598] This system aims to provide personalized responses to users through advanced data analysis and emotional response.
[0599] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0600] Step 1:
[0601] The user speaks to an interactive algorithm via the terminal. The terminal recognizes and records this input as voice or text data. Specifically, the terminal does this using a microphone or text input device and temporarily stores the data. The recorded voice or text information is generated as output data.
[0602] Step 2:
[0603] The terminal transmits the collected information data to the server at regular intervals. The terminal uses a network protocol to transfer the data. The input data is communication information recorded by the terminal, while the output data is the raw information data received by the server.
[0604] Step 3:
[0605] The server stores the received information data in a database. This is done using a data management system (e.g., an SQL database). The input is raw information data transmitted over the network, and the output is data that has been organized and stored in the database. Following storage, the server prepares to perform structural analysis of the data.
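Step 3 names an SQL database as one example of the data management system. A minimal sketch using Python's built-in `sqlite3` module follows. The `messages` table, its columns, and the helper functions are hypothetical. The `category` column stays NULL until the classification step fills it in.

```python
import sqlite3
from typing import List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the message store; ':memory:' keeps it in RAM for testing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS messages (
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id     TEXT NOT NULL,
            received_at TEXT NOT NULL,          -- ISO 8601 timestamp
            modality    TEXT NOT NULL CHECK (modality IN ('voice', 'text')),
            content     TEXT NOT NULL,
            category    TEXT                    -- filled in by the classification step
        )
        """
    )
    return conn


def store_message(conn, user_id: str, received_at: str, modality: str, content: str) -> int:
    cur = conn.execute(
        "INSERT INTO messages (user_id, received_at, modality, content) VALUES (?, ?, ?, ?)",
        (user_id, received_at, modality, content),
    )
    conn.commit()
    return cur.lastrowid


def set_category(conn, message_id: int, category: str) -> None:
    conn.execute("UPDATE messages SET category = ? WHERE id = ?", (category, message_id))
    conn.commit()


def unclassified(conn) -> List[Tuple[int, str]]:
    """Rows still waiting for the analysis step, oldest first."""
    return conn.execute(
        "SELECT id, content FROM messages WHERE category IS NULL ORDER BY id"
    ).fetchall()


conn = open_store()
msg_id = store_message(conn, "user-1", "2025-01-07T09:00:00", "voice", "I haven't been sleeping well lately")
print(unclassified(conn))  # [(1, "I haven't been sleeping well lately")]
```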
[0606] Step 4:
[0607] The server analyzes the received data using a natural language processing library. It applies syntactic and semantic analysis to the data and classifies it into predefined categories. The input is stored data in a database, and the output is the classification results for each category. Specifically, it uses language analysis techniques to extract topics from the text.
[0608] Step 5:
[0609] The server uses an emotion analysis module to extract the user's emotions from the data. Voice tone and word choice are analyzed to identify the emotional state. The input is analyzed text data, and emotional information is generated as output. The analysis device uses machine learning algorithms to evaluate the emotions.
[0610] Step 6:
[0611] The server generates and adjusts the response of the interactive algorithm based on the acquired sentiment information. Specifically, it uses a response generation algorithm to create an appropriate response that aligns with the informational data. The input is sentiment information and the user's past history data, and the output is a personalized response to the user.
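If the response generation algorithm is a generative language model, Step 6 can be sketched as assembling a prompt from the sentiment information and the recent history. The template wording, the three-turn history window, and `build_response_prompt` are hypothetical. The disclosure does not specify the response generation algorithm.

```python
from typing import List


def build_response_prompt(user_text: str, emotion: str, history: List[str], max_turns: int = 3) -> str:
    """Compose a prompt that conditions the reply on emotion and recent turns."""
    lines = [
        "You are a helpful assistant.",
        f"The user's current emotional state appears to be: {emotion}.",
        "Match your tone to that state while answering accurately.",
        "Recent conversation:",
    ]
    # Guard max_turns <= 0, since history[-0:] would return the whole list.
    recent = history[-max_turns:] if max_turns > 0 else []
    if recent:
        lines.extend(f"- {turn}" for turn in recent)
    else:
        lines.append("- (no earlier turns)")
    lines.append(f"User: {user_text}")
    return "\n".join(lines)


print(build_response_prompt("I haven't been sleeping well lately", "anxiety",
                            ["User: Remind me about the meeting", "AI: Reminder set for 10 AM"]))
```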
[0612] Step 7:
[0613] The server sends the response back to the terminal. The terminal presents this response to the user via screen or audio. The input is the response data sent from the server, and the output is the visual or audio response provided to the user. This allows the user to initiate the next interaction based on the presented information.
[0614] (Application Example 2)
[0615] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0616] In modern living environments, there is a growing demand for personalized experiences that respond to users' emotions. Especially for robots and conversational artificial intelligence used in the home, it is desirable not only to provide information but also to understand the user's emotions and use that understanding to control appliances and adjust environmental settings. However, conventional systems have struggled to accurately recognize user emotions and automatically provide appropriate responses. Solving these challenges is therefore crucial.
[0617] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0618] In this invention, the server includes means for collecting information exchanged between the user and conversational artificial intelligence, means for classifying the information into predefined categories using natural language processing technology, and means for recognizing the user's emotions from voice data and text data and controlling home appliances based on those emotions. This makes it possible to provide a personalized experience that corresponds to the user's emotional state.
[0619] A "user" is an individual who communicates with conversational artificial intelligence or an entity that utilizes it.
[0620] "Conversational artificial intelligence" is an artificial system that has the ability to provide information and carry out instructions through dialogue with the user.
[0621] "Information" refers to the collective text or audio data exchanged between the user and the conversational artificial intelligence.
[0622] "Natural language processing technology" is a technology that enables computers to understand and analyze natural human language.
[0623] A "category" refers to a predefined group or type used to classify information.
[0624] "Voice data" refers to digital data representing sound information generated by the user's voice.
[0625] "Text data" refers to digitized textual information obtained through user input or speech recognition.
[0626] "Emotion recognition" is a technology that analyzes and identifies emotions from a user's voice or text.
[0627] "Home appliance control" refers to a function that operates or sets electrical appliances according to user instructions or status.
[0628] A "personalized experience" refers to the provision of services that are customized according to the individual user's circumstances and preferences.
[0629] In the system implementing this invention, a server, a user terminal, and multiple devices within the home work together. The user first initiates a conversation with an interactive artificial intelligence via the terminal. The information from this conversation is recorded as voice or text data and sent to the server at regular intervals. The server converts the voice information into text using a TensorFlow-based speech recognition module and then uses spaCy to perform natural language processing on the text data. Furthermore, OpenAI's emotion analysis API is used for emotion recognition to analyze the user's emotions accurately from the voice and text data.
[0630] Based on the analysis results, the user's emotional state is determined. For example, if the user expresses fatigue, the server controls lighting and music devices via the smart home platform, providing settings that promote relaxation. This system uses the emotional data stored on the server to generate personalized feedback and displays it in the user interface.
[0631] For example, if a user says to the conversational AI, "I'm very busy and tired today," the server will immediately adjust the lighting to a warmer color and play calming music. This process allows the user to enjoy a more personalized and comfortable experience.
[0632] Example prompt: "When the user wants to relax, have the robot suggest the optimal lighting and music combination."
[0633] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0634] Step 1:
[0635] The user speaks to the conversational artificial intelligence using a device. The input is voice data, which is recorded on the user's device. The device converts the voice signal into a digital format and sends the information to the server.
[0636] Step 2:
[0637] The server uses TensorFlow to perform speech recognition on the received audio data and outputs the result as text data. This text data forms the basis for natural language processing. The data processing in this step is the conversion of the audio waveform into a character string.
[0638] Step 3:
[0639] The server processes the text data with spaCy, a natural language processing tool, to analyze its grammar and structure. Based on this analysis, categories related to the context and subject are determined. This analysis process involves understanding the structure of the input text and assigning categories.
[0640] Step 4:
[0641] The server uses OpenAI's sentiment analysis API to extract the user's emotions from text data. The input is the text data obtained in step 2, and emotion information based on it is output. Different response plans are selected based on the type of emotion (e.g., joy, sadness, stress).
[0642] Step 5:
[0643] Based on the analysis of emotions, the server generates commands to control home appliances via the smart home platform. For example, if relaxation is deemed necessary, it will adjust the lighting and instruct music playback. The input is emotion data, and the output is device control commands.
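Step 5 turns emotion data into device control commands. The following is a minimal sketch of that mapping, assuming a simple lookup table; the `DeviceCommand` structure, the device and action names, and the particular settings per emotion are all hypothetical, since the source does not specify the smart home platform or its command format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DeviceCommand:
    device: str   # hypothetical device identifier, e.g. "lighting" or "music"
    action: str   # hypothetical action name
    value: str    # hypothetical setting


# Assumed response plans keyed by emotion label. The source names joy, sadness,
# stress and fatigue, but does not define the resulting device settings.
_RELAX = [
    DeviceCommand("lighting", "set_color_temperature", "warm"),
    DeviceCommand("music", "play", "calm_playlist"),
]
RESPONSE_PLANS: Dict[str, List[DeviceCommand]] = {
    "stress": _RELAX,
    "fatigue": _RELAX,
    "sadness": [DeviceCommand("lighting", "set_brightness", "soft")],
    "joy": [],  # no intervention needed
}


def plan_device_commands(emotion: str) -> List[DeviceCommand]:
    """Map a recognized emotion to device control commands (empty list if none apply)."""
    return list(RESPONSE_PLANS.get(emotion, []))


# "I'm very busy and tired today" -> fatigue -> warm lighting and calm music
for cmd in plan_device_commands("fatigue"):
    print(cmd.device, cmd.action, cmd.value)
```

A table-driven plan keeps the emotion-to-action policy in data, so response plans can be revised without touching the dispatch logic.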
[0644] Step 6:
[0645] The server generates feedback information for the user based on all the above processing and displays it in the user interface. This feedback reports to the user how the system responded and stores the information for future interactions. The input is the result of past processing, and the output is the feedback message.
[0646] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0647] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions in the prompts and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
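The input to the data generation model 58 pairs an instruction prompt with one or more pieces of inference data. The sketch below models only that input structure; the class and field names are hypothetical, and no real ChatGPT or Gemini API is called.

```python
from dataclasses import dataclass, field
from typing import List, Union

# The three modalities named in the description; any other value is rejected.
ALLOWED_MODALITIES = {"audio", "text", "image"}


@dataclass
class InferenceData:
    modality: str               # "audio", "text", or "image"
    payload: Union[bytes, str]  # raw audio/image bytes or a text string


@dataclass
class ModelInput:
    prompt: str                                              # instruction for the model
    data: List[InferenceData] = field(default_factory=list)

    def add(self, modality: str, payload: Union[bytes, str]) -> "ModelInput":
        """Attach one piece of inference data; returns self so calls can be chained."""
        if modality not in ALLOWED_MODALITIES:
            raise ValueError(f"unsupported modality: {modality}")
        self.data.append(InferenceData(modality, payload))
        return self


request = (
    ModelInput(prompt="Classify this utterance into one category and summarize it.")
    .add("text", "Set a reminder to take medicine at 3 PM")
    .add("audio", b"\x00\x01")  # placeholder bytes, not real audio
)
```

Validating the modality up front keeps malformed inputs from reaching the model, whose actual request format would depend on the chosen service.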
[0648] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0649] [Fourth Embodiment]
[0650] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0651] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0652] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0653] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0654] The microphone 238 receives voice input from the user 20, including instructions from the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0655] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0656] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0657] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0658] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0659] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0660] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0661] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0662] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0663] This invention is a system for efficiently managing data from conversations between a user and an interactive artificial intelligence. This system mainly consists of a server, a terminal, and the user.
[0664] Starting a conversation
[0665] First, the user initiates communication with the conversational artificial intelligence via their device. The conversation is generated as text or audio and recorded in real time by the device. This recorded data is sent to a server at regular intervals. This aggregates all the conversation data on the server, enabling processing in the next step.
[0666] Data analysis and classification
[0667] The server analyzes the received conversation data using natural language processing techniques. Specifically, it understands the context of the conversation and classifies it into predefined categories. Through this classification process, for example, if a user is asking a question about shopping, the conversation will be organized into the "Shopping" category.
[0668] User interface management
[0669] The analyzed and categorized data becomes accessible through the user interface. Users can view conversation history for each category, making it easy to find and reuse past communications. The dashboard is designed so that users who want to view details for a specific category can easily find the corresponding history.
[0670] Setting reminders and notifications
[0671] Furthermore, this system automatically recognizes conversations that include time-sensitive tasks, and the server generates reminders for the user. Based on this, as the deadline approaches, a notification is sent to the user via their device, preventing tasks from being forgotten. This reminder function improves user productivity and helps them manage important appointments in a timely manner.
[0672] System improvements
[0673] User feedback is collected on the server and used to improve the natural language processing technology and the user interface. This feedback allows the system to be continuously optimized to meet user needs.
[0674] These elements are designed to allow users to effectively utilize the system and efficiently manage their daily tasks and search their communication history.
[0675] The following describes the processing flow.
[0676] Step 1:
[0677] The user initiates a conversation with the conversational artificial intelligence using their device. The device records and collects, in real time, the text or audio data generated during the conversation.
[0678] Step 2:
[0679] The terminal sends the collected communication data to the server at regular intervals. Although this transmission is not strictly real-time, it keeps the data synchronized with low latency. The transmitted data is aggregated on the server and prepared for the next processing step.
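The terminal-side batching in Step 2 can be sketched as a buffer that holds utterances locally and releases them once a fixed interval has elapsed. This is a minimal sketch under assumptions: the 30-second interval, the record fields, and the `ConversationBuffer` name are hypothetical, and the current time is passed in explicitly so the logic is easy to check (a real terminal could supply `time.monotonic()`).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConversationBuffer:
    """Terminal-side buffer that batches utterances and releases them at a fixed interval."""

    interval_s: float = 30.0  # assumed interval; the source only says "regular intervals"
    _pending: List[Dict] = field(default_factory=list, init=False)
    _last_sent: float = field(default=0.0, init=False)

    def record(self, timestamp: float, speaker: str, text: str) -> None:
        """Store one utterance locally until the next transmission."""
        self._pending.append({"t": timestamp, "speaker": speaker, "text": text})

    def flush_if_due(self, now: float) -> List[Dict]:
        """Return the pending batch if the interval has elapsed and data exists, else []."""
        if now - self._last_sent < self.interval_s or not self._pending:
            return []
        batch, self._pending = self._pending, []
        self._last_sent = now
        return batch


buf = ConversationBuffer(interval_s=30.0)
buf.record(1.0, "user", "Add milk to the shopping list.")
buf.record(2.0, "ai", "Milk has been added.")
batch = buf.flush_if_due(31.0)  # 31 s since the last send, so both utterances are released
```

Batching trades a bounded delay (at most one interval) for fewer network round trips, which matches the "low latency, not real-time" behavior described in Step 2.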
[0680] Step 3:
[0681] The server inputs the received data into a natural language processing engine to analyze the content of the conversation. This analysis extracts context and keywords, and classifies the data into appropriate categories. For example, a conversation about shopping would be classified into the "shopping" category.
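As a rough illustration of Step 3, the classifier below assigns a category by counting keyword matches. This is deliberately a simple stand-in, not the natural language processing engine or generative AI model the description refers to; the category names follow the examples in the text, while the keyword lists are assumptions.

```python
import re
from typing import Dict, Set

# Assumed keyword lists; a production NLP engine would learn or curate these.
CATEGORY_KEYWORDS: Dict[str, Set[str]] = {
    "shopping": {"buy", "shopping", "order", "price", "store"},
    "schedule": {"tomorrow", "tuesday", "meeting", "appointment", "trip"},
    "meals": {"dinner", "lunch", "breakfast", "recipe", "meal"},
}


def classify(text: str, default: str = "other") -> str:
    """Return the category whose keywords occur most often in `text`, or `default`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = {cat: sum(tok in kws for tok in tokens) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)  # ties go to the first category listed
    return best if scores[best] > 0 else default


print(classify("Where can I buy running shoes at a good price?"))
```

Keyword scoring ignores context entirely, which is exactly the weakness the description addresses by adding context analysis with a generative AI model.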
[0682] Step 4:
[0683] The analyzed data is stored in a database on the server and converted into a format accessible to the user, so that it can be viewed easily on the dashboard.
[0684] Step 5:
[0685] Users can access the dashboard through their device and view conversation history organized by category. The dashboard offers an intuitive interface that allows users to select categories of interest and view details.
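Steps 4 and 5 amount to storing classified conversations and querying them per category. The sketch below uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the server database; the table layout and function names are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory conversation table (a stand-in for the server database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE conversations ("
        " id INTEGER PRIMARY KEY,"
        " ts TEXT NOT NULL,"        # ISO 8601 strings in one fixed format sort chronologically
        " category TEXT NOT NULL,"
        " text TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_category_ts ON conversations (category, ts)")
    return conn


def save(conn: sqlite3.Connection, ts: str, category: str, text: str) -> None:
    conn.execute(
        "INSERT INTO conversations (ts, category, text) VALUES (?, ?, ?)",
        (ts, category, text),
    )


def history_by_category(conn: sqlite3.Connection, category: str) -> List[Tuple[str, str]]:
    """Newest-first history for one category, as a dashboard view might request it."""
    return conn.execute(
        "SELECT ts, text FROM conversations WHERE category = ? ORDER BY ts DESC",
        (category,),
    ).fetchall()


def category_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Number of stored conversations per category, e.g. for a dashboard summary."""
    return dict(conn.execute("SELECT category, COUNT(*) FROM conversations GROUP BY category"))


conn = open_store()
save(conn, "2025-01-06T10:00", "shopping", "Buy milk")
save(conn, "2025-01-06T18:30", "schedule", "Library trip next Tuesday")
save(conn, "2025-01-07T09:15", "shopping", "Order printer paper")
```

The composite index on `(category, ts)` lets the per-category, time-ordered query used by the dashboard avoid a full table scan as the history grows.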
[0686] Step 6:
[0687] The server detects time-sensitive tasks and events from the communication data. Based on this information, it generates reminders on the user interface and sends notifications to the user's terminal at the specified time.
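Once a deadline has been extracted, the server must notify the user at the right time. A minimal sketch, assuming a fixed 15-minute lead time and a priority queue ordered by notification time (`heapq` from the standard library); the `ReminderQueue` class is hypothetical.

```python
import heapq
from datetime import datetime, timedelta
from typing import List, Tuple


class ReminderQueue:
    """Hypothetical server-side queue that releases reminders shortly before their deadlines."""

    def __init__(self, lead_time: timedelta = timedelta(minutes=15)) -> None:
        self.lead_time = lead_time  # assumed lead time; not specified in the source
        self._heap: List[Tuple[datetime, int, str]] = []
        self._counter = 0  # tie-breaker so equal times never fall back to comparing task strings

    def add(self, deadline: datetime, task: str) -> None:
        notify_at = deadline - self.lead_time
        heapq.heappush(self._heap, (notify_at, self._counter, task))
        self._counter += 1

    def due(self, now: datetime) -> List[str]:
        """Pop and return every reminder whose notification time has been reached."""
        out = []
        while self._heap and self._heap[0][0] <= now:
            out.append(heapq.heappop(self._heap)[2])
        return out


q = ReminderQueue()
q.add(datetime(2025, 1, 7, 15, 0), "Take medicine")
q.add(datetime(2025, 1, 7, 14, 0), "Call the library")
```

The heap keeps the earliest reminder at the front, so each periodic check only inspects reminders that are actually due.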
[0688] Step 7:
[0689] Users can provide feedback to the server regarding the system's usability and the effectiveness of reminders. The server analyzes this feedback and uses it to improve natural language processing techniques and the interface. Through this cycle, the system continues to evolve to meet user needs.
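One simple way to turn the feedback in Step 7 into improvement targets is to average ratings per feature and flag the weakest ones. This sketch assumes feedback arrives as (feature, rating) pairs on a 1 to 5 scale with a threshold of 3.0; none of these details come from the source.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def areas_to_improve(feedback: Iterable[Tuple[str, int]], threshold: float = 3.0) -> List[str]:
    """Return features whose mean rating is below `threshold`, worst first."""
    by_feature: Dict[str, List[int]] = defaultdict(list)
    for feature, rating in feedback:
        by_feature[feature].append(rating)
    averages = {f: mean(r) for f, r in by_feature.items()}
    return sorted((f for f, avg in averages.items() if avg < threshold), key=averages.get)


feedback = [
    ("reminders", 2), ("reminders", 3),
    ("search", 5), ("search", 4),
    ("classification", 1),
]
print(areas_to_improve(feedback))
```

Ranking by average rating gives a first-pass priority list; free-text feedback would still need the natural language processing described above.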
[0690] (Example 1)
[0691] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0692] In modern society, the amount of information users exchange with conversational artificial intelligence is increasing, making it difficult to efficiently manage this information and obtain necessary information in a timely manner. Furthermore, there is a need for appropriate reminder functions to prevent users from forgetting tasks, including those with time constraints, but existing systems lack sufficient accuracy. In addition, there is a lack of automated methods to effectively utilize user feedback and continuously improve the system.
[0693] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0694] In this invention, the server includes means for recording information exchanged between the user and the conversational artificial intelligence, means for classifying the information into pre-set categories using a natural language processing system, and means for understanding the context of the information with high accuracy and improving classification accuracy using a generative AI model. As a result, the user can efficiently manage conversational data and easily obtain the necessary information. Furthermore, for tasks that include time specifications, reminders are set and notifications are sent in a timely manner, improving the accuracy and convenience of information management.
[0695] "User" refers to the entity that operates this system and shares information with the interactive artificial intelligence.
[0696] "Conversational artificial intelligence" refers to an intelligent system that exchanges information through communication with users and uses natural language processing technology to understand and respond to that information.
[0697] "Information" refers to language-based data exchanged between a user and an interactive artificial intelligence, or records generated from that data.
[0698] "Means of recording" refers to a function or process for saving information exchanged between conversational artificial intelligence and users as digital data.
[0699] A "natural language processing system" refers to the technologies and algorithms used to analyze input text or speech and understand its grammar, context, and meaning.
[0700] A "generative AI model" refers to an artificial intelligence model built on deep learning and machine learning techniques to analyze and generate information.
[0701] An "information display screen" refers to a display or interface used to visually present analyzed and classified information to the user.
[0702] A "reminder" refers to a mechanism or function that notifies users of the due date of a set task or event.
[0703] "Means of notification" refers to methods, protocols, or devices for informing users of information, reminders, or other important messages.
[0704] To implement this invention, a terminal is used for the user to exchange information with an interactive artificial intelligence. The terminal functions as an input device for recording the content of the conversation in text or voice format, and speech recognition software is used for converting voice data to text. Furthermore, it is recommended to use a secure communication protocol such as HTTPS when transmitting data acquired in real time to a server.
[0705] The server implements a natural language processing system to process received information, utilizing generative AI models to analyze the context of the information with high accuracy. This process involves tokenization, part-of-speech tagging, and sentence structure analysis to classify the information. The classified data is then delivered by the server to an information display screen accessible to the user. This allows the user to easily refer to and utilize their past conversation history.
[0706] Furthermore, the server automatically detects time-sensitive tasks based on user interactions, sets reminders, and sends notifications to the user via their device as the specified date and time approach. This notification function utilizes the device's built-in notification system. By leveraging this feature, users can manage important appointments without missing any.
[0707] For example, when a user says, "Please add a trip to the library next Tuesday," the system analyzes this statement and classifies it under "Schedule." As the date approaches, a reminder notification is sent to the user. Similarly, when the system receives a prompt such as "Show all of the user's past conversations related to 'meals'," it can display the conversation data belonging to the specified category on the information display screen. This invention allows users to manage their conversations efficiently and use them flexibly according to their needs.
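To set a reminder for "next Tuesday," the system first has to resolve the relative weekday to a calendar date. A minimal sketch, assuming "next Tuesday" means the first Tuesday strictly after today (English usage is ambiguous here, and some speakers mean the Tuesday of the following week); the function name is hypothetical.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def resolve_next_weekday(name: str, today: date) -> date:
    """Return the first date strictly after `today` that falls on the named weekday."""
    target = WEEKDAYS.index(name.lower())  # raises ValueError for unknown names
    days_ahead = (target - today.weekday() - 1) % 7 + 1  # always in 1..7
    return today + timedelta(days=days_ahead)


# 2025-01-01 is a Wednesday, so the next Tuesday is 2025-01-07.
print(resolve_next_weekday("Tuesday", date(2025, 1, 1)))
```

The `% 7 + 1` form guarantees a result one to seven days ahead, so asking on a Tuesday yields the Tuesday of the following week rather than today.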
[0708] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0709] Step 1:
[0710] The user initiates a conversation with the conversational artificial intelligence via the terminal. The user's statements are input to the terminal as either text or voice. The terminal's voice recognition software converts the voice to text and temporarily stores the acquired data in digital format. The input data is either text or voice, and the output is text data.
[0711] Step 2:
[0712] The device sends collected text data to the server at regular intervals. Secure protocols such as HTTPS are used to protect the data during transmission. The transmitted input data is stored on the server. The output is the text data stored on the server.
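The HTTPS transfer in Step 2 can be sketched with the standard library's `urllib.request`. The endpoint URL, the JSON payload layout and the `terminal_id` field are hypothetical; the source specifies only that a secure protocol such as HTTPS is used. The request is built but not sent here.

```python
import json
import urllib.request
from typing import Dict, List

# Hypothetical endpoint; the source does not give a URL or payload schema.
SERVER_URL = "https://server.example/api/conversations"


def build_upload_request(terminal_id: str, batch: List[Dict]) -> urllib.request.Request:
    """Package a batch of text data as a JSON POST request to the server."""
    if not SERVER_URL.startswith("https://"):
        raise ValueError("refusing to send conversation data over a non-HTTPS URL")
    body = json.dumps({"terminal_id": terminal_id, "messages": batch}).encode("utf-8")
    return urllib.request.Request(
        SERVER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_upload_request("robot-414", [{"speaker": "user", "text": "Hello"}])
# Sending would be: urllib.request.urlopen(req, timeout=10)
# (certificate verification is enabled by default for HTTPS in current Python versions).
```

Rejecting non-HTTPS URLs in code makes the "secure protocol" requirement explicit rather than relying on configuration alone.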
[0713] Step 3:
[0714] The server analyzes the received text data using a natural language processing system. Specifically, it performs data tokenization, part-of-speech tagging, and sentence structure analysis, and uses a generative AI model to understand the context of the information. The input data is text data stored on the server, and the output is the analyzed information.
[0715] Step 4:
[0716] The server classifies the text data into predetermined categories based on the analysis results. These categories might include, for example, "schedule" or "shopping." This organizes the information, making it easy to search later. The input data is the analyzed information, and the output is the classified information.
[0717] Step 5:
[0718] The server delivers the categorized information to an information display screen accessible to the user. The user can use the terminal interface to view this information by category and perform searches and filtering as needed. The input data is the categorized information, and the output is the displayed information.
[0719] Step 6:
[0720] The server detects time-sensitive tasks from text data and sets reminders. As the deadline approaches, it sends notifications to the user via their device. This allows users to manage important appointments without missing any. The input data is parsed task information, and the output is reminder notifications.
[0721] (Application Example 1)
[0722] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0723] Current home automation devices must be managed individually by the user, so users frequently forget tasks and find it difficult to manage multiple pieces of information efficiently. Furthermore, traditional reminder systems are limited to display and notification functions and lack the flexibility to adapt to individual household conditions and daily life situations.
[0724] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0725] In this invention, the server includes a device for collecting information exchanged between the user and an interactive artificial intelligence; a device for organizing the information into predefined classifications using natural language processing technology; a device for displaying the classified information on a display device so that the user can easily find past conversations; a device for analyzing information including deadlines, generating and sending notifications; and a device for supporting the user's lifestyle management in a home automation device. This enables task management and reminder notifications tailored to the user's lifestyle, thereby smoothly supporting the individual's life.
[0726] A "user" is an individual who provides information to an interactive artificial intelligence to manage and streamline their daily life.
[0727] "Conversational artificial intelligence" is a program that has the ability to receive information from users, understand it using natural language processing technology, and respond.
[0728] "Information" refers to all text and audio data exchanged between the user and the conversational artificial intelligence.
[0729] A "collection device" is a combination of hardware and software used to collect information arising from the interaction between a user and an interactive artificial intelligence.
[0730] "Natural language processing technology" is a technology that processes human language and converts its meaning into a form that computers can understand.
[0731] A "device for organizing information" is a device for classifying and managing collected information based on defined criteria.
[0732] A "display device" refers to a display or related equipment that allows users to visually confirm information and easily find their conversation history.
[0733] A "notification generation and transmission device" is a device that creates alerts to inform users of time-sensitive tasks or important information, and notifies them via display or sound.
[0734] A "home automation device" is a device that integrates and operates various home appliances and management systems based on the user's living situation.
[0735] One embodiment of this invention requires a terminal installed in the home. This terminal is equipped with a voice input function and collects information through interaction with the user. The information obtained is converted into text data using speech recognition technology. A Raspberry Pi can be used as the hardware for this purpose.
[0736] Text data acquired by the terminal is transmitted wirelessly to a server. This server incorporates natural language processing technology and uses Google Cloud's AI services to analyze and classify the information. The classified information is displayed on a home display device with a user interface. This interface is designed to allow users to easily search and refer to their conversation history. A web application using the Vue.js framework is applicable.
[0737] Furthermore, the server analyzes time-sensitive data, generates notifications, and sends them to the device. This allows home automation devices to provide users with timely reminders. For example, if a user instructs the device to "set a reminder to take my medicine at 3 p.m.," the device can send a notification at the specified time.
[0738] Another application example uses prompts for a generative AI model. An example of such a prompt is, "We're running low on consumables, please add them to the shopping list." Such prompts allow user instructions to be reflected more intuitively and quickly.
[0739] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0740] Step 1:
[0741] The user gives instructions to the device using their voice.
[0742] An input system that receives user voice commands collects voice data. This voice data becomes input and is converted into text data using speech recognition technology. This conversion process makes it possible to process information from voice data.
[0743] Step 2:
[0744] The device converts the audio data into text format.
[0745] The device converts the collected audio data into text using a speech recognition library. This output text data is then ready to be sent to the server. For example, the instruction "Set a reminder to take medicine at 3 PM" would be output as text data at this stage.
[0746] Step 3:
[0747] The terminal sends text data to the server.
[0748] The information, converted to text format, is transmitted to the server via wireless communication. To secure the data transfer, the terminal uses an encryption protocol.
[0749] Step 4:
[0750] The server analyzes the text data it receives.
[0751] The server uses Google Cloud's natural language processing API to analyze and interpret the text data. Based on the analysis, it classifies the intent conveyed by the text and organizes it into specific categories. At the end of this step, the information is organized and ready for use.
[0752] Step 5:
[0753] The server sends the classified data to the user interface.
[0754] The analyzed information is sent to the user interface of a web application using Vue.js, based on the classification results. There, it is output as a visual representation that allows the user to review and explore their past conversation history.
[0755] Step 6:
[0756] The server analyzes time-sensitive data and generates notifications.
[0757] If a user's instructions include a deadline, the server extracts that deadline information and generates a notification. For example, "Take medicine at 3pm" is recognized as a task with a deadline, and a reminder is generated.
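Extracting the deadline from an instruction such as "Take medicine at 3pm" can be sketched with a regular expression. The pattern is an assumption that handles only English 12-hour times ("3pm", "3 PM", "3:30 p.m."), and rolling a time that has already passed over to the next day is also an assumption; the source's NLP service would handle far more phrasings.

```python
import re
from datetime import datetime, time, timedelta
from typing import Optional

# English-only pattern for "3pm", "3 PM", "3:30 p.m." (hours 1-12, minutes 00-59).
TIME_RE = re.compile(r"\b(1[0-2]|0?[1-9])(?::([0-5]\d))?\s*([ap])\.?\s*m\.?", re.IGNORECASE)


def extract_deadline(text: str, now: datetime) -> Optional[datetime]:
    """Return the next occurrence of the first clock time mentioned in `text`, or None."""
    m = TIME_RE.search(text)
    if m is None:
        return None
    hour = int(m.group(1)) % 12       # "12 am" -> 0; "12 pm" -> 12 after the shift below
    if m.group(3).lower() == "p":
        hour += 12
    minute = int(m.group(2) or 0)
    deadline = datetime.combine(now.date(), time(hour, minute))
    if deadline <= now:               # time already passed today: assume tomorrow
        deadline += timedelta(days=1)
    return deadline


print(extract_deadline("Set a reminder to take medicine at 3 PM", datetime(2025, 1, 7, 9, 0)))
```

The returned datetime can then be handed to whatever scheduler issues the reminder in Step 7.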
[0758] Step 7:
[0759] The device sends a notification to the user.
[0760] As the deadline approaches, the device notifies the user of the generated reminder. Notifications are delivered via voice or display, providing a system to support the user's life management.
[0761] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0762] This invention provides an advanced system for managing user communication using conversational artificial intelligence, and includes functions for collecting, classifying, and displaying communication data, as well as recognizing user emotions. This system operates based on interactions between a server, a terminal, and the user.
[0763] Basic system configuration
[0764] First, the user converses with the conversational artificial intelligence via their device. The communication data generated during this conversation is recorded by the device as text or audio data. The recorded data is sent to a server at regular intervals and stored in a database. The server analyzes this data using natural language processing technology and classifies the conversation content into appropriate categories. This classification information is displayed on the user interface, allowing the user to easily find past conversations.
[0765] Emotion Engine Processing
[0766] A key feature of this invention is that the server is equipped with an emotion engine that can recognize the user's emotions from communication data. The emotion engine goes beyond simple language analysis, extracting emotions from voice tone and text context to evaluate the user's current and past emotional state. This information is stored on the server and used for tracking emotional fluctuations.
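The emotion engine combines cues from text context and voice tone, and it tracks emotional fluctuations over time. A minimal sketch of that fusion and tracking, assuming each modality has already been reduced to a valence score in [-1, 1]; the weights, window size and `EmotionTracker` name are illustrative assumptions, not values from the source.

```python
from collections import deque
from typing import Deque, Optional


class EmotionTracker:
    """Fuse text- and voice-based valence scores and track their change over a sliding window."""

    def __init__(self, text_weight: float = 0.6, voice_weight: float = 0.4, window: int = 5) -> None:
        self.text_weight = text_weight
        self.voice_weight = voice_weight
        self.history: Deque[float] = deque(maxlen=window)  # oldest scores drop off automatically

    def update(self, text_score: float, voice_score: Optional[float] = None) -> float:
        """Record one fused score; text-only input (no voice) relies on the text score alone."""
        if voice_score is None:
            fused = text_score
        else:
            fused = self.text_weight * text_score + self.voice_weight * voice_score
        self.history.append(fused)
        return fused

    def trend(self) -> float:
        """Newest minus oldest fused score in the window (0.0 with fewer than two scores)."""
        if len(self.history) < 2:
            return 0.0
        return self.history[-1] - self.history[0]


tracker = EmotionTracker()
tracker.update(0.5, 0.5)   # calm conversation
tracker.update(-0.5)       # typed message with negative wording
```

A negative trend signals a worsening mood across recent turns, which is the kind of fluctuation the stored emotion history is meant to reveal.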
[0767] Response adjustment and feedback
[0768] Recognized emotions are reflected in the conversational artificial intelligence's responses and suggestions. For example, if a user expresses dissatisfaction, the system is programmed to provide a more considerate response. Furthermore, users can provide feedback through the interface, and this data is collected on a server and used to improve the entire system.
[0769] Explanation of specific examples
[0770] If a user is feeling stressed, the system uses an emotion engine to detect this state. Based on the emotional analysis, the AI offers suggestions on how to relax. In this way, the system goes beyond simply providing information and can respond flexibly to the user's emotions.
[0771] This system is a step up from conventional conversational artificial intelligence, aiming for a deeper mutual understanding with the user. By taking user emotions into account during interaction, it can provide a personalized experience for each individual user.
[0772] The following describes the processing flow.
[0773] Step 1:
[0774] The user speaks to the conversational artificial intelligence through the device. The user's statements are recorded in real time by the device as audio or text data. This data is temporarily stored on the device.
[0775] Step 2:
[0776] The device sends collected communication data to the server at regular intervals. Through this transmission process, the server receives the latest user data.
[0777] Step 3:
[0778] The server analyzes the received data using a natural language processing engine. This analysis extracts the context and keywords of the conversation, and the data is classified into existing categories. For example, it might be classified into categories such as "shopping" or "task management."
[0779] Step 4:
[0780] The server then uses an emotion engine to analyze the user's emotions from the communication data. The user's emotional state is evaluated based on context, word choice, tone of voice, and other factors.
[0781] Step 5:
[0782] The server adjusts the conversational artificial intelligence's responses based on the analyzed emotional data. For example, if the user is feeling stressed, the AI will adjust its responses to offer suggestions to help them relax.
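When responses come from a generative model, one way to realize Step 5 is to inject a tone instruction derived from the detected emotion into the prompt. The emotion labels below follow the examples in the text (stress, dissatisfaction), but the instruction wording, the prompt layout and the function name are assumptions.

```python
from typing import Dict

# Assumed tone instructions per emotion label; the source does not specify them.
TONE_BY_EMOTION: Dict[str, str] = {
    "stress": "Respond calmly and briefly, and suggest one simple way to relax.",
    "dissatisfaction": "Acknowledge the user's frustration and respond with extra care.",
    "joy": "Match the user's positive mood.",
}
DEFAULT_TONE = "Respond in a neutral, helpful tone."


def build_prompt(user_text: str, emotion: str) -> str:
    """Compose an emotion-aware prompt for the response-generating model."""
    tone = TONE_BY_EMOTION.get(emotion, DEFAULT_TONE)
    return f"Instruction: {tone}\nUser: {user_text}\nAssistant:"


print(build_prompt("I have three deadlines today.", "stress"))
```

Keeping the emotion-to-tone mapping outside the model lets the adjustment policy be reviewed and tuned independently of the model itself.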
[0783] Step 6:
[0784] The categorized conversation history and sentiment data are displayed in the user interface, namely the dashboard. Users can browse the dashboard and examine history based on specific categories or sentiment states.
[0785] Step 7:
[0786] The server collects user feedback and uses it to improve sentiment analysis and natural language processing technologies. This feedback cycle allows the system to evolve and provide more appropriate interactions for users.
[0787] (Example 2)
[0788] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0789] In modern society, the amount of information data generated through interactive algorithms between users and computers is increasing, and this data needs to be managed efficiently and effectively. Furthermore, there is a demand for more natural and personalized interactions through responses that take user emotions into consideration. Conventional technologies have faced challenges in adjusting responses using emotional information, categorizing information effectively, and configuring notifications. This invention aims to solve these problems.
[0790] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0791] In this invention, the server includes means for collecting information data exchanged between the user and a computer-based interactive algorithm, means for classifying the information data into predefined categories using language analysis technology, and means for determining emotions using an analysis device and adjusting the response based on this determination. This enables effective management of information data and flexible responses based on user emotions.
[0792] A "user" is the entity that utilizes the interactive algorithm and is the provider of information data.
[0793] A "computer" refers to any device that performs information processing, and is a device within a system that is responsible for collecting, analyzing, and displaying data.
[0794] "Information data" refers to all communication content exchanged between a user and a computer, including voice and text data.
[0795] An "interactive algorithm" refers to a series of computational processes for generating responses through interaction with the user, and it incorporates learning capabilities and sentiment analysis technology.
[0796] "Language analysis technology" refers to techniques for understanding and categorizing information data through natural language processing, with the aim of analyzing the structure and understanding the meaning of text.
[0797] An "analysis device" refers to a device or software that extracts specific information, such as emotions, from information data and provides analysis results.
[0798] "Means for determining emotions and adjusting responses" refers to technologies or processes for evaluating the emotions contained in information data and changing responses based on the results.
[0799] This invention is designed to optimize the interaction between the user and the computer by utilizing information technology.
[0800] System Configuration
[0801] The user first initiates a conversation with an interactive algorithm using a terminal. The terminal is equipped with a microphone and keyboard, and records information data in voice or text format. The information data recorded on the terminal is then transmitted to a server via the internet.
[0802] Data analysis and classification
[0803] The server processes information data using programming language environments such as Python and Java. The server uses natural language processing libraries (e.g., NLTK and spaCy) to analyze the syntax and semantics of the data. Based on these analysis results, the data is classified into predefined categories.
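This minimal sketch shows what classification into predefined categories means in practice. It uses plain keyword overlap from the Python standard library in place of NLTK or spaCy analysis. The category names come from the description. The keyword sets and the `classify` helper are hypothetical.

```python
import re

# Hypothetical keyword lists; a real deployment would rely on NLTK/spaCy
# analysis or a trained classifier instead of fixed keywords.
CATEGORY_KEYWORDS: dict[str, set[str]] = {
    "shopping": {"buy", "order", "price", "store", "cart"},
    "task management": {"deadline", "due", "meeting", "remind", "todo"},
    "health": {"sleep", "sleeping", "tired", "doctor", "exercise"},
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify(text: str, default: str = "other") -> str:
    """Return the category whose keywords overlap most with the text."""
    tokens = set(tokenize(text))
    scores = {cat: len(tokens & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

With these keywords, the utterance "I haven't been sleeping well lately" from the example below falls into "health".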
[0804] Judging and adjusting responses to emotions
[0805] The server also includes modules for analyzing emotions. For example, computational algorithms (such as machine learning techniques) are implemented to extract the user's emotions from the tone of voice data and the context of text. This makes it possible to generate responses that correspond to the user's current emotional state.
[0806] Examples of specific cases and prompt statements
[0807] When a user says to the device, "I haven't been sleeping well lately," the system classifies this information data into the "health" category and performs emotion analysis on the server. If the system determines that the user is experiencing anxiety, the AI suggests, "Shall we find some ways to relax?" Another example of a prompt statement is "I'm very busy and stressed right now."
[0808] This system aims to provide personalized responses to users through advanced data analysis and emotional response.
[0809] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0810] Step 1:
[0811] The user speaks to an interactive algorithm via the terminal. The terminal recognizes and records this input as voice or text data. Specifically, the terminal captures the input using a microphone or text input device and temporarily stores the data. The output of this step is the recorded voice or text data.
[0812] Step 2:
[0813] The terminal transmits the collected information data to the server at regular intervals. The terminal uses a network protocol to transfer the data. The input data is communication information recorded by the terminal, while the output data is the raw information data received by the server.
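Step 2 says only that the terminal sends data "at regular intervals" over a network protocol. The sketch below shows one way to buffer utterances and forward them in batches. The `IntervalUploader` class, its parameters, and the injected `send` and `clock` callables are hypothetical. The actual transport (for example, an HTTP request) is left abstract.

```python
from collections.abc import Callable


class IntervalUploader:
    """Buffers recorded utterances and forwards them in one batch once
    `interval_s` seconds have passed since the last upload (hypothetical)."""

    def __init__(self, send: Callable[[list[str]], None], interval_s: float,
                 clock: Callable[[], float]):
        self.send = send              # transport to the server (left abstract)
        self.interval_s = interval_s
        self.clock = clock            # injectable time source, e.g. time.monotonic
        self.buffer: list[str] = []
        self.last_upload = clock()

    def record(self, utterance: str) -> None:
        """Store one utterance locally, then upload if the interval has elapsed."""
        self.buffer.append(utterance)
        self.flush_if_due()

    def flush_if_due(self) -> None:
        """Send a copy of the buffer and clear it if the interval has elapsed."""
        now = self.clock()
        if self.buffer and now - self.last_upload >= self.interval_s:
            self.send(list(self.buffer))
            self.buffer.clear()
            self.last_upload = now
```

In practice, a timer would also call `flush_if_due` periodically. Otherwise, the last utterances of a user who has stopped speaking would stay in the buffer indefinitely.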
[0814] Step 3:
[0815] The server stores the received information data in a database. This is done using a database management system (e.g., an SQL database). The input is raw information data transmitted over the network, and the output is data that has been organized and stored in the database. After storage, the server prepares to perform structural analysis of the data.
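As one concrete instance of "an SQL database", the sketch below uses SQLite through Python's standard `sqlite3` module to store incoming information data. It also fetches the rows that still await the Step 4 analysis. The table name and columns are hypothetical.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a table for raw information data from terminals."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               user_id TEXT NOT NULL,
               received_at TEXT NOT NULL,   -- ISO 8601 timestamp
               modality TEXT NOT NULL,      -- 'voice' or 'text'
               content TEXT NOT NULL,
               category TEXT                -- filled in later by Step 4
           )"""
    )
    return conn


def store(conn: sqlite3.Connection, user_id: str, received_at: str,
          modality: str, content: str) -> int:
    """Insert one received item and return its row id."""
    cur = conn.execute(
        "INSERT INTO messages (user_id, received_at, modality, content) "
        "VALUES (?, ?, ?, ?)",
        (user_id, received_at, modality, content),
    )
    conn.commit()
    return cur.lastrowid


def unclassified(conn: sqlite3.Connection) -> list[tuple[int, str]]:
    """Rows that have not yet been assigned a category, oldest first."""
    return conn.execute(
        "SELECT id, content FROM messages WHERE category IS NULL "
        "ORDER BY received_at"
    ).fetchall()
```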
[0816] Step 4:
[0817] The server analyzes the received data using a natural language processing library. It applies syntactic and semantic analysis to the data and classifies it into predefined categories. The input is the data stored in the database, and the output is the classification result for each category. Specifically, the server uses language analysis techniques to extract topics from the text.
[0818] Step 5:
[0819] The server uses an emotion analysis module to extract the user's emotions from the data. Voice tone and word choice are analyzed to identify the emotional state. The input is analyzed text data, and emotional information is generated as output. The analysis device uses machine learning algorithms to evaluate the emotions.
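Step 5 relies on a machine-learning model that also considers voice tone. The simpler stand-in below scores word choice only, using a small emotion lexicon. The lexicon, the labels, and `detect_emotion` are hypothetical. This is a lexicon technique, not the model the disclosure describes.

```python
import re
from collections import Counter

# Hypothetical cue words; "sleeping" maps to anxiety only to mirror the
# "I haven't been sleeping well lately" example in the description.
LEXICON: dict[str, str] = {
    "stressed": "stress", "busy": "stress", "deadline": "stress",
    "worried": "anxiety", "anxious": "anxiety", "sleeping": "anxiety",
    "happy": "joy", "great": "joy", "thanks": "joy",
    "sad": "sadness", "lonely": "sadness", "tired": "fatigue",
}


def detect_emotion(text: str, default: str = "neutral") -> str:
    """Return the emotion label whose cue words occur most often in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(LEXICON[w] for w in words if w in LEXICON)
    if not counts:
        return default
    return counts.most_common(1)[0][0]
```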
[0820] Step 6:
[0821] The server generates and adjusts the response of the interactive algorithm based on the acquired sentiment information. Specifically, it uses a response generation algorithm to create an appropriate response that aligns with the informational data. The input is sentiment information and the user's past history data, and the output is a personalized response to the user.
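The following template-based sketch illustrates Step 6. It adds an emotion-appropriate line to a base reply and uses the past history to avoid repeating a suggestion the user has already received. "Shall we find some ways to relax?" is the example response from the description. The other strategy strings and `adjust_response` are hypothetical.

```python
# Hypothetical emotion-to-strategy table.
STRATEGIES: dict[str, str] = {
    "anxiety": "Shall we find some ways to relax?",
    "stress": "That sounds like a lot. Would a short break help?",
    "joy": "That's great to hear!",
}


def adjust_response(base_reply: str, emotion: str, history: list[str]) -> str:
    """Prefix an emotion-appropriate line to the base reply, unless the
    same line already appears in the user's past history."""
    extra = STRATEGIES.get(emotion)
    if extra is None or extra in history:
        return base_reply
    return f"{extra} {base_reply}".strip()
```

A generative model could replace the fixed table. The history check would still apply in the same way.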
[0822] Step 7:
[0823] The server sends the response back to the terminal. The terminal presents this response to the user via screen or audio. The input is the response data sent from the server, and the output is the visual or audio response provided to the user. This allows the user to initiate the next interaction based on the presented information.
[0824] (Application Example 2)
[0825] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0826] In modern living environments, there is a growing demand for personalized experiences that respond to users' emotions. Especially for robots and conversational artificial intelligence used in the home, it is desirable not only to provide information but also to understand the user's emotions and use that understanding to control appliances and adjust environmental settings. However, conventional systems have struggled to accurately recognize user emotions and automatically provide appropriate responses. Solving these challenges is therefore crucial.
[0827] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0828] In this invention, the server includes means for collecting information exchanged between the user and conversational artificial intelligence, means for classifying the information into predefined categories using natural language processing technology, and means for recognizing the user's emotions from voice data and text data and controlling home appliances based on those emotions. This makes it possible to provide a personalized experience that corresponds to the user's emotional state.
[0829] A "user" is an individual who communicates with conversational artificial intelligence or an entity that utilizes it.
[0830] "Conversational artificial intelligence" is an artificial system that has the ability to provide information and carry out instructions through dialogue with the user.
[0831] "Information" refers to the collective text or audio data exchanged between the user and the conversational artificial intelligence.
[0832] "Natural language processing technology" is a technology that enables computers to understand and analyze natural human language.
[0833] A "category" refers to a predefined group or type used to classify information.
[0834] "Voice data" refers to digital data representing sound information generated by the user's voice.
[0835] "Text data" refers to digitized textual information obtained through user input or speech recognition.
[0836] "Emotion recognition" is a technology that analyzes and identifies emotions from a user's voice or text.
[0837] "Home appliance control" refers to a function that operates or sets electrical appliances according to user instructions or status.
[0838] A "personalized experience" refers to the provision of services that are customized according to the individual user's circumstances and preferences.
[0839] In the system implementing this invention, a server, a user terminal, and multiple devices within the home work together. The user first initiates a conversation with an interactive artificial intelligence via the terminal. The information from this conversation is recorded as voice or text data and sent to the server at regular intervals. The server converts the voice information into text using a speech recognition module based on TensorFlow, and then uses spaCy to perform natural language processing on the text data. Furthermore, OpenAI's emotion analysis API is used for emotion recognition, accurately analyzing the user's emotions from the voice and text data.
[0840] Based on the analysis results, the user's emotional state is determined. For example, if the user expresses fatigue, the server controls lighting and music devices via the smart home platform, providing settings that promote relaxation. This system uses the emotional data stored on the server to generate personalized feedback and displays it in the user interface.
[0841] For example, if a user says to the conversational AI, "I'm very busy and tired today," the server will immediately adjust the lighting to a warmer color and play calming music. This process allows the user to enjoy a more personalized and comfortable experience.
[0842] Example prompt: "When the user wants to relax, have the robot suggest the optimal lighting and music combination."
[0843] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0844] Step 1:
[0845] The user speaks to the conversational artificial intelligence using a device. The input is voice data, which is recorded on the user's device. The device converts the voice signal into a digital format and sends the information to the server.
[0846] Step 2:
[0847] The server uses TensorFlow to perform speech recognition on the received audio data and outputs the result as text data. This text data forms the basis for natural language processing. The data processing in this step is the conversion of the audio waveform into a character string.
[0848] Step 3:
[0849] The server processes the text data with spaCy, a natural language processing tool, to analyze its grammar and structure. Based on this analysis, categories related to the context and subject are determined. This process involves understanding the structure of the input text and assigning categories.
[0850] Step 4:
[0851] The server uses OpenAI's sentiment analysis API to extract the user's emotions from text data. The input is the text data obtained in step 2, and emotion information based on it is output. Different response plans are selected based on the type of emotion (e.g., joy, sadness, stress).
[0852] Step 5:
[0853] Based on the analysis of emotions, the server generates commands to control home appliances via the smart home platform. For example, if relaxation is deemed necessary, it will adjust the lighting and instruct music playback. The input is emotion data, and the output is device control commands.
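The sketch below shows how Step 5 might map a recognized emotion to device control commands. The `DeviceCommand` format, the emotion labels, the 2700 K "warm" colour temperature, and the playlist names are hypothetical. No particular smart home platform API is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceCommand:
    """A generic command for a smart home platform (hypothetical format)."""
    device: str
    action: str
    params: tuple[tuple[str, object], ...] = ()


def plan_commands(emotion: str) -> list[DeviceCommand]:
    """Map a recognized emotion to home appliance commands."""
    if emotion in {"stress", "fatigue", "sadness"}:
        # Relaxation deemed necessary: warmer light and calming music,
        # as in the description's "busy and tired" example.
        return [
            DeviceCommand("lighting", "set_color_temperature", (("kelvin", 2700),)),
            DeviceCommand("speaker", "play", (("playlist", "calm"),)),
        ]
    if emotion == "joy":
        return [DeviceCommand("speaker", "play", (("playlist", "upbeat"),))]
    return []  # no change for neutral or unrecognized emotions
```

Keeping the commands as plain data separates the decision from the platform-specific transport. The server can then log the same objects in the Step 6 feedback.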
[0854] Step 6:
[0855] The server generates feedback information for the user based on all the above processing and displays it in the user interface. This feedback reports to the user how the system responded and stores the information for future interactions. The input is the result of past processing, and the output is the feedback message.
[0856] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0857] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0858] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0859] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0860] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer an emotion is to the center of the concentric circles, the more primitive it is. Farther out from the center are emotions representing states and actions arising from mental states. Here, "emotion" is a concept that includes both feelings and mental states. On the left side of the concentric circles are emotions that are generally generated by reactions occurring in the brain. On the right side are emotions that are generally induced by situational judgment. Above and below the concentric circles are emotions that are generally both generated by reactions in the brain and induced by situational judgment. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" on the lower side. Thus, in the emotion map 400, multiple emotions are mapped according to the structure by which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0861] These emotions are distributed at the 3 o'clock position of the emotion map 400 and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0862] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the farther outward an emotion lies on the emotion map 400, the more visible (that is, expressed in actions) it becomes.
[0863] Here, human emotions are based on various balances, such as posture and blood sugar level. When these balances deviate from the ideal, displeasure results; when they approach the ideal, pleasure results. Similarly, robots, cars, motorcycles, and the like can be given emotions based on various balances, such as posture and battery level: deviation from the ideal produces displeasure, and approach to the ideal produces pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half contains emotions belonging to a region called "situation," where situational awareness is dominant.
[0864] The emotion map defines two emotions that promote learning. The first lies around the middle of the negative emotions "repentance" and "reflection" on the situation side; that is, the robot experiences negative feelings such as "I never want to feel this way again" or "I don't want to be scolded again." The second lies around the positive emotion "desire" on the response side; that is, the robot has positive feelings such as "I want more" or "I want to know more."
[0865] The emotion identification model 59 inputs the user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained on multiple training data sets, each combining a user input with emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example in which multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
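The balance idea described above can be illustrated as follows: a change that brings a quantity such as battery level closer to its ideal value counts as pleasure, and a change away from it counts as displeasure. The `valence` helper and the choice of ideal value are hypothetical. This is an interpretation of the text, not a stated formula.

```python
def valence(previous: float, current: float, ideal: float,
            tol: float = 1e-9) -> str:
    """Classify a change in some balance (e.g. battery level) as pleasure or
    displeasure, depending on whether it moved toward or away from the ideal."""
    before = abs(previous - ideal)
    after = abs(current - ideal)
    if after < before - tol:
        return "pleasure"      # approaching the ideal
    if after > before + tol:
        return "displeasure"   # deviating from the ideal
    return "neutral"
```

For example, with a full charge (1.0) taken as the ideal, charging from 0.2 to 0.6 yields "pleasure".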
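The disclosure does not describe its training targets. One plausible way to make emotions that lie close together on the map receive similar values is to train against soft targets that decay with map distance (Gaussian label smoothing). The model's output can then be reduced to a single emotion by taking the maximum. The 2-D coordinates, `sigma`, and both helpers below are assumptions. The real layout is given only in Figures 9 and 10.

```python
import math

# Hypothetical 2-D coordinates on the emotion map.
POSITIONS: dict[str, tuple[float, float]] = {
    "reassured": (1.0, 0.2),
    "calm": (1.0, 0.0),
    "confident": (0.9, -0.2),
    "anxiety": (-1.0, 0.0),
}


def soft_targets(label: str, sigma: float = 0.3) -> dict[str, float]:
    """Normalized training target in which emotions near `label` on the map
    also receive high values, so neighbours learn similar outputs."""
    lx, ly = POSITIONS[label]
    raw = {
        emo: math.exp(-((x - lx) ** 2 + (y - ly) ** 2) / (2 * sigma ** 2))
        for emo, (x, y) in POSITIONS.items()
    }
    total = sum(raw.values())
    return {emo: v / total for emo, v in raw.items()}


def decide(emotion_values: dict[str, float]) -> str:
    """Pick the emotion with the highest value output by the network."""
    return max(emotion_values, key=emotion_values.get)
```

With these coordinates, the target for "calm" gives "reassured" and "confident" values close to that of "calm" and gives "anxiety" a value near zero. This matches the grouping shown in Figure 10.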
[0866] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0867] In the above embodiment, an example was given in which the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific processing may be distributed across multiple computers, including the computer 22. For example, the data generation model 58 may be provided in a device external to the data processing device 12, and the external device may generate data according to the input data.
[0868] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.
[0869] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0870] Furthermore, the entire specific processing program 56 need not be stored in a storage device such as a server connected to the data processing device 12 via the network 54, or in the storage 32; only a portion of the specific processing program 56 may be stored there.
[0871] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.
[0872] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0873] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0874] More specifically, electrical circuits combining circuit elements such as semiconductor devices can be used as the hardware structure of these various processors. The specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged without departing from the gist.
[0875] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.
[0876] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0877] The following is further disclosed regarding the embodiments described above.
[0878] (Claim 1)
[0879] A means of collecting communication data exchanged between users and conversational artificial intelligence,
[0880] A means for classifying the communication data into predefined categories using natural language processing technology,
[0881] A means to enable users to easily find past conversations by displaying the classified data on the user interface,
[0882] A means of analyzing communication data including deadlines, setting reminders, and sending notifications,
[0883] A system that includes this.
[0884] (Claim 2)
[0885] The system according to claim 1, wherein the classified data can be filtered and sorted based on search criteria specified by the user.
[0886] (Claim 3)
[0887] The system according to claim 1, further comprising means for collecting user feedback and utilizing it to improve the natural language processing technology and user interface.
[0888] "Example 1"
[0889] (Claim 1)
[0890] A means of recording information exchanged between the user and the conversational artificial intelligence,
[0891] A means for dividing the aforementioned information into pre-defined classifications using a natural language processing system,
[0892] A means to enable users to easily detect past conversations by displaying the classified information on an information display screen,
[0893] A means of analyzing information including time specifications, setting reminders, and sending notifications,
[0894] A means to understand the context of information with high accuracy using a generative AI model and improve classification accuracy,
[0895] A system that includes this.
[0896] (Claim 2)
[0897] The system according to claim 1, wherein the classified information can be filtered and sorted according to search criteria set by the user.
[0898] (Claim 3)
[0899] The system according to claim 1, further comprising means for collecting user feedback and utilizing it to improve the natural language processing system and the information display screen.
[0900] "Application Example 1"
[0901] (Claim 1)
[0902] A device that collects information exchanged between a user and an interactive artificial intelligence,
[0903] A device that uses natural language processing technology to organize the aforementioned information into a predefined classification,
[0904] A device that displays the classified information on a display device, thereby enabling the user to easily find past conversations,
[0905] A device that analyzes information including deadlines, generates notifications, and sends them out,
[0906] A device that assists the user in managing their daily life within home automation equipment,
[0907] A system that includes this.
[0908] (Claim 2)
[0909] The system according to claim 1, which can organize and sort the classified information based on search criteria specified by the user.
[0910] (Claim 3)
[0911] The system according to claim 1, further comprising a device for collecting user feedback and utilizing it to improve the natural language processing technology and the display device.
[0912] "Example 2 of combining an emotion engine"
[0913] (Claim 1)
[0914] A means of collecting information data exchanged between a user and a computer through an interactive algorithm,
[0915] A means for classifying the aforementioned information data into predefined categories using language analysis technology,
[0916] A means to enable the user to easily find past conversations by presenting the classified data on a display device,
[0917] A means of analyzing information data, including time limits, and setting up notifications,
[0918] A means of determining emotions using an analytical device and adjusting responses based on this,
[0919] A system that includes this.
[0920] (Claim 2)
[0921] The system according to claim 1, which can sort and select the classified data based on search criteria specified by the user.
[0922] (Claim 3)
[0923] The system according to claim 1, further comprising means for collecting user feedback and utilizing it to improve the language analysis technology and the display device.
[0924] "Application example 2 when combining with an emotional engine"
[0925] (Claim 1)
[0926] A means of collecting information exchanged between the user and the conversational artificial intelligence,
[0927] A means for classifying the information into predefined categories using natural language processing technology,
[0928] A means to enable users to easily find past conversations by displaying the classified information on the user interface,
[0929] A means for recognizing a user's emotions from voice and text data and controlling home appliances based on those emotions,
[0930] A system that includes this.
[0931] (Claim 2)
[0932] The system according to claim 1, wherein the classified information can be filtered and sorted based on search criteria specified by the user.
[0933] (Claim 3)
[0934] The system according to claim 1, further comprising means for collecting user feedback and utilizing it to improve the natural language processing technology, emotion recognition function, and user interface.
[Explanation of Symbols]
[0935] 10, 210, 310, 410 Data processing system; 12 Data processing device; 14 Smart device; 214 Smart glasses; 314 Headset-type terminal; 414 Robot
Claims
1. A device that collects information exchanged between a user and an interactive artificial intelligence, A device that uses natural language processing technology to organize the aforementioned information into a predefined classification, A device that displays the classified information on a display device, thereby enabling the user to easily find past conversations, A device that analyzes information including deadlines, generates notifications, and sends them out, A device that assists the user in managing their daily life within home automation equipment, A system that includes this.
2. The system according to claim 1, which can organize and sort the classified information based on search criteria specified by the user.
3. The system according to claim 1, further comprising a device for collecting user feedback and utilizing it to improve the natural language processing technology and the display device.