system

The system addresses the challenges of conventional support systems by uploading and analyzing documents to provide rapid, accurate, and multilingual information, enhancing user satisfaction and reducing support burden.

JP2026101405A · Pending · Publication Date: 2026-06-22 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-10
Publication Date
2026-06-22

AI Technical Summary

Technical Problem

Conventional support systems face challenges in quickly providing accurate and consistent information from large documents, lack multilingual support, and struggle with real-time responses, leading to increased customer support burden and decreased user satisfaction.

Method used

A system that uploads documents, extracts textual information (using optical character recognition for image-format documents), analyzes it to identify important content, generates questions and answers, and supports multiple languages, enabling efficient and comprehensive information management.

Benefits of technology

Enables rapid and accurate information provision, supports multiple languages, and provides real-time responses, improving user satisfaction and reducing support burden.

✦ Generated by Eureka AI based on patent content.

Abstract drawing: Figure 2026101405000001_ABST

Abstract

To provide a system. 【Means for Solution】 A system including: means for acquiring a document; means for extracting character information from the document; means for analyzing the character information to identify important information; means for generating questions and answers based on the important information; means for adding the generated questions and answers to a knowledge base; means for providing an answer to an inquiry using the knowledge base; means for converting an acquired image into character information using optical technology; and means by which a consumer automated machine device transmits information to a user by voice or on-screen display.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, the method including the steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In conventional support systems, finding the necessary information in a large number of documents and manuals takes time, so user inquiries cannot be answered quickly. This increases the burden on customer support and may reduce user satisfaction. Furthermore, multilingual support and support for documents in image format are often insufficient, and it is difficult to respond in real time while maintaining the consistency and quality of information.

Means for Solving the Problems

[0005] This invention provides a system that enables rapid information provision by uploading documents and extracting textual information from those documents. Specifically, it includes means for analyzing textual information within documents to identify important information, and means for generating questions and answers based on that important information. Furthermore, it adds the generated questions and answers to a knowledge base and uses this knowledge base to provide answers to inquiries, thereby achieving highly accurate real-time support. In addition, by supporting multilingual capabilities and extracting information from images using optical character recognition technology, it enables efficient and comprehensive information management.

[0006] A "document" refers to a file format containing text or images that includes specific information or content.

[0007] "Uploading" refers to the action of a user sending files or information from their local device to a server.

[0008] "Character information" refers to the individual characters and symbols that make up the text contained within a document.

[0009] "Extraction" refers to the process of taking out a specific part of data or information.

[0010] "Analysis" refers to the process of examining information and data in detail, understanding specific patterns and meanings, and drawing conclusions based on that understanding.

[0011] "Important information" refers to elements within the overall information that are particularly valuable and likely to be of interest to users.

[0012] A "question" refers to any doubts or inquiries that a user makes in order to obtain information.

[0013] "Answer" refers to the information or solutions provided in response to a user's question.

[0014] "Generation" refers to the process by which a system creates new data or results.

[0015] "Knowledge base" refers to a database in which specific information and knowledge are aggregated and managed in a searchable form.

[0016] "Query" refers to an act performed by a user to seek specific information or support.

[0017] "Provision" refers to the process of providing specific information or services to a user.

[0018] "Multilingual support" refers to the ability of a system to support multiple languages and process and provide information in various languages.

[0019] "Optical character recognition technology" refers to the technology of identifying characters in an image and converting them into digital text.

Brief Description of Drawings

[0020]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] A diagram showing an emotion map to which a plurality of emotions are mapped.
[Figure 10] A diagram showing an emotion map to which a plurality of emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when combined with an emotion engine.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when combined with an emotion engine.

Mode for Carrying Out the Invention

[0021] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0022] First, the terms used in the following description will be explained.

[0023] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as the "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0024] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory that temporarily stores information and is used by the processor as a work memory.

[0025] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs and parameters. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0026] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like, and manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0027] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or."

[0028] [First Embodiment]

[0029] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0030] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0031] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0032] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0033] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0034] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0035] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0036] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0037] As shown in Figure 2, in the data processing device 12, specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes it on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0038] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0039] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0040] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0041] In this invention, processing begins when a user uploads a document file, such as a manual or instruction sheet, from their device to the server. The server then analyzes the received document file. If the uploaded file is in image format, the server uses optical character recognition technology to extract text information from the image and convert it into digital text.

[0042] Next, the server analyzes the extracted text information using a generative AI model to identify important information within the document. Based on this analysis, the server predicts questions that users are likely to ask and automatically generates corresponding answers. The generated question-and-answer pairs are added to the knowledge base, making the newly added information available in real time.
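The description does not say how the server decides that an uploaded file is an image. A minimal sketch, assuming the decision is made from the file's leading "magic" bytes rather than from its extension, is shown below; `detect_format` and `needs_ocr` are hypothetical names, and only the PDF, PNG, and JPEG formats mentioned elsewhere in this description are covered.

```python
# Leading "magic" bytes for the formats named in the description (PDF, PNG, JPEG).
# This table is an illustrative assumption, not part of the disclosure.
_SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}

IMAGE_FORMATS = {"png", "jpeg"}


def detect_format(data: bytes) -> str:
    """Return a format label based on the file's leading bytes, or 'unknown'."""
    for magic, label in _SIGNATURES.items():
        if data.startswith(magic):
            return label
    return "unknown"


def needs_ocr(data: bytes) -> bool:
    """True when the upload is an image, so its text must be recovered by OCR."""
    return detect_format(data) in IMAGE_FORMATS
```

A scanned PDF would pass this check as `pdf` even though it contains only page images; a fuller implementation would also test whether the PDF has a text layer before deciding to skip OCR.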
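The patent does not disclose the prompt or output format used with the generative AI model. As a hedged sketch, this step could be expressed as a single prompt that asks for question-and-answer pairs as JSON, with the model passed in as any prompt-to-text callable; `PROMPT_TEMPLATE`, `generate_qa_pairs`, and `QAPair` are illustrative names, not part of the disclosure.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str


# Hypothetical prompt; the patent does not disclose the one it uses.
PROMPT_TEMPLATE = (
    "Read the document below and identify its most important information.\n"
    "Predict up to {n} questions a user is likely to ask and answer each one "
    "using only the document.\n"
    'Reply with a JSON list of objects with "question" and "answer" keys.\n\n'
    "Document:\n{text}"
)


def generate_qa_pairs(text: str, generate: Callable[[str], str], n: int = 5) -> List[QAPair]:
    """Build the prompt, call the model, and keep only well-formed Q&A pairs."""
    raw = generate(PROMPT_TEMPLATE.format(n=n, text=text))
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []  # malformed output: add nothing rather than guess
    if not isinstance(items, list):
        return []
    pairs: List[QAPair] = []
    for item in items:
        if not isinstance(item, dict):
            continue
        question = str(item.get("question", "")).strip()
        answer = str(item.get("answer", "")).strip()
        if question and answer:  # drop pairs with an empty side
            pairs.append(QAPair(question, answer))
    return pairs[:n]
```

Validating the model output and discarding malformed or empty pairs keeps bad generations out of the knowledge base, and passing the model as a callable keeps the sketch independent of any particular vendor API.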

[0043] When a user submits an inquiry, the question entered on the device is sent to the server. The server searches the knowledge base and identifies the relevant answer. The identified answer is then provided to the user through the device. This allows the user to receive quick and accurate support.
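The search method is not specified beyond searching the knowledge base and identifying the relevant answer. A minimal stand-in, assuming bag-of-words cosine similarity between the inquiry and each stored question, could look like this; the `KnowledgeBase` class, the 0.2 threshold, and the tokenizer are assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


def _tokens(text: str) -> Counter:
    """Lower-cased word counts; a deliberately simple tokenizer."""
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class KnowledgeBase:
    """In-memory store of question-answer pairs searched by word-overlap similarity."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, str, Counter]] = []

    def add(self, question: str, answer: str) -> None:
        # A newly added pair is searchable immediately.
        self._entries.append((question, answer, _tokens(question)))

    def search(self, query: str, min_score: float = 0.2) -> Optional[str]:
        """Return the answer whose stored question best matches the query, if any."""
        q = _tokens(query)
        best_score, best_answer = 0.0, None
        for _, answer, vec in self._entries:
            score = _cosine(q, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= min_score else None
```

The `\w+` tokenizer splits on spaces and punctuation, so it does not segment languages written without spaces, such as Japanese; a multilingual deployment would more likely compare embedding vectors from a multilingual model.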

[0044] As a concrete example, consider a scenario where a user uploads a manual for a new home appliance. The server automatically extracts important information, including product setup instructions and troubleshooting guides, and generates answers to frequently asked questions, such as "What to do if the power won't turn on." When a user sends such a question to the chatbot, the server can quickly provide the appropriate solution from the knowledge base, assisting the user. This system makes it possible to simplify user operation while achieving efficient support.

[0045] The following describes the processing flow.

[0046] Step 1:

[0047] Users select manuals and document files via their terminal and upload them to the server using the system interface.

[0048] Step 2:

[0049] The terminal sends the selected document file to the server and notifies the user when the upload is complete.

[0050] Step 3:

[0051] The server receives the document file sent from the terminal and saves the file to temporary storage for analysis.

[0052] Step 4:

[0053] The server checks the file format, and if it's an image file, it uses optical character recognition technology to extract the text information within the image as digital text.

[0054] Step 5:

[0055] The server analyzes the extracted text using a generative AI model and performs natural language processing to identify important information.

[0056] Step 6:

[0057] The server predicts questions that the user is likely to ask frequently based on the text and generates appropriate answers to those questions.

[0058] Step 7:

[0059] The server adds the generated question-and-answer pairs to the knowledge base and keeps it up-to-date to prepare for real-time queries.

[0060] Step 8:

[0061] When a user requests information, the device sends the question to the server.

[0062] Step 9:

[0063] The server searches its knowledge base based on the question it receives and identifies the best answer.

[0064] Step 10:

[0065] The server sends the identified answer to the terminal, and the terminal displays it to the user, thereby providing the necessary information.
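Steps 2 and 3 leave open how the received file is held in temporary storage. A hedged sketch, assuming files are written to a working directory under a server-generated name and that the returned receipt is what the terminal uses for its completion notice, follows; `store_upload` and `UploadReceipt` are hypothetical.

```python
import os
import tempfile
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadReceipt:
    upload_id: str
    stored_path: str
    size_bytes: int


def store_upload(data: bytes, original_name: str, work_dir: str) -> UploadReceipt:
    """Save uploaded bytes under a server-generated name and return a receipt.

    The client-supplied name is used only for its extension, so a name such as
    "../../etc/passwd" cannot place the file outside work_dir.
    """
    upload_id = uuid.uuid4().hex
    ext = os.path.splitext(os.path.basename(original_name))[1].lower()
    if not ext[1:].isalnum():  # keep only simple extensions like ".pdf"
        ext = ""
    path = os.path.join(work_dir, upload_id + ext)
    with open(path, "wb") as f:
        f.write(data)
    return UploadReceipt(upload_id, path, len(data))


if __name__ == "__main__":
    # Example: store a small upload and print the completion notice.
    with tempfile.TemporaryDirectory() as work_dir:
        receipt = store_upload(b"%PDF-1.7 ...", "manual.pdf", work_dir)
        print(f"Upload {receipt.upload_id} complete ({receipt.size_bytes} bytes)")
```

Generating the stored name on the server, rather than reusing the client's file name, avoids path-traversal problems when the file is later opened for analysis.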

[0066] (Example 1)

[0067] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0068] It is essential to quickly and effectively utilize the information users possess and to efficiently provide them with the information they need. Conventional methods make it difficult to extract information from documents quickly and to answer user inquiries appropriately in real time. These problems need to be solved to improve user convenience and speed up access to information.

[0069] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0070] In this invention, the server includes means for transferring a document to an information processing device via a communication device, means for determining whether the document is in image format and extracting character information from the image using optical character recognition technology, and means for analyzing the character information using a generative AI model to identify important information. This makes it possible to quickly provide the information that the user needs.

[0071] "Communication equipment" is a general term for devices used to transmit documents from a user's terminal to an information processing device.

[0072] An "information processing device" is a general term for a device that analyzes received documents and performs necessary data extraction and processing.

[0073] "Optical character recognition technology" is a general term for technologies that extract character information as digital text from image-based documents.

[0074] A "generative AI model" is a general term for artificial intelligence technologies that analyze given text information and extract specific patterns or information.

[0075] A "collection" is a general term for a database that stores generated question-and-answer pairs and uses them for subsequent searches and queries.

[0076] A "prompt message" is a general term for text containing instructions or questions that a user enters into a system.

[0077] This system primarily utilizes user terminals, servers, and communication devices. Users first upload document files to the server via the communication device using their terminal. The system supports document file formats such as PDF, JPEG, and PNG.

[0078] If the received document is in image format, the server extracts the text information using optical character recognition (OCR) technology. This process utilizes Tesseract OCR or other similar optical character recognition software. The extracted text information is then input into a generative AI model. Typical generative AI models include those incorporating natural language processing technology.
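Tesseract is named as an example OCR engine. A minimal sketch, assuming the `pytesseract` wrapper and Pillow are installed along with Tesseract language data for each requested language, is below. Joining language codes with `+` (for example `eng+jpn`) is how Tesseract is asked to recognize several languages at once, which ties the OCR step to the multilingual support described in this disclosure; the engine is injectable so the cleanup logic can run without Tesseract, and `image_to_text` is a hypothetical name.

```python
import io
from typing import Callable, Optional, Sequence

# An OCR engine takes image bytes and a Tesseract-style language string.
OcrEngine = Callable[[bytes, str], str]


def _tesseract_engine(data: bytes, lang: str) -> str:
    # Imported lazily so the rest of the pipeline works without OCR installed.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(io.BytesIO(data)), lang=lang)


def image_to_text(data: bytes, languages: Sequence[str] = ("eng", "jpn"),
                  engine: Optional[OcrEngine] = None) -> str:
    """OCR an image and return its text with blank lines and extra spaces removed."""
    lang = "+".join(languages)  # Tesseract syntax for several languages, e.g. "eng+jpn"
    raw = (engine or _tesseract_engine)(data, lang)
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)
```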

[0079] The generative AI model analyzes the input text information and extracts key information from the document. Based on this analysis, it identifies questions that users are likely to ask and generates appropriate answers to them. These question-and-answer pairs are stored in a knowledge base on the server and kept accessible at all times.
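The description says the pairs are stored in a knowledge base on the server and kept accessible, without naming a storage engine. As one possibility, a sketch using SQLite through Python's standard `sqlite3` module is shown; the table layout and function names are assumptions, and the `ON CONFLICT ... DO UPDATE` upsert requires SQLite 3.24 or later.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical schema: one row per generated pair, unique per document and question.
SCHEMA = """
CREATE TABLE IF NOT EXISTS qa_pairs (
    id       INTEGER PRIMARY KEY,
    doc_id   TEXT NOT NULL,
    question TEXT NOT NULL,
    answer   TEXT NOT NULL,
    UNIQUE (doc_id, question)
)
"""


def open_kb(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the knowledge-base database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_pairs(conn: sqlite3.Connection, doc_id: str, pairs: Iterable[Tuple[str, str]]) -> None:
    """Insert or refresh the pairs generated for one document in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO qa_pairs (doc_id, question, answer) VALUES (?, ?, ?) "
            "ON CONFLICT (doc_id, question) DO UPDATE SET answer = excluded.answer",
            [(doc_id, q, a) for q, a in pairs],
        )


def pairs_for(conn: sqlite3.Connection, doc_id: str) -> List[Tuple[str, str]]:
    """Return the stored pairs for one document in insertion order."""
    return conn.execute(
        "SELECT question, answer FROM qa_pairs WHERE doc_id = ? ORDER BY id", (doc_id,)
    ).fetchall()
```

Keying pairs by document and question lets a re-uploaded manual refresh its answers in place instead of accumulating duplicates.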

[0080] A concrete example is when a user uploads a manual for a new home appliance. In this case, the server can analyze the operating instructions and troubleshooting guide for the product described in the manual and automatically generate answers to frequently asked questions, such as "What to do if the power won't turn on."

[0081] Furthermore, when a user enters a prompt message from their terminal such as "Tell me how to set up a new home appliance," the server searches its knowledge base. As a result, relevant information is quickly identified and provided to the user through the terminal. This process allows users to efficiently obtain information and use it to solve problems. This system is particularly characterized by its intuitive operation and high level of convenience, making it beneficial to many users.

[0082] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0083] Step 1:

[0084] The user uses a terminal to select document files such as manuals and instruction sheets and uploads them to the server via a communication device. The input is the document file itself, in formats such as PDF or image files. The output is the document file stored on the server.

[0085] Step 2:

[0086] The server determines whether the received document file is in image format. If it is, the server uses optical character recognition (OCR) technology to extract text information from the image. The input to this process is an image document file, and the output is text data. Specifically, software such as Tesseract OCR is used to convert the characters in the image into digital text.

[0087] Step 3:

[0088] The server inputs the text data into a generative AI model, which analyzes it to identify the important information. The input is the extracted text, and the output is the analysis result containing the important information. This process uses natural language processing techniques to extract the key points in the document and the questions that users are likely to ask.

[0089] Step 4:

[0090] The server predicts questions that users are likely to frequently ask based on the analysis results and generates answers to them. At this stage, the input is the analysis results containing important information, and the output is question-and-answer pairs. The generated content is added to the knowledge base to prepare for future inquiries.

[0091] Step 5:

[0092] The user enters a prompt message from their terminal. A specific example might be a question like, "Please tell me how to set up a new home appliance." Based on this input, the server searches its knowledge base and identifies the relevant answer.

[0093] Step 6:

[0094] The server sends the search results to the terminal. The input consists of the user's prompt and the search results based on it, while the output is the answer displayed on the terminal. This allows the user to quickly and accurately obtain the information they are looking for.

[0095] (Application Example 1)

[0096] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0097] Home automated machinery and devices are required to provide users with real-time, efficient, and accurate information on how to use products and solutions to problems that may arise. However, existing systems struggle to quickly extract necessary information from documents and respond to user inquiries immediately. As a result, users often receive inefficient support when using products. A solution to this problem is needed.

[0098] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0099] In this invention, the server includes means for acquiring documents, means for extracting textual information from the documents, and means for analyzing the textual information to identify important information. This enables a home automated machine to provide users with useful and accurate product support information in real time.

[0100] "Means of obtaining documents" refers to the function of sending or uploading documents to a server from a user's device.

[0101] "Means for extracting textual information" refers to the process of extracting textual data from acquired documents.

[0102] "Means of analyzing textual information to identify important information" refers to the ability to analyze extracted textual data and select information that is useful to the user from it.

[0103] "Means for generating questions and answers" refers to a function that automatically generates appropriate answers to potential user questions based on identified key information.

[0104] "Means of adding to the knowledge base" refers to the process of registering the generated question-and-answer pairs in a database for storage.

[0105] "Means of converting into text information using optical technology" refers to technology that optically analyzes characters within an image and converts them into digital text.

[0106] "Means by which consumer automated machinery transmits information to users via voice or screen display" refers to functions such as those used by household robots to provide information to users using voice synthesis or display.

[0107] The system for carrying out this invention involves the collaborative operation of a home-use automated device and a server. The server retrieves documents sent by the user through a terminal and, if a document is in image format, extracts text data using optical character recognition (OCR) technology. Widely known OCR software such as Tesseract can be used for this process.

[0108] The server then inputs the extracted text data into a generative AI model (e.g., the GPT series) to identify important information. Based on this analysis, questions that users are likely to ask and their corresponding answers are generated automatically. The generated question-and-answer pairs are added to the knowledge base on the server, enabling real-time inquiry handling.
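The patent relies on a generative AI model to identify important information. For comparison only, a model-free extractive baseline, which is not the patent's method, scores each sentence by how often its content words occur across the whole document and keeps the top-scoring sentences; the stop-word list, the scoring rule, and `key_sentences` are illustrative assumptions.

```python
import re
from collections import Counter
from typing import List

# Small illustrative stop-word list; a real system would use a per-language list.
STOP_WORDS = frozenset({"the", "a", "an", "and", "or", "to", "of", "is", "in",
                        "on", "for", "it", "this", "if", "be"})


def _content_words(sentence: str) -> List[str]:
    """Lower-cased words of a sentence with stop words removed."""
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOP_WORDS]


def key_sentences(text: str, k: int = 3) -> List[str]:
    """Return the k sentences whose content words are most frequent across the
    whole text (averaged per sentence), kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in _content_words(s))

    def score(sentence: str) -> float:
        words = _content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Such a baseline can serve as a fallback or a sanity check on model output, but it cannot predict questions or write answers the way the generative model is described as doing.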

[0109] A consumer-grade automated machine (e.g., a typical home robot) installed in a user's home receives information from a server and provides it to the user using speech synthesis and display functions. This information provision begins when the user asks the robot a question in natural language. The server analyzes the question, searches its knowledge base for the corresponding answer, and immediately responds to the user.
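How the home robot chooses between speech synthesis and its display is not detailed. A hedged sketch, assuming a hypothetical `DeviceCaps` description of the robot's outputs, plans one display message plus sentence-sized speech chunks for a text-to-speech engine; `plan_output` and the 120-character chunk limit are assumptions.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DeviceCaps:
    has_speaker: bool
    has_display: bool


def plan_output(answer: str, caps: DeviceCaps, max_chunk: int = 120) -> List[Tuple[str, str]]:
    """Turn one answer into (channel, payload) instructions for a home robot.

    The full text goes to the display; speech is grouped into chunks of whole
    sentences no longer than max_chunk so a TTS engine can speak them in turn.
    """
    plan: List[Tuple[str, str]] = []
    if caps.has_display:
        plan.append(("display", answer))
    if caps.has_speaker:
        chunk = ""
        for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
            if chunk and len(chunk) + 1 + len(sentence) > max_chunk:
                plan.append(("speech", chunk))  # flush the current chunk
                chunk = sentence
            else:
                chunk = f"{chunk} {sentence}".strip()
        if chunk:
            plan.append(("speech", chunk))
    return plan
```

A single sentence longer than `max_chunk` is still spoken as one chunk; splitting inside a sentence would risk unnatural pauses.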

[0110] As a concrete example, consider a scenario where a user purchases a new home appliance and shows its instruction manual to a robot. When the user asks the robot, "Tell me how to use this product," the robot will provide specific instructions such as, "First, turn on the power. Next, press the settings button."

[0111] An example of a prompt message could be natural language input such as, "Please analyze the instruction manual for this new home appliance and provide general troubleshooting steps."

[0112] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0113] Step 1:

[0114] The user uploads a document from their device to the server. The input is the document file provided by the user, which the server receives. The output is the document file stored on the server.

[0115] Step 2:

[0116] The server analyzes uploaded documents and uses optical character recognition (OCR) technology if the document is in image format. The input is an image file, and the server extracts character information from the image data using OCR software such as Tesseract. The output is digital text data.

[0117] Step 3:

[0118] The server inputs the extracted text information into a generative AI model, which then analyzes the information. The input is digital text data, and language analysis is performed by a generative AI model (e.g., GPT series). This identifies important information within the document. The output is the identified important information.

[0119] Step 4:

[0120] Based on the identified key information, the server automatically generates questions that users are likely to ask, together with corresponding answers. The input is the key information obtained in Step 3, and the generative AI model is used to create question-and-answer pairs. The output is these pairs.

[0121] Step 5:

[0122] The server adds the generated question-and-answer pairs to the knowledge base. The input is the generated question-and-answer pairs, which the server registers in the database. The output is the updated knowledge base.

[0123] Step 6:

[0124] A user asks a home robot questions in natural language. The input is the user's voice or text questions, which the robot receives. The output is the received question data.

[0125] Step 7:

[0126] The server analyzes user question data and searches its knowledge base to identify answers. The input is the received question data, and the server uses a generative AI model for search and matching. The output is the identified answer.

[0127] Step 8:

[0128] The server sends the identified response to a home robot, which then provides the information to the user via speech synthesis or a display. The input is the response data provided by the server, and the output is the provision of information to the user in either audio or visual form.

[0129] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using those emotions.

[0130] This invention combines a system that uploads documents, extracts important information from those documents, and generates corresponding Q&A with an emotion engine that recognizes user emotions. Users upload document files to a server via their terminal. The server extracts textual information from the received document files and analyzes it using an AI model. The analysis identifies important information, and questions and answers are generated based on this information. These question-and-answer pairs are added to a knowledge base for real-time inquiry handling.

[0131] Furthermore, the present invention incorporates an emotion engine to recognize the emotional state of the user in response to their inquiries. The emotion engine analyzes the user's emotions from their voice or text input and takes this emotional information into account when matching it with the knowledge base on the server. As a result, the most appropriate response is selected according to the user's emotions and delivered to the user in an appropriate tone.

[0132] As a concrete example, consider a scenario where a user uploads a manual for setting up a complex product. The server automatically extracts key information, including the product setup procedure, and generates questions such as "What are the main causes when the product doesn't work correctly?" Furthermore, if the emotion engine detects that the user is expressing frustration or stress, the server provides answers with more polite language and additional explanations. This allows users to receive more satisfying support, improving the accuracy of problem solving and overall satisfaction.
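The emotion identification model 59 is not described in detail, and the sketch below does not reproduce it. As a placeholder, assuming text input only, a keyword heuristic stands in for emotion estimation, and the answer is softened and extended when frustration is detected, mirroring the concrete example above; the cue words, threshold, wording, and function names are hypothetical.

```python
import re

# Hypothetical cue words standing in for the learned emotion identification model 59.
FRUSTRATION_CUES = frozenset({"again", "still", "broken", "useless", "annoying", "why"})


def estimate_emotion(message: str) -> str:
    """Rough text-only estimate returning 'frustrated' or 'neutral'."""
    words = set(re.findall(r"\w+", message.lower()))
    hits = len(words & FRUSTRATION_CUES)
    # Two or more cue words, or doubled exclamation marks, count as frustration.
    return "frustrated" if hits >= 2 or "!!" in message else "neutral"


def adapt_answer(answer: str, emotion: str, extra_help: str = "") -> str:
    """Use more polite wording and add explanation when the user seems frustrated."""
    if emotion != "frustrated":
        return answer
    parts = ["Sorry for the trouble, and thank you for your patience.", answer]
    if extra_help:
        parts.append(extra_help)
    return " ".join(parts)
```

In the described system the estimate would come from the trained model and could also use voice input; separating `estimate_emotion` from `adapt_answer` shows where such a model would plug in.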

[0133] The following describes the processing flow.

[0134] Step 1:

[0135] Users select manuals and document files using a terminal and upload them to the server via the system interface.

[0136] Step 2:

[0137] The terminal sends the selected document file to the server and displays a notification to the user confirming that the upload is complete.

[0138] Step 3:

[0139] The server receives documents sent from the terminal and temporarily stores them for analysis of their contents.

[0140] Step 4:

[0141] The server checks the format of the uploaded file, and if it is an image file, it uses optical character recognition technology to extract the character information and converts it into a text file.

[0142] Step 5:

[0143] The server activates a generative AI model and identifies important information by analyzing the extracted textual data.

[0144] Step 6:

[0145] Based on the identified key information, the server predicts questions that users are likely to ask and generates answers to them.

[0146] Step 7:

[0147] The generated questions and answers are structured and added to the knowledge base, preparing it for real-time user inquiries.

[0148] Step 8:

[0149] When a user makes an inquiry, the device sends the question to the server as text or voice.

[0150] Step 9:

[0151] The emotion engine on the server analyzes the user's emotions from their text or voice and extracts emotional information.

[0152] Step 10:

[0153] The server searches the knowledge base, taking the emotional information into account, and selects the most appropriate answer in a suitable tone.

[0154] Step 11:

[0155] The server sends the selected answer to the terminal, which then displays it to the user and provides appropriate support.
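Steps 3 to 11 can be sketched as a single server object with its models injected as plain callables. This is an illustration under stated assumptions, not the patented implementation. `Server`, the suffix-based image check, and the word-overlap match are hypothetical simplifications. Steps 1 and 8 (file selection and inquiry entry) happen on the terminal and are represented only by the method calls.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".tif", ".tiff")


@dataclass
class Server:
    # Injected components (hypothetical stand-ins for the real models).
    ocr: Callable[[bytes], str]                          # Step 4
    generate_qa: Callable[[str], list[tuple[str, str]]]  # Steps 5-6
    estimate_emotion: Callable[[str], str]               # Step 9
    adjust_tone: Callable[[str, str], str]               # Step 10
    staging: dict[str, bytes] = field(default_factory=dict)
    knowledge_base: list[tuple[str, str]] = field(default_factory=list)

    def receive(self, filename: str, data: bytes) -> str:
        # Step 3: keep the upload in temporary storage until it is analyzed.
        self.staging[filename] = data
        # Step 2: the terminal shows this as the upload confirmation.
        return "upload complete"

    def process(self, filename: str) -> None:
        data = self.staging.pop(filename)
        # Step 4: OCR for image files; other files are treated as UTF-8 text.
        if filename.lower().endswith(IMAGE_SUFFIXES):
            text = self.ocr(data)
        else:
            text = data.decode("utf-8")
        # Steps 5-7: generate Q&A pairs and add them to the knowledge base.
        self.knowledge_base.extend(self.generate_qa(text))

    def answer(self, inquiry: str) -> str:
        if not self.knowledge_base:
            return "No answer is available yet."
        emotion = self.estimate_emotion(inquiry)  # Step 9
        # Step 10: pick the question with the largest word overlap, then adjust tone.
        words = set(inquiry.lower().split())
        _, best = max(
            self.knowledge_base,
            key=lambda qa: len(words & set(qa[0].lower().split())),
        )
        return self.adjust_tone(best, emotion)  # sent to the terminal in Step 11
```

Injecting the OCR, Q&A generation, and emotion functions keeps the control flow testable with stubs. It also lets each model be swapped independently.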

[0156] (Example 2)

[0157] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0158] Today, vast amounts of information are stored electronically in large documents, which makes it difficult to extract the necessary information instantly and present it to users in an appropriate format. Responding appropriately to users also requires understanding their emotions during inquiries, which is likewise difficult to achieve. These problems must be solved to provide users with information together with emotionally sensitive responses.

[0159] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0160] In this invention, the server includes means for uploading a document, means for extracting textual information from the document, means for analyzing the textual information to identify important information, means for adding generated questions and answers to an information resource, means for recognizing the user's emotional state from the user's input, and means for providing the optimal response according to the emotional state. This makes it possible to immediately extract and provide necessary information from a document and to respond in a way that takes the user's emotions into consideration.

[0161] A "document" refers to a medium on which information is recorded using characters or symbols, and includes both electronic and physical forms.

[0162] "The means of uploading" refers to the process by which a user transfers a document to an electronic system and saves it to a server.

[0163] "Textual information" refers to data represented by characters, numbers, and symbols contained in a document.

[0164] "Important information" refers to key data and knowledge extracted through document analysis that are likely to be of interest to users.

[0165] "Questions and answers" refers to questions and their corresponding answers, generated based on important information and structured to help the user understand.

[0166] "Information resources" refer to databases where generated question-and-answer pairs are stored and managed.

[0167] "Means of providing answers to inquiries" refers to the process of using information resources to return appropriate information in response to questions from users.

[0168] "Means of recognizing emotional states" refers to technologies that analyze user input, such as voice or text, to identify emotions.

[0169] "Means of providing the optimal answer" refers to the process of selecting and presenting the most appropriate information from information resources in response to the recognized emotional state of the user.

[0170] This invention begins with a user uploading a document to a server using a terminal. The terminal can access a dedicated application via a web browser and send a document file to the server using the "Select File" button. The server receives the document file and extracts the character information using optical character recognition technology (for example, the OCR software Tesseract).
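Tesseract is normally driven through its command-line interface. The sketch below builds and runs such a call. It assumes that the `tesseract` executable and the requested language data (for example `jpn` and `eng`) are installed. Combining languages with `+` is one way to cover the multilingual documents the system is meant to support. The wrapper function names are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess


def build_tesseract_command(image_path: str, languages: list[str]) -> list[str]:
    # Passing "stdout" as the output base makes Tesseract print the recognized
    # text instead of writing <outputbase>.txt; "-l jpn+eng" combines languages.
    return ["tesseract", image_path, "stdout", "-l", "+".join(languages)]


def run_ocr(image_path: str, languages: list[str] | None = None) -> str:
    """Run the Tesseract CLI on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, languages or ["eng"])
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_ocr("manual_page1.png", ["jpn", "eng"])` returns the recognized text of one page image. The file name is illustrative.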

[0171] The extracted textual information is analyzed by a generative AI model within the server, specifically a natural language processing (NLP) model (e.g., BERT, GPT-3®). This analysis identifies key information from the document. Based on this key information, the server generates questions and answers and adds these pairs to the information resource in real time. This process strengthens the question-and-answer knowledge base.

[0172] When a user makes an inquiry, the server uses an emotion engine to recognize the user's emotions from their input (in voice or text format). This emotion analysis uses natural language processing tools (e.g., IBM Watson® NLU). Taking the emotion information into account, the server selects the most appropriate answer from its information resources and responds in a tone that matches the user's emotional state.

[0173] As a concrete example, consider a scenario where a user uploads an instruction manual for a complex piece of equipment. The server extracts key procedures and troubleshooting information from the manual and generates questions such as, "What causes the equipment to not work?" Furthermore, if the emotion engine detects that the user is dissatisfied, the server provides a more detailed and considerate response.

[0174] Examples of prompts include, "What important information is included in this manual? Generate a Q&A based on it," and "How should you respond if a user expresses dissatisfaction? Use the sentiment engine to provide an appropriate response."
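The first example prompt can be turned into a template whose output is machine-readable before it enters the information resource. The sketch below assumes the model is asked to answer in JSON. The template wording beyond the quoted prompt and both function names are hypothetical, and the model call itself is omitted.

```python
import json

# Hypothetical template built around the example prompt quoted above.
QA_PROMPT = (
    "What important information is included in this manual? "
    "Generate a Q&A based on it. Reply only with a JSON list of objects "
    'with "question" and "answer" keys.\n\nManual:\n{document}'
)


def build_qa_prompt(document_text: str) -> str:
    return QA_PROMPT.format(document=document_text)


def parse_qa_response(raw: str) -> list[tuple[str, str]]:
    """Parse model output, dropping malformed or incomplete entries."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    pairs = []
    for item in items:
        if isinstance(item, dict):
            q, a = item.get("question"), item.get("answer")
            # Keep only entries where both fields are non-empty strings.
            if isinstance(q, str) and isinstance(a, str) and q.strip() and a.strip():
                pairs.append((q.strip(), a.strip()))
    return pairs
```

Validating each entry before storage means a partially malformed model reply still contributes its usable pairs instead of corrupting the knowledge base.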

[0175] In this way, users can receive timely and accurate information from the server, along with emotionally sensitive responses, thereby improving their content usage experience.

[0176] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0177] Step 1:

[0178] The user uploads a document file to the server using their device. The input is the document file selected by the user, and the output is the document file saved on the server. The user opens a web browser, uses the upload form, clicks the "Select File" button, selects the desired file, and submits it.

[0179] Step 2:

[0180] The server extracts text information from the uploaded document file. The input is the document file received in step 1, and the output is the extracted text information. The server uses OCR software to recognize text information from image and PDF files and obtains text data.

[0181] Step 3:

[0182] The server uses a generative AI model to analyze textual information and identify important information. The input is the textual information extracted in step 2, and the output is the identified important information. The server uses an NLP model (e.g., BERT, GPT-3) to analyze the document content and extract keywords and key data points.

[0183] Step 4:

[0184] The server generates question-and-answer pairs based on the identified key information. The input is the key information identified in step 3, and the output is the generated question-and-answer pairs. The server uses the generative AI model to create question-and-answer pairs that anticipate potential user inquiries.

[0185] Step 5:

[0186] The server adds the generated question-and-answer pairs to the information resource. The input is the question-and-answer pairs generated in step 4, and the output is the updated information resource. This enhances the knowledge base in real time.

[0187] Step 6:

[0188] The user makes a query, and the server uses an emotion engine to recognize the user's emotional state. The input is the user's voice or text input, and the output is data about the emotional state. The server uses analysis tools to evaluate the emotion and extract emotional information.

[0189] Step 7:

[0190] The server provides the optimal response based on the emotional state. The input consists of the emotional state recognized in step 6 and candidate responses from the information resources, while the output is the adjusted response presented to the user. The tone of the response is adjusted based on the emotional state to improve user satisfaction.
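Step 7 combines candidate responses with the recognized emotional state. A minimal sketch, assuming each candidate carries a retrieval score and a tone tag, ranks candidates by relevance plus an emotion-dependent bonus. The `Candidate` type, the tone tags, and the bonus values are illustrative assumptions, not values from the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    relevance: float  # retrieval score from the information resource
    tone: str         # hypothetical tag: "concise", "detailed", or "reassuring"


# Hypothetical preferences: how much each tone suits each emotional state.
TONE_BONUS = {
    "dissatisfied": {"detailed": 0.3, "reassuring": 0.2},
    "anxious": {"reassuring": 0.3, "detailed": 0.1},
    "neutral": {"concise": 0.1},
}


def select_response(candidates: list[Candidate], emotion: str) -> Candidate:
    """Pick the candidate with the best relevance plus emotion-dependent bonus."""
    if not candidates:
        raise ValueError("no candidate responses")
    bonus = TONE_BONUS.get(emotion, {})
    return max(candidates, key=lambda c: c.relevance + bonus.get(c.tone, 0.0))
```

Keeping the bonus small relative to relevance means emotion changes the choice only among answers that are already similarly relevant.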

[0191] (Application Example 2)

[0192] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0193] In addition to extracting important information from documents and generating effective responses to inquiries, there is a need to improve the user experience by providing responses that reflect the user's emotions. However, conventional systems often fail to adequately consider user emotions and can only provide fixed responses. Therefore, a system capable of flexible, personalized responses based on user emotions is necessary.

[0194] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0195] In this invention, the server includes means for uploading documents, means for extracting information, means for analyzing information to identify important data, means for generating queries and responses, means for adding the generated queries and responses to a knowledge base, means for providing the optimal response using the knowledge base, means for analyzing emotions, and means for selecting and providing the optimal response based on emotions. This makes it possible to provide the optimal answer according to the user's emotions.

[0196] "Means for uploading documents" refers to functions that allow users to transfer text and image files to the system.

[0197] "Means for extracting information" refers to functions for extracting important text data from uploaded documents and images.

[0198] "Means of analyzing information to identify important data" refers to functions that process extracted information and identify highly relevant data.

[0199] "Means for generating inquiries and responses" refers to a function that creates questions and answers to user inquiries based on identified key data.

[0200] "Means of adding generated queries and responses to a knowledge base" refers to a function that saves the created question and answer pairs to a database for later use.

[0201] "A means of providing the optimal response using a knowledge base" refers to a function that refers to stored data and presents the most appropriate answer to a user's question.

[0202] "Means of analyzing emotions" refers to functions that evaluate emotions from the user's text or voice input.

[0203] "A means of selecting and providing the optimal response based on emotions" refers to a function that takes into account the user's emotional state and determines and provides a response with an appropriate tone and content.

[0204] This invention provides a system in which a user uploads a document via a terminal, and the system extracts important information from that document and generates queries and responses. Through document upload, information extraction, and information analysis, the server identifies important data and adds the generated queries and responses to a knowledge base. When responding to a user query, the system analyzes the user's emotions and selects and provides the most appropriate response based on them.

[0205] The hardware used includes smartphones and personal computers, which users use to upload documents. The software utilizes Tesseract OCR for character recognition and Transformer-based models (such as BERT) for sentiment analysis. Data processing and computation involve the software extracting text information from image data, analyzing the extracted text to identify important information, and then using generative AI models to generate appropriate questions and answers.

[0206] As a concrete example, suppose a user takes a picture of an invoice with a smartphone and uploads it to the server. The server uses character recognition technology to extract the billing amount and payment deadline from the picture, and analyzes this information to generate questions the user might ask, together with their answers. At the same time, sentiment analysis is performed on the user's text or voice input, and the tone of the response is adjusted according to the user's emotions. An example of a prompt input to the generative AI model is, "Please give me advice about this expense. I am worried."
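Step 3 of this example calls for extracting the billing amount and payment deadline from OCR output. The sketch below does this with regular expressions over English-language text and then turns the fields into question-and-answer pairs. The patterns, the ISO date format, and the function names are assumptions for illustration. Real invoices (including Japanese ones) vary widely and would need locale-specific rules or a model-based extractor.

```python
from __future__ import annotations

import re
from datetime import date

# Hypothetical patterns: a keyword, up to 20 non-digit characters, then the value.
AMOUNT_RE = re.compile(r"(?:total|amount due|billing amount)\D{0,20}?(\d[\d,]*(?:\.\d{1,2})?)", re.IGNORECASE)
DUE_RE = re.compile(r"(?:due|deadline)\D{0,20}?(\d{4})-(\d{2})-(\d{2})", re.IGNORECASE)


def extract_invoice_fields(text: str) -> dict[str, object]:
    fields: dict[str, object] = {}
    amount = AMOUNT_RE.search(text)
    if amount:
        fields["amount"] = float(amount.group(1).replace(",", ""))
    due = DUE_RE.search(text)
    if due:
        # date() raises ValueError for impossible dates such as 2025-02-30.
        year, month, day = (int(g) for g in due.groups())
        fields["deadline"] = date(year, month, day)
    return fields


def invoice_questions(fields: dict[str, object]) -> list[tuple[str, str]]:
    """Turn extracted fields into question-and-answer pairs for the knowledge base."""
    pairs = []
    if "amount" in fields:
        pairs.append(("How much do I have to pay?", f"The billed amount is {fields['amount']:,.0f}."))
    if "deadline" in fields:
        pairs.append(("When is the payment deadline?", f"Payment is due by {fields['deadline'].isoformat()}."))
    return pairs
```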

[0207] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0208] Step 1:

[0209] The user uploads a document or image to the server using their device. The input is, for example, an image of an invoice or receipt. The server receives this input and prepares it for the next processing step.

[0210] Step 2:

[0211] The server extracts text information from the received image data. This process uses Tesseract OCR, a character recognition technology. It receives image data as input and generates text data as output.

[0212] Step 3:

[0213] The server analyzes the extracted text data to identify important information. This step uses an analysis algorithm to extract information important to the user (e.g., billing amount, payment deadline). The input is text data, and the output is a dataset containing the important information.

[0214] Step 4:

[0215] Based on the identified key information, the server automatically generates queries and responses using a generative AI model. This process takes the key information as input and uses the generative AI model to output appropriate question-and-answer pairs.

[0216] Step 5:

[0217] The user enters a prompt message into the server via text or voice through their terminal. This prompt message may include the user's emotions, for example, "Please give me advice about this expense. I am worried." The server receives this prompt message.

[0218] Step 6:

[0219] The server analyzes the user's emotions from the received prompt text. This process uses a Transformer-based emotion analysis model. The input is the prompt text, and the output is data indicating the user's emotional state.

[0220] Step 7:

[0221] The server selects and provides the optimal response to the user based on their emotions, referencing a knowledge base. This step uses emotion data and data from the knowledge base as input and outputs a response with an adjusted tone.

[0222] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0223] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58 together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0224] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0225] [Second Embodiment]

[0226] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0227] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0228] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0229] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0230] The microphone 238 receives voice signals, and thereby instructions, from the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0231] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0232] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0233] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0234] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0235] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0236] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0237] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0238] This invention begins with a user uploading a document file, including manuals and instructions, from their device to a server. The server then analyzes the received document file. If the uploaded file is in image format, the server uses optical character recognition technology to extract text information from the image and convert it into digital text.

[0239] Next, the server analyzes the extracted text information using a generative AI model to identify important information within the document. Based on this analysis, the server predicts questions that users are likely to ask and automatically generates corresponding answers. The generated question-and-answer pairs are added to the knowledge base, making the newly added information available in real time.

[0240] When a user submits an inquiry, the question entered on the device is sent to the server. The server searches the knowledge base and identifies the relevant answer. The identified answer is then provided to the user through the device. This allows the user to receive quick and accurate support.
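The text does not say how the server searches the knowledge base. The sketch below uses one common technique instead: it ranks stored questions by TF-IDF weighted cosine similarity, using only the standard library. `TfidfIndex`, its smoothed IDF formula, and the `min_score` cutoff are assumptions chosen for illustration.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def _tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


class TfidfIndex:
    """Small TF-IDF/cosine index over stored questions (illustrative only)."""

    def __init__(self, qa_pairs: list[tuple[str, str]]):
        self.pairs = qa_pairs
        docs = [Counter(_tokens(q)) for q, _ in qa_pairs]
        n = len(docs)
        df = Counter(term for doc in docs for term in doc)
        # Smoothed IDF: terms found in every question still keep a small weight.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
        self.vectors = [self._weigh(doc) for doc in docs]

    def _weigh(self, counts: Counter) -> dict[str, float]:
        # Terms never seen in stored questions get weight 0 and are ignored.
        return {t: c * self.idf.get(t, 0.0) for t, c in counts.items()}

    @staticmethod
    def _cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    def best_answer(self, inquiry: str, min_score: float = 0.1) -> str | None:
        query = self._weigh(Counter(_tokens(inquiry)))
        scored = [(self._cosine(query, v), a) for v, (_, a) in zip(self.vectors, self.pairs)]
        if not scored:
            return None
        score, answer = max(scored)
        return answer if score >= min_score else None
```

Returning `None` below the cutoff lets the chatbot say it has no answer rather than return an unrelated one.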

[0241] As a concrete example, consider a scenario where a user uploads a manual for a new home appliance. The server automatically extracts important information, including product setup instructions and troubleshooting guides, and generates answers to frequently asked questions, such as "What to do if the power won't turn on." When a user sends such a question to the chatbot, the server can quickly provide the appropriate solution from the knowledge base, assisting the user. This system makes it possible to simplify user operation while achieving efficient support.

[0242] The following describes the processing flow.

[0243] Step 1:

[0244] Users select manuals and document files via their terminal and upload them to the server using the system interface.

[0245] Step 2:

[0246] The terminal sends the selected document file to the server and notifies the user when the upload is complete.

[0247] Step 3:

[0248] The server receives the document file sent from the terminal and saves the file to temporary storage for analysis.

[0249] Step 4:

[0250] The server checks the file format, and if it's an image file, it uses optical character recognition technology to extract the text information within the image as digital text.

[0251] Step 5:

[0252] The server analyzes the extracted text using a generative AI model and performs natural language processing to identify important information.

[0253] Step 6:

[0254] The server predicts questions that the user is likely to ask frequently based on the text and generates appropriate answers to those questions.

[0255] Step 7:

[0256] The server adds the generated question-and-answer pairs to the knowledge base and keeps it up-to-date to prepare for real-time queries.

[0257] Step 8:

[0258] When a user requests information, the device sends the question to the server.

[0259] Step 9:

[0260] The server searches its knowledge base based on the question it receives and identifies the best answer.

[0261] Step 10:

[0262] The server sends the identified answer to the terminal, and the terminal displays it to the user, thereby providing the necessary information.
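Step 7 keeps the knowledge base up to date while Step 9 may be answering queries at the same time. A minimal sketch, assuming a single server process with multiple threads, guards both operations with one lock. A query therefore never observes a half-added batch. `LiveKnowledgeBase` is hypothetical, and its exact-match lookup stands in for the real search.

```python
from __future__ import annotations

import threading


class LiveKnowledgeBase:
    """Question-and-answer store whose additions are visible to the next query."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pairs: dict[str, str] = {}
        self.version = 0  # incremented once per added batch

    @staticmethod
    def _key(question: str) -> str:
        return question.strip().lower()

    def add_batch(self, pairs: list[tuple[str, str]]) -> int:
        # Step 7: add a whole batch atomically; a newer answer replaces an older one.
        with self._lock:
            for question, answer in pairs:
                self._pairs[self._key(question)] = answer
            self.version += 1
            return self.version

    def lookup(self, question: str) -> str | None:
        # Step 9 stand-in: exact (case-insensitive) match on the question text.
        with self._lock:
            return self._pairs.get(self._key(question))
```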

[0263] (Example 1)

[0264] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0265] It is essential to quickly and effectively utilize the information users possess and efficiently provide them with the necessary information. Traditional methods have made it difficult to quickly extract information from documents and provide appropriate answers to user inquiries in real time. It is necessary to solve these problems and improve user convenience and speed up information access.

[0266] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0267] In this invention, the server includes means for transferring a document to an information processing device via a communication device, means for determining whether the document is in image format and, if so, extracting character information from the image using optical character recognition technology, and means for analyzing the character information using a generative AI model to identify important information. This makes it possible to quickly provide the information that the user needs.

[0268] "Communication equipment" is a general term for devices used to transmit documents from a user's terminal to an information processing device.

[0269] An "information processing device" is a general term for a device that analyzes received documents and performs necessary data extraction and processing.

[0270] "Optical character recognition technology" is a general term for technologies that extract character information as digital text from image-based documents.

[0271] A "generative AI model" is a general term for artificial intelligence technologies that analyze given text information and extract specific patterns or information.

[0272] A "collection" is a general term for a database that stores generated question-and-answer pairs and uses them for subsequent searches and queries.

[0273] A "prompt message" is a general term for text containing instructions or questions that a user enters into a system.

[0274] This system primarily utilizes user terminals, servers, and communication devices. Users first upload document files to the server via the communication device using their terminal. The system supports document file formats such as PDF, JPEG, and PNG.

[0275] If the received document is in image format, the server extracts the text information using optical character recognition (OCR) technology. This process utilizes Tesseract OCR or other similar optical character recognition software. The extracted text information is then input into a generative AI model. Typical generative AI models include those incorporating natural language processing technology.
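Whether an upload is "in image format" can be decided from the file's leading bytes rather than its name. The signatures below for PDF, JPEG, and PNG are the standard ones. The function names are hypothetical. Whether PDFs are also sent to OCR is left open here, since one flow in this document applies OCR to both image and PDF files.

```python
# Leading "magic" bytes of the formats named in the text (PDF, JPEG, PNG).
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
}


def detect_format(data: bytes) -> str:
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


def needs_ocr(data: bytes) -> bool:
    # Only image uploads are routed to OCR in this sketch.
    return detect_format(data) in {"jpeg", "png"}
```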

[0276] The generative AI model analyzes the input text information and extracts key information from the document. Based on this analysis, the server identifies questions that users are likely to ask and generates appropriate answers to them. These question-and-answer pairs are stored in a knowledge base on the server and kept accessible at all times.

[0277] A concrete example is when a user uploads a manual for a new home appliance. In this case, the server can analyze the operating instructions and troubleshooting guide for the product described in the manual and automatically generate answers to frequently asked questions, such as "What to do if the power won't turn on."
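The text attributes the generation of FAQ entries such as "What to do if the power won't turn on" to the generative AI model. The sketch below offers a rule-based stand-in that could also serve as a baseline for that model. It converts troubleshooting lines of the form "If <condition>, <action>" into question-and-answer pairs. The line format and the function are assumptions.

```python
import re

# Matches lines such as "If the power won't turn on, check the plug."
RULE_RE = re.compile(r"^\s*if\s+(?P<cond>[^,]+),\s*(?P<action>.+?)\s*$", re.IGNORECASE)


def faq_from_troubleshooting(text: str) -> list[tuple[str, str]]:
    """Build (question, answer) pairs from 'If <condition>, <action>' lines."""
    pairs = []
    for line in text.splitlines():
        m = RULE_RE.match(line)
        if m:
            question = f"What should I do if {m['cond'].strip()}?"
            action = m["action"]
            # Capitalize the first letter so the action reads as a sentence.
            pairs.append((question, action[0].upper() + action[1:]))
    return pairs
```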

[0278] Also, when the user inputs a prompt sentence such as "Teach me the setup procedure for a new household appliance" from the terminal, the server searches the knowledge base. As a result, relevant information is quickly identified and provided to the user through the terminal. Through this process, the user can efficiently obtain information and utilize it for problem-solving. This system is characterized by its intuitive operation and high convenience, and is beneficial to many users.

[0279] The flow of the specific process in Example 1 will be described using FIG. 11.

[0280] Step 1:

[0281] The user uses the terminal to select a document file such as a manual or an instruction manual, and uploads it to the server via the communication device. At this time, the input is the document file itself, and the format is a PDF, an image file, etc. The output is the document file saved on the server.

[0282] Step 2:

[0283] The server determines whether the received document file is in image format. If it is in image format, optical character recognition technology is used to extract character information from the image. The input of this process is the image document file, and the output is text data. Specifically, software such as Tesseract OCR is used to convert the characters into digital text.

[0284] Step 3:

[0285] The server inputs the text data into the generative AI model to analyze the important information. The input is the extracted text, and the output is the analysis result including the important information. In this process, natural language processing technology is utilized to extract the important points in the document and the questions that the user is likely to ask frequently.

[0286] Step 4:

[0287] The server predicts questions that users are likely to frequently ask based on the analysis results and generates answers to them. At this stage, the input is the analysis results containing important information, and the output is question-and-answer pairs. The generated content is added to the knowledge base to prepare for future inquiries.

[0288] Step 5:

[0289] The user enters a prompt message from their terminal. A specific example might be a question like, "Please tell me how to set up a new home appliance." Based on this input, the server searches its knowledge base and identifies the relevant answer.

[0290] Step 6:

[0291] The server sends the search results to the terminal. The input consists of the user's prompt and the search results based on it, while the output is the answer displayed on the terminal. This allows the user to quickly and accurately obtain the information they are looking for.

[0292] (Application Example 1)

[0293] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0294] Automated machines and devices for the home must provide users with real-time, efficient, and accurate information on how to use products and how to solve problems that may arise. However, existing systems struggle to quickly extract the necessary information from documents and to respond to user inquiries immediately. As a result, users often receive inefficient support when using products, and a solution to this problem is needed.

[0295] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0296] In this invention, the server includes means for acquiring documents, means for extracting textual information from the documents, and means for analyzing the textual information to identify important information. This enables a home automated machine to provide users with useful and accurate product support information in real time.

[0297] "Means for acquiring documents" refers to the function by which a document is sent or uploaded to the server from the user's device.

[0298] "Means for extracting textual information" refers to the process of extracting text data from the acquired documents.

[0299] "Means for analyzing textual information to identify important information" refers to the function of analyzing the extracted text data and selecting from it the information that is useful to the user.

[0300] "Means for generating questions and answers" refers to a function that, based on the identified important information, automatically generates appropriate answers to questions users are likely to ask.

[0301] "Means for adding to the knowledge base" refers to the process of registering the generated question-and-answer pairs in a database for storage.

[0302] "Means for converting into text information using optical technology" refers to technology that optically analyzes the characters in an image and converts them into digital text.

[0303] "Means by which a consumer automated machine transmits information to users by voice or screen display" refers to functions, such as those of household robots, that provide information to users through voice synthesis or a display.

[0304] The system for implementing this invention functions through the cooperation of household automated devices and a server. The server acquires the documents transmitted by the user through a terminal. When a document is in image format, the server extracts text data from it using optical character recognition technology; widely known OCR software such as Tesseract can be used for this purpose.
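The image-or-text branch described here can be sketched as follows. This is a minimal Python illustration, not the patent's implementation: the function name `extract_text`, the list of image suffixes, and the assumption that non-image uploads are UTF-8 text are all hypothetical. The OCR engine is passed in as a callable so the sketch runs without Tesseract installed.

```python
from pathlib import Path
from typing import Callable

# File suffixes treated as image documents (an illustrative assumption).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def extract_text(filename: str, data: bytes, ocr: Callable[[bytes], str]) -> str:
    """Return a document's text, running OCR only when the upload is an image."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        # Image document: delegate character recognition to the OCR engine.
        return ocr(data)
    # Non-image upload: assume it already contains UTF-8 encoded text.
    return data.decode("utf-8")


# Usage with a stand-in OCR function.
fake_ocr = lambda image_bytes: "First, turn on the power."
print(extract_text("manual.PNG", b"\x89PNG", fake_ocr))  # First, turn on the power.
print(extract_text("faq.txt", "Press the setting button.".encode("utf-8"), fake_ocr))  # Press the setting button.
```

With Tesseract, the `ocr` argument could wrap `pytesseract.image_to_string(Image.open(io.BytesIO(data)))` from the `pytesseract` and Pillow packages; injecting it keeps the routing logic testable on its own.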

[0305] After that, the server inputs the extracted text data into a generative AI model (e.g., GPT series) to identify important information. Through this analysis, questions that the user is likely to ask and corresponding answers are automatically generated. The generated question-and-answer pairs are added to the knowledge base on the server, enabling real-time inquiry response.

[0306] Consumer automated devices (e.g., general household robots) installed in the user's home receive information from the server and provide it to the user using voice synthesis and display functions. This information provision starts when the user asks the robot a question in natural language. The server analyzes the question, searches the knowledge base for a corresponding answer, and immediately replies to the user.

[0307] As a specific example, consider a case where the user purchases a new household appliance and shows its manual to the robot. When the user asks the robot, "Please teach me how to use this product," the robot can explain the specific procedure, for example: "First, turn on the power. Next, press the setting button."

[0308] An example prompt is a natural-language input such as "Analyze the manual of this new household appliance and teach me the general troubleshooting procedures."

[0309] The flow of specific processing in Application Example 1 will be described using FIG. 12.

[0310] Step 1:

[0311] The user uploads a document from their device to the server. The input is the document file provided by the user, which the server receives. The output is the document file stored on the server.

[0312] Step 2:

[0313] The server analyzes uploaded documents and uses optical character recognition (OCR) technology if the document is in image format. The input is an image file, and the server extracts character information from the image data using OCR software such as Tesseract. The output is digital text data.

[0314] Step 3:

[0315] The server inputs the extracted text information into a generative AI model, which then analyzes the information. The input is digital text data, and language analysis is performed by a generative AI model (e.g., GPT series). This identifies important information within the document. The output is the identified important information.

[0316] Step 4:

[0317] Based on the identified important information, the server automatically generates questions that users are likely to ask, together with corresponding answers. The input is the important information obtained in step 3, and the generative AI model is used to create question-and-answer pairs. The output is these pairs.
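Step 4 can be sketched as a prompt-then-parse routine. The prompt wording, the `Q:`/`A:` reply format, and the names `build_qa_prompt` and `generate_qa_pairs` are hypothetical; `model` is any callable mapping a prompt string to generated text, standing in for a generative AI model such as the GPT series.

```python
import re
from typing import Callable

# Assumed layout of the model's reply: a "Q: ..." line followed by an "A: ..." line.
QA_PATTERN = re.compile(r"^Q:\s*(.+?)\s*\nA:\s*(.+?)\s*$", re.MULTILINE)


def build_qa_prompt(key_info: str, n: int = 3) -> str:
    """Ask the model for n likely user questions, each with an answer."""
    return (
        f"Based on the following product information, write {n} questions a user "
        "is likely to ask, each followed by its answer, in the format\n"
        "Q: <question>\nA: <answer>\n\n" + key_info
    )


def generate_qa_pairs(key_info: str, model: Callable[[str], str]) -> list[tuple[str, str]]:
    """Run the model on the prompt and parse its reply into (question, answer) pairs."""
    reply = model(build_qa_prompt(key_info))
    return QA_PATTERN.findall(reply)


# Stand-in for a generative model, returning a canned reply.
def fake_model(prompt: str) -> str:
    return ("Q: How do I turn it on?\nA: Press the power button.\n"
            "Q: What if it does not start?\nA: Check that the plug is inserted.")


print(generate_qa_pairs("Setup: press the power button to start.", fake_model))
```

Asking for a fixed reply format keeps parsing simple; replies that do not follow it simply yield fewer pairs rather than malformed ones.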

[0318] Step 5:

[0319] The server adds the generated question-and-answer pairs to the knowledge base. The input is the generated question-and-answer pairs, which the server registers in the database. The output is the updated knowledge base.
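A minimal sketch of step 5, using Python's built-in `sqlite3` module as the knowledge base. The table layout, the function names, and the choice to overwrite an existing answer when the same question is regenerated (`INSERT OR REPLACE` on a primary-key `question` column) are assumptions for illustration; the source only states that the pairs are registered in a database.

```python
import sqlite3


def open_knowledge_base(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a knowledge base holding one row per question."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS qa ("
        " question TEXT PRIMARY KEY,"
        " answer   TEXT NOT NULL)"
    )
    return conn


def add_qa_pairs(conn: sqlite3.Connection, pairs: list[tuple[str, str]]) -> int:
    """Register question-and-answer pairs and return the updated row count."""
    with conn:  # commits on success, rolls back on error
        # A regenerated answer for an existing question replaces the old one.
        conn.executemany("INSERT OR REPLACE INTO qa (question, answer) VALUES (?, ?)", pairs)
    return conn.execute("SELECT COUNT(*) FROM qa").fetchone()[0]


kb = open_knowledge_base()
add_qa_pairs(kb, [("How do I turn it on?", "Press the power button.")])
print(add_qa_pairs(kb, [("How do I turn it on?", "Hold the power button for 2 seconds."),
                        ("Why is there no power?", "Check that the plug is inserted.")]))  # 2
```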

[0320] Step 6:

[0321] A user asks a home robot questions in natural language. The input is the user's voice or text questions, which the robot receives. The output is the received question data.

[0322] Step 7:

[0323] The server analyzes user question data and searches its knowledge base to identify answers. The input is the received question data, and the server uses a generative AI model for search and matching. The output is the identified answer.
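The source uses a generative AI model for search and matching. As a simpler, self-contained stand-in, the sketch below ranks stored questions by word overlap (Jaccard similarity) with the user's question. The function name `find_answer` and the `min_score` threshold of 0.2 are illustrative assumptions; a production system would more likely use embeddings or the generative model itself.

```python
import re
from typing import Optional


def tokenize(text: str) -> set[str]:
    """Lower-case the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_answer(question: str, kb: list[tuple[str, str]], min_score: float = 0.2) -> Optional[str]:
    """Return the answer whose stored question overlaps most with `question`."""
    query = tokenize(question)
    best_answer, best_score = None, 0.0
    for stored_question, answer in kb:
        stored = tokenize(stored_question)
        union = query | stored
        # Jaccard similarity: shared words divided by all distinct words.
        score = len(query & stored) / len(union) if union else 0.0
        if score > best_score:
            best_answer, best_score = answer, score
    # Below the threshold, report "no match" rather than a weak guess.
    return best_answer if best_score >= min_score else None


kb = [("How do I turn on the power?", "Press the power button."),
      ("How do I reset the settings?", "Hold the reset key for 5 seconds.")]
print(find_answer("How can I turn the power on?", kb))  # Press the power button.
print(find_answer("Where is the warranty card?", kb))   # None
```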

[0324] Step 8:

[0325] The server sends the identified response to a home robot, which then provides the information to the user via speech synthesis or a display. The input is the response data provided by the server, and the output is the provision of information to the user in either audio or visual form.

[0326] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0327] This invention combines a system that uploads documents, extracts important information from those documents, and generates corresponding Q&A with an emotion engine that recognizes user emotions. Users upload document files to a server via their terminal. The server extracts textual information from the received document files and analyzes it using an AI model. The analysis identifies important information, and questions and answers are generated based on this information. These question-and-answer pairs are added to a knowledge base for real-time inquiry handling.

[0328] Furthermore, the present invention incorporates an emotion engine that recognizes the user's emotional state when the user makes an inquiry. The emotion engine analyzes the user's emotions from their voice or text input, and the server takes this emotional information into account when matching the inquiry against the knowledge base. As a result, the most appropriate response for the user's emotions is selected and delivered to the user in an appropriate tone.

[0329] As a concrete example, consider a scenario where a user uploads a manual for setting up a complex product. The server automatically extracts key information, including the product setup procedure, and generates questions such as "What are the main causes when the product doesn't work correctly?" Furthermore, if the emotion engine detects that the user is expressing frustration or stress, the server provides answers with more polite language and additional explanations. This allows users to receive more satisfying support, improving the accuracy of problem solving and overall satisfaction.

[0330] The following describes the processing flow.

[0331] Step 1:

[0332] Users select manuals and document files using a terminal and upload them to the server via the system interface.

[0333] Step 2:

[0334] The terminal sends the selected document file to the server and displays a notification to the user confirming that the upload is complete.

[0335] Step 3:

[0336] The server receives documents sent from the terminal and temporarily stores them for analysis of their contents.

[0337] Step 4:

[0338] The server checks the format of the uploaded file, and if it is an image file, it uses optical character recognition technology to extract the character information and converts it into a text file.

[0339] Step 5:

[0340] The server activates a generative AI model and identifies important information by analyzing the extracted textual data.

[0341] Step 6:

[0342] Based on identified key information, the server predicts questions that users are likely to ask and generates answers to them.

[0343] Step 7:

[0344] The generated questions and answers are structured and added to the knowledge base, preparing it for real-time user inquiries.

[0345] Step 8:

[0346] When a user makes an inquiry, the device sends the question to the server as text or voice.

[0347] Step 9:

[0348] The emotion engine on the server analyzes the user's emotions from their text or voice and extracts emotional information.
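The emotion engine is not specified beyond analyzing text or voice. The sketch below is a deliberately simple keyword-lexicon classifier over text, included only to show the interface (text in, emotion label out); the labels, cue words, and the name `estimate_emotion` are hypothetical, and a real system would use a trained model such as the emotion identification model 59.

```python
import re

# Toy cue-word lexicon (hypothetical); a trained emotion model would replace this.
EMOTION_CUES = {
    "frustrated": {"annoying", "frustrated", "useless", "still", "again"},
    "worried": {"worried", "afraid", "anxious", "concerned", "nervous"},
}


def estimate_emotion(text: str) -> str:
    """Return the emotion label with the most cue words, or "neutral" if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = {label: sum(word in cues for word in words) for label, cues in EMOTION_CUES.items()}
    label, score = max(scores.items(), key=lambda item: item[1])
    return label if score > 0 else "neutral"


print(estimate_emotion("It still doesn't work. This is so annoying!"))  # frustrated
print(estimate_emotion("How do I set the timer?"))                       # neutral
```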

[0349] Step 10:

[0350] The server searches the knowledge base, taking the emotional information into account, and selects the most appropriate answer and a suitable tone.

[0351] Step 11:

[0352] The server sends the selected answer to the terminal, which then displays it to the user and provides appropriate support.

[0353] (Example 2)

[0354] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0355] Currently, vast amounts of information are stored electronically in large documents, making it difficult to instantly extract the necessary information and provide it to users in an appropriate format. Furthermore, inquiries require understanding the user's emotions and responding to them appropriately, which is also challenging. These problems need to be resolved so that users receive both the information they need and emotionally sensitive responses.

[0356] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0357] In this invention, the server includes means for uploading a document, means for extracting textual information from the document, means for analyzing the textual information to identify important information, means for adding generated questions and answers to an information resource, means for recognizing the emotional state of the user input, and means for providing the optimal response according to the emotional state. This makes it possible to immediately extract and provide necessary information from a document and to respond in a way that takes the user's emotions into consideration.

[0358] A "document" refers to a medium on which information is recorded using characters or symbols, and includes both electronic and physical forms.

[0359] "The means of uploading" refers to the process by which a user transfers a document to an electronic system and saves it to a server.

[0360] "Textual information" refers to data represented by characters, numbers, and symbols contained in a document.

[0361] "Important information" refers to key data and knowledge extracted through document analysis that are likely to be of interest to users.

[0362] "Questions and Answers" refers to questions and their corresponding answers that are generated based on important information and structured to help the user understand it.

[0363] "Information resources" refer to databases where generated question-and-answer pairs are stored and managed.

[0364] "Means of providing answers to inquiries" refers to the process of using information resources to return appropriate information in response to questions from users.

[0365] "Means of recognizing emotional states" refers to technologies that analyze user input, such as voice or text, to identify emotions.

[0366] "Means of providing the optimal answer" refers to the process of selecting and presenting the most appropriate information from information resources in response to the recognized emotional state of the user.

[0367] This invention begins with a user uploading a document to a server using a terminal. The terminal can access a dedicated application via a web browser and send a document file to the server using the "Select File" button. The server receives the document file and extracts its character information using optical character recognition technology (for example, the OCR software Tesseract).

[0368] The extracted textual information is analyzed by a generative AI model within the server, specifically a natural language processing (NLP) model (e.g., BERT, GPT-3). This analysis identifies key information from the document. Based on this key information, the server generates questions and answers and adds these pairs to the information resource in real time. This process strengthens the question-and-answer knowledge base.

[0369] When a user makes an inquiry, the server uses an emotion engine to recognize the user's emotions from their input (voice or text). This emotion analysis uses natural language processing tools (e.g., IBM Watson NLU). Taking the emotion information into account, the server selects the most appropriate answer from its information resources and responds in a tone that matches the user's emotional state.

[0370] As a concrete example, consider a scenario where a user uploads an instruction manual for a complex piece of equipment. The server extracts key procedures and troubleshooting information from the manual and generates questions such as, "What causes the equipment to not work?" Furthermore, if the emotion engine detects that the user is dissatisfied, the server provides a more detailed and considerate response.

[0371] Examples of prompts include, "What important information is included in this manual? Generate a Q&A based on it," and "How should you respond if a user expresses dissatisfaction? Use the sentiment engine to provide an appropriate response."

[0372] In this way, users can receive timely and accurate information from the server, along with emotionally sensitive responses, thereby improving their content usage experience.

[0373] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0374] Step 1:

[0375] The user uploads a document file to the server using their device. The input is the document file selected by the user, and the output is the document file saved on the server. The user opens a web browser, uses the upload form, clicks the "Select File" button, selects the desired file, and submits it.

[0376] Step 2:

[0377] The server extracts text information from the uploaded document file. The input is the document file received in step 1, and the output is the extracted text information. The server uses OCR software to recognize text information from image and PDF files and obtains text data.

[0378] Step 3:

[0379] The server uses a generative AI model to analyze textual information and identify important information. The input is the textual information extracted in step 2, and the output is the identified important information. The server uses an NLP model (e.g., BERT, GPT-3) to analyze the document content and extract keywords and key data points.
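The source uses an NLP model such as BERT or GPT-3 for this step. As a model-free stand-in that only illustrates the input and output of step 3 (text in, key terms out), the sketch below ranks non-stopword terms by frequency; the stopword list, the 3-letter minimum, and the function `top_keywords` are assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list; real systems use larger, language-specific lists.
STOPWORDS = {"the", "and", "for", "with", "you", "your", "this", "that", "are", "not"}


def top_keywords(text: str, k: int = 5) -> list[str]:
    """Return up to k most frequent words of 3+ letters that are not stopwords."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if len(w) >= 3 and w not in STOPWORDS]
    # most_common orders ties by first occurrence in the text.
    return [word for word, _ in Counter(words).most_common(k)]


manual = ("Press the power button. If the power light does not turn on, "
          "check the power cable and clean the filter. Clean the filter monthly.")
print(top_keywords(manual, 3))  # ['power', 'clean', 'filter']
```

Frequency alone misses synonyms and context, which is why the source relies on a language model; the sketch only fixes the shape of the step.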

[0380] Step 4:

[0381] The server generates question-and-answer pairs based on the identified key information. The input is the key information identified in step 3, and the output is the generated question-and-answer pairs. The server uses the generative AI model to create question-and-answer pairs that answer potential user inquiries.

[0382] Step 5:

[0383] The server adds the generated question-and-answer pairs to the information resource. The input is the question-and-answer pairs generated in step 4, and the output is the updated information resource. This enhances the knowledge base in real time.

[0384] Step 6:

[0385] The user makes a query, and the server uses an emotion engine to recognize the user's emotional state. The input is the user's voice or text input, and the output is data about the emotional state. The server uses analysis tools to evaluate the emotion and extract emotional information.

[0386] Step 7:

[0387] The server provides the optimal response based on the emotional state. The input consists of the emotional state recognized in step 6 and candidate responses from the information resources, while the output is the adjusted response presented to the user. The tone of the response is adjusted based on the emotional state to improve user satisfaction.
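Step 7 adjusts the tone of a selected answer to the recognized emotional state. One minimal way to sketch this is with per-emotion templates wrapped around the retrieved answer. The labels, the template wording, and the function `adjust_tone` are illustrative assumptions; the described system may instead have the generative model rephrase the answer.

```python
# Hypothetical tone templates keyed by the label the emotion engine returns.
TONE_TEMPLATES = {
    "frustrated": "I'm sorry for the trouble. Let's go through it step by step. {answer}",
    "worried": "Please don't worry; this can be resolved. {answer}",
    "neutral": "{answer}",
}


def adjust_tone(answer: str, emotion: str) -> str:
    """Wrap a retrieved answer in wording suited to the user's emotional state."""
    # Unknown labels fall back to the neutral (unchanged) answer.
    template = TONE_TEMPLATES.get(emotion, TONE_TEMPLATES["neutral"])
    return template.format(answer=answer)


print(adjust_tone("Check that the plug is inserted.", "worried"))
# Please don't worry; this can be resolved. Check that the plug is inserted.
```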

[0388] (Application Example 2)

[0389] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0390] In addition to extracting important information from documents and generating effective inquiry responses, there is a need to improve the user experience by providing appropriate responses that respond to the user's emotions. However, conventional systems often fail to adequately consider user emotions and can only provide fixed responses. Therefore, a system capable of flexible and personalized responses based on user emotions is necessary.

[0391] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0392] In this invention, the server includes means for uploading documents, means for extracting information, means for analyzing information to identify important data, means for generating queries and responses, means for adding the generated queries and responses to a knowledge base, means for providing the optimal response using the knowledge base, means for analyzing emotions, and means for selecting and providing the optimal response based on emotions. This makes it possible to provide the optimal answer according to the user's emotions.

[0393] "Means for uploading documents" refers to functions that allow users to transfer text and image files to the system.

[0394] "Means for extracting information" refers to functions for extracting important text data from uploaded documents and images.

[0395] "Means of analyzing information to identify important data" refers to functions that process extracted information and identify highly relevant data.

[0396] "Means for generating inquiries and responses" refers to a function that creates questions and answers to user inquiries based on identified key data.

[0397] "Means of adding generated queries and responses to a knowledge base" refers to a function that saves the created question and answer pairs to a database for later use.

[0398] "A means of providing the optimal response using a knowledge base" refers to a function that refers to stored data and presents the most appropriate answer to a user's question.

[0399] "Means of analyzing emotions" refers to functions that evaluate emotions from the user's text or voice input.

[0400] "A means of selecting and providing the optimal response based on emotions" refers to a function that takes into account the user's emotional state and determines and provides a response with an appropriate tone and content.

[0401] This invention provides a system in which a user uploads a document via a terminal, and the system extracts important information from that document and generates queries and responses. The server identifies important data through document upload, information extraction, and information analysis, and adds the generated queries and responses to a knowledge base. When responding to user queries, the system analyzes the user's sentiment and selects and provides the most appropriate response based on that sentiment.

[0402] The hardware used includes smartphones and personal computers, which users use to upload documents. The software utilizes Tesseract OCR for character recognition and Transformer-based models (such as BERT) for sentiment analysis. Data processing and computation involve the software extracting text information from image data, analyzing the extracted text to identify important information, and then using generative AI models to generate appropriate questions and answers.

[0403] As a concrete example, suppose a user photographs an invoice with their smartphone and uploads the image to the server. The server uses character recognition technology to extract the invoice amount and payment deadline from the image, and analyzes this information to generate questions the user is likely to ask, along with their answers. At the same time, sentiment analysis is performed on the user's text or voice input, and the tone of the response is adjusted to the user's emotions. An example prompt input to the generative AI model is, "Please give me advice about this expense. I am worried."

[0404] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0405] Step 1:

[0406] The user uploads a document or image to the server using their device. This input is an image of an invoice or receipt. The server receives this input and prepares for the next processing step.

[0407] Step 2:

[0408] The server extracts text information from the received image data. This process uses Tesseract OCR, a character recognition technology. It receives image data as input and generates text data as output.

[0409] Step 3:

[0410] The server analyzes the extracted text data to identify important information. This step uses an analysis algorithm to extract information important to the user (e.g., billing amount, payment deadline). The input is text data, and the output is a dataset containing the important information.
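For the invoice example, step 3 can be illustrated with regular expressions over the OCR output. The label wording ("Total", "Amount due", "Payment deadline", "Due date"), the ISO `YYYY-MM-DD` date format, and the function `extract_invoice_fields` are assumptions about what a given invoice looks like; real invoices vary widely, which is one reason the source relies on an analysis algorithm.

```python
import re
from datetime import date

# Patterns for the two fields named in the example; label wording is an assumption.
AMOUNT_RE = re.compile(r"\b(?:total|amount due)\s*:?\s*[$¥]?\s*([\d,]+(?:\.\d{2})?)", re.IGNORECASE)
DUE_RE = re.compile(r"\b(?:payment deadline|due date)\s*:?\s*(\d{4})-(\d{2})-(\d{2})", re.IGNORECASE)


def extract_invoice_fields(text: str) -> dict:
    """Pull the billed amount and payment deadline out of OCR'd invoice text."""
    fields = {}
    amount = AMOUNT_RE.search(text)
    if amount:
        # Drop thousands separators before converting to a number.
        fields["amount"] = float(amount.group(1).replace(",", ""))
    due = DUE_RE.search(text)
    if due:
        fields["due_date"] = date(*(int(g) for g in due.groups()))
    return fields


ocr_text = "INVOICE No. 123\nSubtotal: $1,000.00\nTotal: $1,250.00\nPayment deadline: 2025-01-31"
print(extract_invoice_fields(ocr_text))
# {'amount': 1250.0, 'due_date': datetime.date(2025, 1, 31)}
```

The leading `\b` keeps "Subtotal" from being mistaken for the total line.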

[0411] Step 4:

[0412] Based on identified key information, the server automatically generates queries and responses using a generative AI model. This process takes key information as input and uses the generative AI model to output appropriate question-and-answer pairs.

[0413] Step 5:

[0414] The user enters a prompt message into the server via text or voice through their terminal. This prompt message may include the user's emotions, for example, "Please give me advice about this expense. I am worried." The server receives this prompt message.

[0415] Step 6:

[0416] The server analyzes the user's emotions from the received prompt text. This process uses a Transformer-based emotion analysis model. The input is the prompt text, and the output is data indicating the user's emotional state.

[0417] Step 7:

[0418] The server selects and provides the optimal response to the user based on their emotions, referencing a knowledge base. This step uses emotion data and data from the knowledge base as input and outputs a response with an adjusted tone.

[0419] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating the user's input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0420] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0421] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0422] [Third Embodiment]

[0423] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0424] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0425] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0426] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0427] The microphone 238 picks up the voice of the user 20 and thereby receives instructions from the user 20. The microphone 238 converts the captured voice into audio data and outputs it to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0428] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0429] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0430] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0431] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0432] The storage 32 stores the data generation model 58 and the emotion identification model 59, both of which are used by the specific processing unit 290.

[0433] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0434] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0435] This invention begins with a user uploading a document file, including manuals and instructions, from their device to a server. The server then analyzes the received document file. If the uploaded file is in image format, the server uses optical character recognition technology to extract text information from the image and convert it into digital text.

[0436] Next, the server analyzes the extracted text information using a generative AI model to identify important information within the document. Based on this analysis, it predicts questions that users are likely to ask and automatically generates corresponding answers. The generated question-and-answer pairs are added to the knowledge base, making the newly added information available in real time.

[0437] When a user submits an inquiry, the question entered on the device is sent to the server. The server searches the knowledge base and identifies the relevant answer. The identified answer is then provided to the user through the device. This allows the user to receive quick and accurate support.

[0438] As a concrete example, consider a scenario where a user uploads a manual for a new home appliance. The server automatically extracts important information, including product setup instructions and troubleshooting guides, and generates answers to frequently asked questions, such as "What to do if the power won't turn on." When a user sends such a question to the chatbot, the server can quickly provide the appropriate solution from the knowledge base, assisting the user. This system makes it possible to simplify user operation while achieving efficient support.

[0439] The following describes the processing flow.

[0440] Step 1:

[0441] Users select manuals and document files via their terminal and upload them to the server using the system interface.

[0442] Step 2:

[0443] The terminal sends the selected document file to the server and notifies the user when the upload is complete.

[0444] Step 3:

[0445] The server receives the document file sent from the terminal and saves the file to temporary storage for analysis.

[0446] Step 4:

[0447] The server checks the file format, and if it's an image file, it uses optical character recognition technology to extract the text information within the image as digital text.

[0448] Step 5:

[0449] The server analyzes the extracted text using a generative AI model and performs natural language processing to identify important information.

[0450] Step 6:

[0451] The server predicts questions that the user is likely to ask frequently based on the text and generates appropriate answers to those questions.

[0452] Step 7:

[0453] The server adds the generated question-and-answer pairs to the knowledge base and keeps it up-to-date to prepare for real-time queries.

[0454] Step 8:

[0455] When a user requests information, the device sends the question to the server.

[0456] Step 9:

[0457] The server searches its knowledge base based on the question it receives and identifies the best answer.

[0458] Step 10:

[0459] The server sends the identified answer to the terminal, and the terminal displays it to the user, thereby providing the necessary information.

[0460] (Example 1)

[0461] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0462] It is essential to quickly and effectively utilize the information users possess and efficiently provide them with the necessary information. Traditional methods have made it difficult to quickly extract information from documents and provide appropriate answers to user inquiries in real time. It is necessary to solve these problems and improve user convenience and speed up information access.

[0463] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0464] In this invention, the server includes means for transferring a document to an information processing device via a communication device, means for determining whether the document is in image format and extracting character information from the image using optical character recognition technology, and means for analyzing the character information based on a generative AI model and identifying important information. This makes it possible to quickly provide the information that the user needs.

[0465] A "communication device" is a general term for devices used to transmit documents from a user's terminal to an information processing device.

[0466] An "information processing device" is a general term for a device that analyzes received documents and performs necessary data extraction and processing.

[0467] "Optical character recognition technology" is a general term for technologies that extract character information as digital text from image-based documents.

[0468] A "generative AI model" is a general term for artificial intelligence technologies that analyze given text information and extract specific patterns or information.

[0469] A "collection" is a general term for a database that stores generated question-and-answer pairs and uses them for subsequent searches and queries.

[0470] A "prompt message" is a general term for text containing instructions or questions that a user enters into a system.

[0471] This system primarily utilizes user terminals, servers, and communication devices. Users first upload document files to the server via the communication device using their terminal. The system supports document file formats such as PDF, JPEG, and PNG.

[0472] If the received document is in image format, the server extracts the text information using optical character recognition (OCR) technology. This process utilizes Tesseract OCR or other similar optical character recognition software. The extracted text information is then input into a generative AI model. Typical generative AI models include those incorporating natural language processing technology.
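The format check and OCR routing described above might look like the following sketch. It assumes the format is decided from the file extension and that the OCR engine and PDF text reader are passed in as callables; with the pytesseract and Pillow packages, for example, an image-only `ocr` could be `lambda p: pytesseract.image_to_string(Image.open(p))`. The function name `extract_text` and the PDF fallback to OCR are assumptions, not details from the text, and the injected `ocr` is assumed to handle rasterizing PDF pages itself.

```python
from pathlib import Path
from typing import Callable

# Image formats named in the text; deciding by file suffix is an assumption.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def extract_text(
    path: str,
    ocr: Callable[[str], str],
    read_pdf_text: Callable[[str], str],
) -> str:
    """Return the digital text of an uploaded document.

    Image files go straight to OCR. PDFs are first read for an embedded
    text layer and fall back to OCR when that layer is empty, as with a
    scanned PDF. Both callables are injected so any OCR engine can be used.
    """
    suffix = Path(path).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        return ocr(path)
    if suffix == ".pdf":
        text = read_pdf_text(path)
        return text if text.strip() else ocr(path)
    raise ValueError(f"unsupported document format: {suffix!r}")
```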

[0473] The generative AI model analyzes the input text information and extracts the key information in the document. Based on this analysis, it predicts the questions that users are likely to ask and generates appropriate answers to them. These question-and-answer pairs are stored in a knowledge base on the server and kept continuously accessible.

[0474] As a concrete example, consider a user who uploads the manual for a new home appliance. In this case, the server can analyze the operating instructions and troubleshooting guide for the product described in the manual and automatically generate answers to frequently asked questions, such as "What to do if the power won't turn on."
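One way to obtain FAQ pairs like these from a generative AI model is to request a fixed line format and parse the reply. The sketch below assumes a "Q:" / "A:" output convention; the instruction wording and the names `build_qa_prompt` and `parse_qa_pairs` are hypothetical, since the text does not specify the prompt or output format.

```python
from __future__ import annotations

# Hypothetical instruction; the embodiment does not give the exact prompt.
QA_INSTRUCTION = (
    "Read the product manual below. List the questions users are most likely "
    "to ask, each on a line starting with 'Q:' and followed by a line starting "
    "with 'A:' that answers it.\n\n"
)


def build_qa_prompt(manual_text: str) -> str:
    """Combine the instruction with the text extracted from the manual."""
    return QA_INSTRUCTION + manual_text


def parse_qa_pairs(model_output: str) -> list[tuple[str, str]]:
    """Turn 'Q: ...' / 'A: ...' lines from the model into (question, answer)
    pairs, ignoring any answer line that has no preceding question."""
    pairs: list[tuple[str, str]] = []
    question = None
    for raw in model_output.splitlines():
        line = raw.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question is not None:
            pairs.append((question, line[2:].strip()))
            question = None
    return pairs
```

Asking for a rigid format keeps parsing trivial; a structured-output (JSON) mode, where the model offers one, would be a sturdier alternative.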

[0475] Furthermore, when a user enters a prompt message from their terminal such as "Tell me how to set up a new home appliance," the server searches its knowledge base. As a result, relevant information is quickly identified and provided to the user through the terminal. This process allows users to efficiently obtain information and use it to solve problems. This system is particularly characterized by its intuitive operation and high level of convenience, making it beneficial to many users.

[0476] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0477] Step 1:

[0478] The user uses a terminal to select document files such as manuals and instruction sheets and uploads them to the server via a communication device. The input is the document file itself, in formats such as PDF or image files. The output is the document file stored on the server.

[0479] Step 2:

[0480] The server determines whether the received document file is in image format. If it is, it uses optical character recognition (OCR) technology to extract text information from the image. The input to this process is an image document file, and the output is text data. Specifically, software such as Tesseract OCR is used to convert the characters in the image into digital text.

[0481] Step 3:

[0482] The server inputs the text data into a generative AI model, which analyzes it to identify important information. The input is the extracted text, and the output is the analysis result containing the important information. This process uses natural language processing techniques to extract the key points in the document and the questions that users are likely to ask about them.

[0483] Step 4:

[0484] The server predicts questions that users are likely to frequently ask based on the analysis results and generates answers to them. At this stage, the input is the analysis results containing important information, and the output is question-and-answer pairs. The generated content is added to the knowledge base to prepare for future inquiries.

[0485] Step 5:

[0486] The user enters a prompt message from their terminal. A specific example might be a question like, "Please tell me how to set up a new home appliance." Based on this input, the server searches its knowledge base and identifies the relevant answer.

[0487] Step 6:

[0488] The server sends the search results to the terminal. The input consists of the user's prompt and the search results based on it, while the output is the answer displayed on the terminal. This allows the user to quickly and accurately obtain the information they are looking for.
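Steps 2 to 4 of this flow can be sketched as one loop that wires the stages together. Each stage is injected as a callable, so the OCR engine and generative AI model are placeholders here; the function name `build_knowledge_base` and the dict-based knowledge base are assumptions for illustration.

```python
from __future__ import annotations

from typing import Callable


def build_knowledge_base(
    documents: list[str],
    extract_text: Callable[[str], str],
    identify_key_info: Callable[[str], list[str]],
    generate_qa: Callable[[list[str]], list[tuple[str, str]]],
) -> dict[str, str]:
    """Run Steps 2 to 4 for every uploaded document and collect the
    question-and-answer pairs in a dict keyed by question. A later pair
    with the same question overwrites an earlier one."""
    knowledge_base: dict[str, str] = {}
    for document in documents:
        text = extract_text(document)                   # Step 2: OCR / text extraction
        key_info = identify_key_info(text)              # Step 3: important information
        for question, answer in generate_qa(key_info):  # Step 4: Q&A generation
            knowledge_base[question] = answer
    return knowledge_base
```

Injecting the stages keeps the orchestration testable without a real OCR engine or model, which is also how the usage below exercises it.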

[0489] (Application Example 1)

[0490] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0491] Consumer automated machines and devices used in the home are required to provide users, efficiently and in real time, with accurate information on how to use products and how to solve problems that may arise. However, existing systems struggle to quickly extract the necessary information from documents and to respond to user inquiries immediately. As a result, users often receive inefficient support when using products. A solution to this problem is needed.

[0492] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0493] In this invention, the server includes means for acquiring documents, means for extracting textual information from the documents, and means for analyzing the textual information to identify important information. This enables a home automated machine to provide users with useful and accurate product support information in real time.

[0494] "Means of obtaining documents" refers to the function of sending or uploading documents to a server from a user's device.

[0495] "Means for extracting textual information" refers to the process of extracting textual data from acquired documents.

[0496] "Means of analyzing textual information to identify important information" refers to the ability to analyze extracted textual data and select information that is useful to the user from it.

[0497] "Means for generating questions and answers" refers to a function that automatically generates appropriate answers to potential user questions based on identified key information.

[0498] "Means of adding to the knowledge base" refers to the process of registering the generated question-and-answer pairs in a database for storage.

[0499] "Means of converting into text information using optical technology" refers to technology that optically analyzes characters within an image and converts them into digital text.

[0500] "Means by which consumer automated machinery transmits information to users via voice or screen display" refers to functions such as those used by household robots to provide information to users using voice synthesis or display.

[0501] The system for carrying out this invention involves the collaborative operation of a home-use automated device and a server. The server retrieves documents sent by the user through a terminal and, if a document is in image format, extracts text data using optical character recognition (OCR) technology. Widely known OCR software, such as Tesseract, can be used in this process.

[0502] The server then inputs the extracted text data into a generative AI model (e.g., the GPT series) to identify important information. This analysis automatically generates questions that users are likely to ask and their corresponding answers. The generated question-and-answer pairs are added to the knowledge base on the server, enabling real-time inquiry handling.

[0503] A consumer-grade automated machine (e.g., a typical home robot) installed in a user's home receives information from a server and provides it to the user using speech synthesis and display functions. This information provision begins when the user asks the robot a question in natural language. The server analyzes the question, searches its knowledge base for the corresponding answer, and immediately responds to the user.

[0504] As a concrete example, consider a scenario where a user purchases a new home appliance and shows its instruction manual to a robot. When the user asks the robot, "Tell me how to use this product," the robot will provide specific instructions such as, "First, turn on the power. Next, press the settings button."
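Before speech synthesis, the robot has to turn a list of manual steps into a natural utterance like the one quoted above. The sketch below shows one simple way to do that; the function name and the "First / Next / Finally" wording rule are assumptions, and it lower-cases the first letter of each step so it reads mid-sentence, which would also lower-case a leading proper noun.

```python
from __future__ import annotations


def to_spoken_instructions(steps: list[str]) -> str:
    """Join manual steps into one utterance for speech synthesis, using
    'First', 'Next' and, when there are three or more steps, 'Finally'."""
    # Drop surrounding whitespace, trailing periods, and empty steps.
    cleaned = [t for t in (s.strip().rstrip(".").strip() for s in steps) if t]
    sentences = []
    for i, step in enumerate(cleaned):
        if i == 0:
            lead = "First"
        elif i == len(cleaned) - 1 and len(cleaned) > 2:
            lead = "Finally"
        else:
            lead = "Next"
        sentences.append(f"{lead}, {step[0].lower()}{step[1:]}.")
    return " ".join(sentences)


print(to_spoken_instructions(["Turn on the power.", "Press the settings button."]))
# prints: First, turn on the power. Next, press the settings button.
```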

[0505] An example of a prompt message could be natural language input such as, "Please analyze the instruction manual for this new home appliance and provide general troubleshooting steps."

[0506] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0507] Step 1:

[0508] The user uploads a document from their device to the server. The input is the document file provided by the user, which the server receives. The output is the document file stored on the server.

[0509] Step 2:

[0510] The server analyzes uploaded documents and uses optical character recognition (OCR) technology if the document is in image format. The input is an image file, and the server extracts character information from the image data using OCR software such as Tesseract. The output is digital text data.

[0511] Step 3:

[0512] The server inputs the extracted text information into a generative AI model, which then analyzes the information. The input is digital text data, and language analysis is performed by a generative AI model (e.g., GPT series). This identifies important information within the document. The output is the identified important information.

[0513] Step 4:

[0514] The server automatically generates questions that users are likely to ask, together with corresponding answers, based on the identified key information. The input is the key information obtained in step 3, and the generative AI model is used to create question-and-answer pairs. The output is these pairs.

[0515] Step 5:

[0516] The server adds the generated question-and-answer pairs to the knowledge base. The input is the generated question-and-answer pairs, which the server registers in the database. The output is the updated knowledge base.

[0517] Step 6:

[0518] A user asks a home robot questions in natural language. The input is the user's voice or text questions, which the robot receives. The output is the received question data.

[0519] Step 7:

[0520] The server analyzes user question data and searches its knowledge base to identify answers. The input is the received question data, and the server uses a generative AI model for search and matching. The output is the identified answer.

[0521] Step 8:

[0522] The server sends the identified response to a home robot, which then provides the information to the user via speech synthesis or a display. The input is the response data provided by the server, and the output is the provision of information to the user in either audio or visual form.
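Step 5 ("registers in the database") and the lookup in Step 7 could be backed by a small relational table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table layout, the replace-on-repeat policy, and the exact-match lookup are assumptions for illustration, and Step 7 as described would use the generative AI model for fuzzier matching.

```python
from __future__ import annotations

import sqlite3
from typing import Iterable, Optional


def open_knowledge_base(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a table holding one answer per question."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS qa ("
        "question TEXT PRIMARY KEY, answer TEXT NOT NULL)"
    )
    return conn


def add_pairs(conn: sqlite3.Connection, pairs: Iterable[tuple[str, str]]) -> None:
    """Step 5: register generated pairs. A repeated question replaces the
    stored answer, so re-analyzing a revised manual updates the base."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO qa (question, answer) VALUES (?, ?)", pairs
        )


def lookup(conn: sqlite3.Connection, question: str) -> Optional[str]:
    """Exact-match lookup of a stored answer."""
    row = conn.execute(
        "SELECT answer FROM qa WHERE question = ?", (question,)
    ).fetchone()
    return row[0] if row else None
```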

[0523] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using those emotions.

[0524] This invention combines a system that uploads documents, extracts important information from those documents, and generates corresponding Q&A with an emotion engine that recognizes user emotions. Users upload document files to a server via their terminal. The server extracts textual information from the received document files and analyzes it using an AI model. The analysis identifies important information, and questions and answers are generated based on this information. These question-and-answer pairs are added to a knowledge base for real-time inquiry handling.

[0525] Furthermore, the present invention incorporates an emotion engine to recognize the emotional state of the user in response to their inquiries. The emotion engine analyzes the user's emotions from their voice or text input and takes this emotional information into account when matching it with the knowledge base on the server. As a result, the most appropriate response is selected according to the user's emotions and delivered to the user in an appropriate tone.
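The idea of delivering an answer "in an appropriate tone" can be illustrated with the deliberately crude sketch below. A fixed list of cue words stands in for the emotion identification model 59, which in practice would be a trained model; the cue list, the two labels, and the function names are all hypothetical.

```python
import re

# Hypothetical cue words; a real emotion engine would use a trained model,
# not a word list.
NEGATIVE_CUES = {"worried", "frustrated", "annoyed", "angry", "stressed", "confused"}


def estimate_emotion(user_input: str) -> str:
    """Label the input 'negative' if it contains any cue word, else 'neutral'."""
    words = set(re.findall(r"[a-z]+", user_input.lower()))
    return "negative" if words & NEGATIVE_CUES else "neutral"


def adjust_tone(answer: str, emotion: str) -> str:
    """Wrap the answer in more considerate phrasing when the emotion is negative."""
    if emotion == "negative":
        return (
            "I'm sorry for the trouble. " + answer
            + " If that does not help, I can go through it with you step by step."
        )
    return answer
```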

[0526] As a concrete example, consider a scenario where a user uploads a manual for setting up a complex product. The server automatically extracts key information, including the product setup procedure, and generates questions such as "What are the main causes when the product doesn't work correctly?" Furthermore, if the emotion engine detects that the user is expressing frustration or stress, the server provides answers with more polite language and additional explanations. This allows users to receive more satisfying support, improving the accuracy of problem solving and overall satisfaction.

[0527] The following describes the processing flow.

[0528] Step 1:

[0529] Users select manuals and document files using a terminal and upload them to the server via the system interface.

[0530] Step 2:

[0531] The terminal sends the selected document file to the server and displays a notification to the user confirming that the upload is complete.

[0532] Step 3:

[0533] The server receives documents sent from the terminal and temporarily stores them for analysis of their contents.

[0534] Step 4:

[0535] The server checks the format of the uploaded file, and if it is an image file, it uses optical character recognition technology to extract the character information and convert it into a text file.

[0536] Step 5:

[0537] The server activates a generative AI model and identifies important information by analyzing the extracted textual data.

[0538] Step 6:

[0539] Based on identified key information, the server predicts questions that users are likely to ask and generates answers to them.

[0540] Step 7:

[0541] The generated questions and answers are structured and added to the knowledge base, preparing it for real-time user inquiries.

[0542] Step 8:

[0543] When a user makes an inquiry, the device sends the question to the server as text or voice.

[0544] Step 9:

[0545] The emotion engine on the server analyzes the user's emotions from their text or voice and extracts emotional information.

[0546] Step 10:

[0547] The server searches the knowledge base, taking the emotional information into account, and selects the most appropriate answer in a suitable tone.

[0548] Step 11:

[0549] The server sends the selected answer to the terminal, which then displays it to the user and provides appropriate support.
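Step 10 can also be read as choosing between pre-written answer variants rather than rewriting one answer. The sketch below assumes each knowledge-base entry stores variants keyed by tone and maps the estimated emotion to a tone; the data layout, emotion labels, and the function `select_answer` are hypothetical.

```python
from __future__ import annotations

# Hypothetical layout: each question key stores answer variants keyed by tone.
KNOWLEDGE_BASE: dict[str, dict[str, str]] = {
    "why does the product not work": {
        "neutral": "Check the power cable and restart the product.",
        "polite": (
            "We apologize for the inconvenience. Please first check that the "
            "power cable is firmly connected, then restart the product."
        ),
    },
}

# Which tone to use for each estimated emotion (an assumption).
TONE_FOR_EMOTION = {"frustrated": "polite", "stressed": "polite", "neutral": "neutral"}


def select_answer(question_key: str, emotion: str) -> str | None:
    """Step 10: pick the answer variant whose tone suits the user's emotion,
    falling back to the neutral variant when no tailored one exists."""
    variants = KNOWLEDGE_BASE.get(question_key)
    if variants is None:
        return None
    tone = TONE_FOR_EMOTION.get(emotion, "neutral")
    return variants.get(tone, variants.get("neutral"))
```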

[0550] (Example 2)

[0551] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0552] Currently, a vast amount of information is stored in large electronic documents, which makes it difficult to instantly extract the necessary information and provide it to users in an appropriate format. Furthermore, understanding and responding appropriately to user emotions during inquiries is required, but this is also difficult to achieve. These problems need to be solved so that information can be provided along with emotionally sensitive responses to users.

[0553] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0554] In this invention, the server includes means for uploading a document, means for extracting textual information from the document, means for analyzing the textual information to identify important information, means for adding generated questions and answers to an information resource, means for recognizing the emotional state of the user input, and means for providing the optimal response according to the emotional state. This makes it possible to immediately extract and provide necessary information from a document and to respond in a way that takes the user's emotions into consideration.

[0555] A "document" refers to a medium on which information is recorded using characters or symbols, and includes both electronic and physical forms.

[0556] "Means for uploading" refers to the process by which a user transfers a document to an electronic system and saves it to a server.

[0557] "Textual information" refers to data represented by characters, numbers, and symbols contained in a document.

[0558] "Important information" refers to key data and knowledge extracted through document analysis that are likely to be of interest to users.

[0559] "Questions and Answers" refers to a question format and its response that is generated based on important information and structured to help the user understand.

[0560] "Information resources" refer to databases where generated question-and-answer pairs are stored and managed.

[0561] "Means of providing answers to inquiries" refers to the process of using information resources to return appropriate information in response to questions from users.

[0562] "Means of recognizing emotional states" refers to technologies that analyze user input, such as voice or text, to identify emotions.

[0563] "Means of providing the optimal answer" refers to the process of selecting and presenting the most appropriate information from information resources in response to the recognized emotional state of the user.

[0564] This invention begins with a user uploading a document to a server using a terminal. The terminal can access a dedicated application via a web browser and send a document file to the server using the "Select File" button. The server receives the document file and extracts the character information using optical character recognition technology (for example, the OCR software Tesseract).

[0565] The extracted textual information is analyzed by a generative AI model within the server, specifically a natural language processing (NLP) model (e.g., BERT, GPT-3). This analysis identifies key information from the document. Based on this key information, the server generates questions and answers and adds these pairs to the information resource in real time. This process strengthens the question-and-answer knowledge base.
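As a much simpler stand-in for the NLP models named above (BERT, GPT-3), the sketch below ranks words by frequency after removing common stop words. It only illustrates the input and output shape of "identify key information"; the stop-word list and the function `top_keywords` are assumptions.

```python
from __future__ import annotations

import re
from collections import Counter

# Small illustrative stop-word list; an assumption, not exhaustive.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "to", "of", "in", "on", "is", "it",
    "for", "if", "then", "your", "with", "this", "be",
}


def top_keywords(text: str, k: int = 3) -> list[str]:
    """Return the k most frequent non-stop-words. Counter.most_common keeps
    words with equal counts in the order they were first seen."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    return [word for word, _ in Counter(words).most_common(k)]
```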

[0566] When a user makes an inquiry, the server uses an emotion engine to recognize the user's emotions from their input (voice or text). This emotion analysis uses natural language processing tools (e.g., IBM Watson NLU). Taking the emotion information into account, the server selects the most appropriate answer from its information resources and responds in a tone that matches the user's emotional state.

[0567] As a concrete example, consider a scenario where a user uploads an instruction manual for a complex piece of equipment. The server extracts key procedures and troubleshooting information from the manual and generates questions such as, "What causes the equipment to not work?" Furthermore, if the emotion engine detects that the user is dissatisfied, the server provides a more detailed and considerate response.

[0568] Examples of prompts include, "What important information is included in this manual? Generate a Q&A based on it," and "How should you respond if a user expresses dissatisfaction? Use the sentiment engine to provide an appropriate response."

[0569] In this way, users can receive timely and accurate information from the server, along with emotionally sensitive responses, thereby improving their content usage experience.

[0570] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0571] Step 1:

[0572] The user uploads a document file to the server using their device. The input is the document file selected by the user, and the output is the document file saved on the server. The user opens a web browser, uses the upload form, clicks the "Select File" button, selects the desired file, and submits it.

[0573] Step 2:

[0574] The server extracts text information from the uploaded document file. The input is the document file received in step 1, and the output is the extracted text information. The server uses OCR software to recognize text information from image and PDF files and obtains text data.

[0575] Step 3:

[0576] The server uses a generative AI model to analyze textual information and identify important information. The input is the textual information extracted in step 2, and the output is the identified important information. The server uses an NLP model (e.g., BERT, GPT-3) to analyze the document content and extract keywords and key data points.

[0577] Step 4:

[0578] The server generates question-and-answer pairs based on the identified key information. The input is the key information identified in step 3, and the output is the generated question-and-answer pairs. The server uses the generative AI model to create question-and-answer pairs that include answers to potential user inquiries.

[0579] Step 5:

[0580] The server adds the generated question-and-answer pairs to the information resource. The input is the question-and-answer pairs generated in step 4, and the output is the updated information resource. This enhances the knowledge base in real time.

[0581] Step 6:

[0582] The user makes a query, and the server uses an emotion engine to recognize the user's emotional state. The input is the user's voice or text input, and the output is data about the emotional state. The server uses analysis tools to evaluate the emotion and extract emotional information.

[0583] Step 7:

[0584] The server provides the optimal response based on the emotional state. The input consists of the emotional state recognized in step 6 and candidate responses from the information resources, while the output is the adjusted response presented to the user. The tone of the response is adjusted based on the emotional state to improve user satisfaction.

[0585] (Application Example 2)

[0586] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0587] In addition to extracting important information from documents and generating effective inquiry responses, there is a need to improve the user experience by providing appropriate responses that respond to the user's emotions. However, conventional systems often fail to adequately consider user emotions and can only provide fixed responses. Therefore, a system capable of flexible and personalized responses based on user emotions is necessary.

[0588] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0589] In this invention, the server includes means for uploading documents, means for extracting information, means for analyzing information to identify important data, means for generating queries and responses, means for adding the generated queries and responses to a knowledge base, means for providing the optimal response using the knowledge base, means for analyzing emotions, and means for selecting and providing the optimal response based on emotions. This makes it possible to provide the optimal answer according to the user's emotions.

[0590] "Means for uploading documents" refers to functions that allow users to transfer text and image files to the system.

[0591] "Means for extracting information" refers to functions for extracting important text data from uploaded documents and images.

[0592] "Means of analyzing information to identify important data" refers to functions that process extracted information and identify highly relevant data.

[0593] "Means for generating queries and responses" refers to a function that creates questions and answers to user inquiries based on identified key data.

[0594] "Means of adding generated queries and responses to a knowledge base" refers to a function that saves the created question and answer pairs to a database for later use.

[0595] "A means of providing the optimal response using a knowledge base" refers to a function that refers to stored data and presents the most appropriate answer to a user's question.

[0596] "Means of analyzing emotions" refers to functions that evaluate emotions from the user's text or voice input.

[0597] "A means of selecting and providing the optimal response based on emotions" refers to a function that takes into account the user's emotional state and determines and provides a response with an appropriate tone and content.

[0598] This invention provides a system in which a user uploads a document via a terminal, and the system extracts important information from that document and generates queries and responses. The server identifies important data through document upload, information extraction, and information analysis, and adds the generated queries and responses to a knowledge base. When responding to user queries, the system analyzes the user's sentiment and selects and provides the most appropriate response based on that sentiment.

[0599] The hardware includes smartphones and personal computers, which users employ to upload documents. The software uses Tesseract OCR for character recognition and Transformer-based models (such as BERT) for sentiment analysis. In data processing, the software extracts text information from image data, analyzes the extracted text to identify important information, and then uses generative AI models to generate appropriate questions and answers.

[0600] As a concrete example, suppose a user photographs an invoice with a smartphone and uploads the image to the server. The server uses character recognition technology to extract the invoice amount and payment deadline from the image, and analyzes this information to generate questions the user might ask along with their answers. At the same time, sentiment analysis is performed on the user's text or voice input, and the tone of the response is adjusted to the user's emotions. An example of a prompt input to the generative AI model is, "Please give me advice about this expense. I am worried."
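Extracting the invoice amount and payment deadline from OCR output could start with pattern matching, as in the sketch below. The label wording ("Amount due", "Total", "Payment deadline", "Due date"), the currency markers, and the ISO date format are assumptions about a hypothetical invoice layout; real invoices vary widely, which is one reason the text hands analysis to an AI model.

```python
import re
from datetime import date

# Patterns for a hypothetical invoice layout; these regular expressions are
# illustrative assumptions only. The leading \b keeps "Subtotal" from
# matching "total".
AMOUNT_RE = re.compile(
    r"\b(?:total|amount due)\s*[:\-]?\s*(?:JPY|¥|\$)?\s*([\d,]+(?:\.\d{1,2})?)",
    re.IGNORECASE,
)
DUE_RE = re.compile(
    r"\b(?:payment deadline|due date)\s*[:\-]?\s*(\d{4})-(\d{2})-(\d{2})",
    re.IGNORECASE,
)


def extract_invoice_fields(ocr_text: str) -> dict:
    """Pull the billing amount and payment deadline out of OCR text.
    Fields that are not found are omitted from the result."""
    fields = {}
    amount = AMOUNT_RE.search(ocr_text)
    if amount:
        fields["amount"] = float(amount.group(1).replace(",", ""))
    due = DUE_RE.search(ocr_text)
    if due:
        year, month, day = (int(g) for g in due.groups())
        fields["due_date"] = date(year, month, day)
    return fields
```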

[0601] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0602] Step 1:

[0603] The user uploads a document or image to the server using their device. This input is an image of an invoice or receipt. The server receives this input and prepares for the next processing step.

[0604] Step 2:

[0605] The server extracts text information from the received image data. This process uses Tesseract OCR, a character recognition technology. It receives image data as input and generates text data as output.

[0606] Step 3:

[0607] The server analyzes the extracted text data to identify important information. This step uses an analysis algorithm to extract information important to the user (e.g., billing amount, payment deadline). The input is text data, and the output is a dataset containing the important information.

[0608] Step 4:

[0609] Based on identified key information, the server automatically generates queries and responses using a generative AI model. This process takes key information as input and uses the generative AI model to output appropriate question-and-answer pairs.

[0610] Step 5:

[0611] The user enters a prompt message into the server via text or voice through their terminal. This prompt message may include the user's emotions, for example, "Please give me advice about this expense. I am worried." The server receives this prompt message.

[0612] Step 6:

[0613] The server analyzes the user's emotions from the received prompt text. This process uses a Transformer-based emotion analysis model. The input is the prompt text, and the output is data indicating the user's emotional state.

[0614] Step 7:

[0615] The server selects and provides the optimal response to the user based on their emotions, referencing a knowledge base. This step uses emotion data and data from the knowledge base as input and outputs a response with an adjusted tone.

[0616] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0617] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions in the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
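The inputs described for data generation model 58 (an instruction plus one or more kinds of inference data, with a requested output format) can be captured in a small request type. The sketch below is hypothetical: the class name, field names, and validation rules are assumptions that merely mirror the paragraph, not an interface of ChatGPT, Gemini, or any other product.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# The kinds of inference and output formats listed in the text.
INFERENCE_TASKS = {"analysis", "classification", "prediction", "summarization"}
OUTPUT_FORMATS = {"text", "audio"}


@dataclass
class InferenceRequest:
    """Hypothetical request to data generation model 58: a prompt carrying
    the instruction plus one or more pieces of inference data."""

    instruction: str
    task: str
    text: Optional[str] = None
    audio: Optional[bytes] = None
    image: Optional[bytes] = None
    output_format: str = "text"

    def __post_init__(self) -> None:
        if self.task not in INFERENCE_TASKS:
            raise ValueError(f"unknown task: {self.task!r}")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unsupported output format: {self.output_format!r}")
        if self.text is None and self.audio is None and self.image is None:
            raise ValueError("at least one kind of inference data is required")
```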

[0618] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0619] [Fourth Embodiment]

[0620] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0621] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0622] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0623] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0624] The microphone 238 picks up the voice of the user 20, including instructions spoken by the user 20. The microphone 238 converts the captured voice into audio data and outputs it to the processor 46. The speaker 240 outputs audio in accordance with instructions from the processor 46.

[0625] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0626] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0627] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0628] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0629] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0630] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0631] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0632] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0633] In this invention, processing begins with a user uploading a document file, such as a manual or instruction sheet, from their device to a server. The server then analyzes the received document file. If the uploaded file is in image format, the server uses optical character recognition technology to extract the text information from the image and convert it into digital text.

[0634] Next, the server analyzes the extracted text information using a generative AI model to identify important information within the document. Based on this analysis, the server predicts questions that users are likely to ask and automatically generates corresponding answers. The generated question-and-answer pairs are added to the knowledge base, making the newly added information available in real time.

[0635] When a user submits an inquiry, the question entered on the device is sent to the server. The server searches the knowledge base and identifies the relevant answer. The identified answer is then provided to the user through the device. This allows the user to receive quick and accurate support.

[0636] As a concrete example, consider a scenario where a user uploads a manual for a new home appliance. The server automatically extracts important information, including product setup instructions and troubleshooting guides, and generates answers to frequently asked questions, such as "What to do if the power won't turn on." When a user sends such a question to the chatbot, the server can quickly provide the appropriate solution from the knowledge base, assisting the user. This system makes it possible to simplify user operation while achieving efficient support.

[0637] The following describes the processing flow.

[0638] Step 1:

[0639] Users select manuals and document files via their terminal and upload them to the server using the system interface.

[0640] Step 2:

[0641] The terminal sends the selected document file to the server and notifies the user when the upload is complete.

[0642] Step 3:

[0643] The server receives the document file sent from the terminal and saves the file to temporary storage for analysis.

[0644] Step 4:

[0645] The server checks the file format, and if it's an image file, it uses optical character recognition technology to extract the text information within the image as digital text.

[0646] Step 5:

[0647] The server analyzes the extracted text using a generative AI model and performs natural language processing to identify important information.

[0648] Step 6:

[0649] The server predicts questions that the user is likely to ask frequently based on the text and generates appropriate answers to those questions.

[0650] Step 7:

[0651] The server adds the generated question-and-answer pairs to the knowledge base and keeps it up-to-date to prepare for real-time queries.

[0652] Step 8:

[0653] When a user requests information, the device sends the question to the server.

[0654] Step 9:

[0655] The server searches its knowledge base based on the question it receives and identifies the best answer.

[0656] Step 10:

[0657] The server sends the identified answer to the terminal, and the terminal displays it to the user, thereby providing the necessary information.
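The ten steps above can be condensed into a small orchestration sketch. It assumes a great deal that the text leaves open: `DocumentPipeline`, `QAPair`, and the `ocr` and `generate_qa` hooks are hypothetical names introduced for illustration, image files are recognized by extension, non-image uploads are treated as plain UTF-8 text, and retrieval is a simple word-overlap match rather than whatever search the server actually performs.

```python
import os
import re
from dataclasses import dataclass, field
from typing import Callable, Optional

# Assumption: image uploads are recognized by file extension.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


@dataclass
class QAPair:
    question: str
    answer: str


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9']+", text.lower()))


@dataclass
class DocumentPipeline:
    ocr: Callable[[bytes], str]                 # hypothetical hook for Step 4 (OCR)
    generate_qa: Callable[[str], list[QAPair]]  # hypothetical hook for Steps 5-6 (generative AI)
    knowledge_base: list[QAPair] = field(default_factory=list)

    def ingest(self, filename: str, content: bytes) -> int:
        """Steps 3-7: extract text (OCR for images), generate Q&A pairs, store them."""
        if os.path.splitext(filename)[1].lower() in IMAGE_EXTENSIONS:
            text = self.ocr(content)
        else:
            # Assumption: non-image uploads are plain UTF-8 text; PDFs would need a parser.
            text = content.decode("utf-8", errors="replace")
        pairs = self.generate_qa(text)
        self.knowledge_base.extend(pairs)  # visible to answer() immediately
        return len(pairs)

    def answer(self, question: str) -> Optional[str]:
        """Steps 9-10: return the answer whose stored question shares the most words."""
        query = _words(question)
        best, best_overlap = None, 0
        for pair in self.knowledge_base:
            overlap = len(query & _words(pair.question))
            if overlap > best_overlap:
                best, best_overlap = pair, overlap
        return best.answer if best is not None else None
```

Passing `ocr` and `generate_qa` in as callables keeps the orchestration testable without an OCR engine or a model endpoint; real implementations can be swapped in later.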

[0658] (Example 1)

[0659] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0660] It is essential to quickly and effectively utilize the information users possess and to efficiently provide them with the information they need. With traditional methods, it has been difficult to quickly extract information from documents and provide appropriate answers to user inquiries in real time. These problems must be solved to improve user convenience and speed up access to information.

[0661] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0662] In this invention, the server includes means for transferring a document to an information processing device via a communication device, means for determining whether the document is in image format and, if so, extracting character information from the image using optical character recognition technology, and means for analyzing the character information with a generative AI model to identify important information. This makes it possible to quickly provide the information that the user needs.

[0663] "Communication equipment" is a general term for devices used to transmit documents from a user's terminal to an information processing device.

[0664] An "information processing device" is a general term for a device that analyzes received documents and performs necessary data extraction and processing.

[0665] "Optical character recognition technology" is a general term for technologies that extract character information as digital text from image-based documents.

[0666] A "generative AI model" is a general term for artificial intelligence technologies that analyze given text information and extract specific patterns or information.

[0667] A "collection" is a general term for a database that stores generated question-and-answer pairs and uses them for subsequent searches and queries.

[0668] A "prompt message" is a general term for text containing instructions or questions that a user enters into a system.

[0669] This system primarily utilizes user terminals, servers, and communication devices. Users first upload document files to the server via the communication device using their terminal. The system supports document file formats such as PDF, JPEG, and PNG.

[0670] If the received document is in image format, the server extracts the text information using optical character recognition (OCR) technology. This process utilizes Tesseract OCR or other similar optical character recognition software. The extracted text information is then input into a generative AI model. Typical generative AI models include those incorporating natural language processing technology.
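Paragraph [0670] has the server run OCR only when a document is in image format. One way to make that decision without trusting the file name is to read the leading signature bytes of the upload, sketched below. The function names are hypothetical, and only the three formats named in [0669] (PDF, JPEG, PNG) are covered.

```python
from typing import Optional

# Leading signature ("magic") bytes of the three formats named in the text.
_SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}
IMAGE_FORMATS = {"png", "jpeg"}


def detect_format(data: bytes) -> Optional[str]:
    """Return 'pdf', 'png', 'jpeg', or None if the signature is not recognized."""
    for signature, name in _SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


def needs_ocr(data: bytes) -> bool:
    """True when the upload is an image, i.e. its text must be recovered by OCR."""
    return detect_format(data) in IMAGE_FORMATS
```

When `needs_ocr` returns `True`, the bytes could be handed to Tesseract, for example through the `pytesseract` wrapper as `pytesseract.image_to_string(Image.open(io.BytesIO(data)))` with Pillow installed. Scanned PDFs contain page images rather than text, so in practice they may also need OCR.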

[0671] The generative AI model analyzes the input text information and extracts key information from the document. Based on this analysis, it identifies questions that users are likely to ask and generates appropriate answers to them. These question-and-answer pairs are stored in a knowledge base on the server and kept accessible at all times.
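The knowledge base that stores question-and-answer pairs (the "collection" defined in [0667]) can be sketched with Python's built-in `sqlite3` module. The table layout, the `source` column, and the replace-on-re-analysis policy are assumptions made for illustration; the text does not specify a schema.

```python
import sqlite3


class KnowledgeBase:
    """Hypothetical Q&A store backed by SQLite (in memory by default)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS qa ("
            " id INTEGER PRIMARY KEY,"
            " source TEXT NOT NULL,"  # which uploaded document produced the pair
            " question TEXT NOT NULL,"
            " answer TEXT NOT NULL,"
            " UNIQUE (source, question))"
        )

    def add_pairs(self, source: str, pairs: list[tuple[str, str]]) -> None:
        # INSERT OR REPLACE: re-analyzing the same document overwrites old answers
        # instead of adding duplicates. The `with` block commits the transaction.
        with self.conn:
            self.conn.executemany(
                "INSERT OR REPLACE INTO qa (source, question, answer) VALUES (?, ?, ?)",
                [(source, q, a) for q, a in pairs],
            )

    def all_pairs(self) -> list[tuple[str, str]]:
        return self.conn.execute("SELECT question, answer FROM qa ORDER BY id").fetchall()
```

Because each batch is committed before `add_pairs` returns, newly generated pairs are visible to the next query, which matches the "available in real time" behavior the text describes.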

[0672] A concrete example is when a user uploads a manual for a new home appliance. In this case, the server can analyze the operating instructions and troubleshooting guide for the product described in the manual and automatically generate answers to frequently asked questions, such as "What to do if the power won't turn on."

[0673] Furthermore, when a user enters a prompt message from their terminal such as "Tell me how to set up a new home appliance," the server searches its knowledge base. As a result, relevant information is quickly identified and provided to the user through the terminal. This process allows users to efficiently obtain information and use it to solve problems. This system is particularly characterized by its intuitive operation and high level of convenience, making it beneficial to many users.
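The text leaves the knowledge-base search method open. As one stand-in, a prompt such as "Tell me how to set up a new home appliance" can be matched against stored questions by TF-IDF cosine similarity, sketched below with only the standard library. The function names, the smoothed IDF formula, and the `min_score` threshold are illustrative assumptions, not the method of this disclosure.

```python
import math
import re
from collections import Counter
from typing import Optional


def _tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def _tfidf(docs: list[list[str]]) -> tuple[list[dict[str, float]], dict[str, float]]:
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}  # smoothed IDF
    vecs = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf


def _cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query: str, qa_pairs: list[tuple[str, str]], min_score: float = 0.1) -> Optional[str]:
    """Return the answer whose stored question is most similar to the query."""
    if not qa_pairs:
        return None
    vecs, idf = _tfidf([_tokens(q) for q, _ in qa_pairs])
    # Query terms never seen in the stored questions get weight 0 and cannot match.
    qvec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(_tokens(query)).items()}
    scores = [_cosine(qvec, v) for v in vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return qa_pairs[best][1] if scores[best] >= min_score else None
```

A threshold lets the server decline to answer rather than return an unrelated pair; an embedding model or the generative AI model itself could fill the same role with better handling of paraphrases.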

[0674] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0675] Step 1:

[0676] The user uses a terminal to select document files such as manuals and instruction sheets and uploads them to the server via a communication device. The input is the document file itself, in formats such as PDF or image files. The output is the document file stored on the server.

[0677] Step 2:

[0678] The server determines whether the received document file is in image format. If it is, the server uses optical character recognition (OCR) technology to extract text information from the image. The input to this process is an image document file, and the output is text data. Specifically, software such as Tesseract OCR is used to convert the characters in the image into digital text.

[0679] Step 3:

[0680] The server inputs the text data into a generative AI model, which analyzes it for important information. The input is the extracted text, and the output is an analysis result containing the important information. This process uses natural language processing techniques to extract the key points of the document and the questions that users are likely to ask about it.

[0681] Step 4:

[0682] The server predicts questions that users are likely to frequently ask based on the analysis results and generates answers to them. At this stage, the input is the analysis results containing important information, and the output is question-and-answer pairs. The generated content is added to the knowledge base to prepare for future inquiries.

[0683] Step 5:

[0684] The user enters a prompt message from their terminal. A specific example might be a question like, "Please tell me how to set up a new home appliance." Based on this input, the server searches its knowledge base and identifies the relevant answer.

[0685] Step 6:

[0686] The server sends the search results to the terminal. The input consists of the user's prompt and the search results based on it, while the output is the answer displayed on the terminal. This allows the user to quickly and accurately obtain the information they are looking for.

[0687] (Application Example 1)

[0688] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0689] Home automated machinery and devices are required to provide users with real-time, efficient, and accurate information on how to use products and solutions to problems that may arise. However, existing systems struggle to quickly extract necessary information from documents and respond to user inquiries immediately. As a result, users often receive inefficient support when using products. A solution to this problem is needed.

[0690] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0691] In this invention, the server includes means for acquiring documents, means for extracting textual information from the documents, and means for analyzing the textual information to identify important information. This enables a home automated machine to provide users with useful and accurate product support information in real time.

[0692] "Means of obtaining documents" refers to the function of sending or uploading documents to a server from a user's device.

[0693] "Means for extracting textual information" refers to the process of extracting textual data from acquired documents.

[0694] "Means of analyzing textual information to identify important information" refers to the ability to analyze extracted textual data and select information that is useful to the user from it.

[0695] "Means for generating questions and answers" refers to a function that automatically generates appropriate answers to potential user questions based on identified key information.

[0696] "Means of adding to the knowledge base" refers to the process of registering the generated question-and-answer pairs in a database for storage.

[0697] "Means of converting into text information using optical technology" refers to technology that optically analyzes characters within an image and converts them into digital text.

[0698] "Means by which consumer automated machinery transmits information to users via voice or screen display" refers to functions such as those used by household robots to provide information to users using voice synthesis or display.

[0699] The system for carrying out this invention consists of a home-use automated device and a server operating together. The server retrieves documents sent by the user through a terminal and, if a document is in image format, extracts text data using optical character recognition (OCR) technology. Widely known OCR software, such as Tesseract, can be used for this process.

[0700] The server then inputs the extracted text data into a generative AI model (e.g., the GPT series) to identify important information. Based on this analysis, the server automatically generates questions that users are likely to ask and their corresponding answers. The generated question-and-answer pairs are added to the knowledge base on the server, enabling real-time inquiry handling.

[0701] A consumer-grade automated machine (e.g., a typical home robot) installed in a user's home receives information from a server and provides it to the user using speech synthesis and display functions. This information provision begins when the user asks the robot a question in natural language. The server analyzes the question, searches its knowledge base for the corresponding answer, and immediately responds to the user.

[0702] As a concrete example, consider a scenario where a user purchases a new home appliance and shows its instruction manual to a robot. When the user asks the robot, "Tell me how to use this product," the robot will provide specific instructions such as, "First, turn on the power. Next, press the settings button."
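In [0702] the robot answers with step-by-step instructions, either spoken or shown on a screen. A minimal sketch of preparing one answer for both output channels follows. `render_answer` is a hypothetical helper, and the sentence splitting is a naive regular expression, not a real speech synthesis or display pipeline.

```python
import re


def render_answer(answer: str, channel: str) -> list[str]:
    """Split an answer into sentences: numbered for the screen, plain for speech."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s.strip()]
    if channel == "screen":
        return [f"{i}. {s}" for i, s in enumerate(sentences, start=1)]
    if channel == "voice":
        return sentences  # each item would be passed to a speech synthesizer in turn
    raise ValueError(f"unknown channel: {channel!r}")
```

Speaking one short sentence at a time leaves room for the user to interrupt between steps, while the numbered form suits a display.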

[0703] An example of a prompt message could be natural language input such as, "Please analyze the instruction manual for this new home appliance and provide general troubleshooting steps."
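Turning a manual into question-and-answer pairs involves one round trip to the generative AI model. The sketch below covers the prompt-and-parse half of that exchange. The template wording, the JSON reply format, and both function names are assumptions; the model call itself is omitted because the text names no specific API.

```python
import json

# Hypothetical prompt wording; the text does not specify a prompt format.
QA_PROMPT_TEMPLATE = (
    "Analyze the following instruction manual. List up to {n} questions users are "
    "likely to ask, each with a concise answer grounded in the manual. Reply only "
    'with a JSON array of objects with keys "question" and "answer".\n\n'
    "Manual:\n{text}"
)


def build_qa_prompt(manual_text: str, n: int = 5) -> str:
    return QA_PROMPT_TEMPLATE.format(n=n, text=manual_text)


def parse_qa_reply(reply: str) -> list[tuple[str, str]]:
    """Parse the model's JSON reply, dropping malformed entries instead of failing."""
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        items = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return []
    pairs = []
    for item in items:
        if isinstance(item, dict):
            q, a = item.get("question"), item.get("answer")
            if isinstance(q, str) and isinstance(a, str) and q.strip() and a.strip():
                pairs.append((q.strip(), a.strip()))
    return pairs
```

Models often wrap JSON in extra prose, so the parser slices out the outermost array and skips entries that lack a non-empty question or answer before anything reaches the knowledge base.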

[0704] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0705] Step 1:

[0706] The user uploads a document from their device to the server. The input is the document file provided by the user, which the server receives. The output is the document file stored on the server.

[0707] Step 2:

[0708] The server analyzes uploaded documents and uses optical character recognition (OCR) technology if the document is in image format. The input is an image file, and the server extracts character information from the image data using OCR software such as Tesseract. The output is digital text data.

[0709] Step 3:

[0710] The server inputs the extracted text information into a generative AI model, which then analyzes the information. The input is digital text data, and language analysis is performed by a generative AI model (e.g., GPT series). This identifies important information within the document. The output is the identified important information.

[0711] Step 4:

[0712] Based on the identified key information, the server automatically generates questions that users are likely to ask, together with corresponding answers. The input is the key information obtained in Step 3, and the generative AI model is used to create question-and-answer pairs. The output is these pairs.

[0713] Step 5:

[0714] The server adds the generated question-and-answer pairs to the knowledge base. The input is the generated question-and-answer pairs, which the server registers in the database. The output is the updated knowledge base.

[0715] Step 6:

[0716] A user asks a home robot questions in natural language. The input is the user's voice or text questions, which the robot receives. The output is the received question data.

[0717] Step 7:

[0718] The server analyzes user question data and searches its knowledge base to identify answers. The input is the received question data, and the server uses a generative AI model for search and matching. The output is the identified answer.

[0719] Step 8:

[0720] The server sends the identified response to a home robot, which then provides the information to the user via speech synthesis or a display. The input is the response data provided by the server, and the output is the provision of information to the user in either audio or visual form.

[0721] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing taking those emotions into account.

[0722] This invention combines a system that uploads documents, extracts important information from those documents, and generates corresponding Q&A with an emotion engine that recognizes user emotions. Users upload document files to a server via their terminal. The server extracts textual information from the received document files and analyzes it using an AI model. The analysis identifies important information, and questions and answers are generated based on this information. These question-and-answer pairs are added to a knowledge base for real-time inquiry handling.

[0723] Furthermore, the present invention incorporates an emotion engine to recognize the emotional state of the user in response to their inquiries. The emotion engine analyzes the user's emotions from their voice or text input and takes this emotional information into account when matching it with the knowledge base on the server. As a result, the most appropriate response is selected according to the user's emotions and delivered to the user in an appropriate tone.
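The emotion identification model 59 is not described further. To show where an emotion label enters the flow, the sketch below substitutes a deliberately simple keyword lexicon for text input. The labels, cue words, and `estimate_emotion` function are hypothetical; a trained text or speech classifier would take their place in practice.

```python
import re

# Hypothetical cue words per emotion label; a stand-in for emotion identification
# model 59, which the text does not describe.
EMOTION_CUES = {
    "frustrated": {"again", "still", "useless", "annoying", "ridiculous", "broken"},
    "anxious": {"worried", "afraid", "dangerous", "urgent", "scared"},
    "satisfied": {"thanks", "thank", "great", "perfect"},
}


def estimate_emotion(utterance: str) -> tuple[str, float]:
    """Return (label, share of words that are cues for it); 'neutral' if none match."""
    words = re.findall(r"[a-z']+", utterance.lower())
    counts = {label: sum(w in cues for w in words) for label, cues in EMOTION_CUES.items()}
    label = max(counts, key=counts.get)
    if counts[label] == 0:
        return "neutral", 0.0
    return label, counts[label] / len(words)
```

Returning a score alongside the label lets the downstream answer selection decide how strongly to adjust its tone.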

[0724] As a concrete example, consider a scenario where a user uploads a manual for setting up a complex product. The server automatically extracts key information, including the product setup procedure, and generates questions such as "What are the main causes when the product doesn't work correctly?" Furthermore, if the emotion engine detects that the user is expressing frustration or stress, the server provides answers with more polite language and additional explanations. This allows users to receive more satisfying support, improving the accuracy of problem solving and overall satisfaction.

[0725] The following describes the processing flow.

[0726] Step 1:

[0727] Users select manuals and document files using a terminal and upload them to the server via the system interface.

[0728] Step 2:

[0729] The terminal sends the selected document file to the server and displays a notification to the user confirming that the upload is complete.

[0730] Step 3:

[0731] The server receives documents sent from the terminal and temporarily stores them for analysis of their contents.

[0732] Step 4:

[0733] The server checks the format of the uploaded file, and if it is an image file, it uses optical character recognition technology to extract the character information and converts it into a text file.

[0734] Step 5:

[0735] The server activates a generative AI model and identifies important information by analyzing the extracted textual data.

[0736] Step 6:

[0737] Based on identified key information, the server predicts questions that users are likely to ask and generates answers to them.

[0738] Step 7:

[0739] The generated questions and answers are structured and added to the knowledge base, preparing it for real-time user inquiries.

[0740] Step 8:

[0741] When a user makes an inquiry, the device sends the question to the server as text or voice.

[0742] Step 9:

[0743] The emotion engine on the server analyzes the user's emotions from their text or voice and extracts emotional information.

[0744] Step 10:

[0745] The server searches the knowledge base taking the emotional information into account, and selects the most appropriate answer and tone.

[0746] Step 11:

[0747] The server sends the selected answer to the terminal, which then displays it to the user and provides appropriate support.

[0748] (Example 2)

[0749] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0750] Currently, a vast amount of information is stored electronically as massive documents, making it difficult to instantly extract necessary information from these and provide it to users in an appropriate format. Furthermore, understanding and responding appropriately to user emotions during inquiries is required, but achieving this is also challenging. It is necessary to resolve these problems and realize information provision and emotionally sensitive responses to users.

[0751] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0752] In this invention, the server includes means for uploading a document, means for extracting textual information from the document, means for analyzing the textual information to identify important information, means for adding generated questions and answers to an information resource, means for recognizing the emotional state of the user input, and means for providing the optimal response according to the emotional state. This makes it possible to immediately extract and provide necessary information from a document and to respond in a way that takes the user's emotions into consideration.

[0753] A "document" refers to a medium on which information is recorded using characters or symbols, and includes both electronic and physical forms.

[0754] "The means of uploading" refers to the process by which a user transfers a document to an electronic system and saves it to a server.

[0755] "Textual information" refers to data represented by characters, numbers, and symbols contained in a document.

[0756] "Important information" refers to key data and knowledge extracted through document analysis that are likely to be of interest to users.

[0757] "Questions and Answers" refers to a question format and its response that is generated based on important information and structured to help the user understand.

[0758] "Information resources" refer to databases where generated question-and-answer pairs are stored and managed.

[0759] "Means of providing answers to inquiries" refers to the process of using information resources to return appropriate information in response to questions from users.

[0760] "Means of recognizing emotional states" refers to technologies that analyze user input, such as voice or text, to identify emotions.

[0761] "Means of providing the optimal answer" refers to the process of selecting and presenting the most appropriate information from information resources in response to the recognized emotional state of the user.

[0762] In this invention, processing begins with a user uploading a document to a server using a terminal. The terminal can access a dedicated application via a web browser and send a document file to the server using the "Select File" button. The server receives the document file and extracts the character information using optical character recognition technology (e.g., the OCR software Tesseract).

[0763] The extracted textual information is analyzed by a generative AI model within the server, specifically a natural language processing (NLP) model (e.g., BERT, GPT-3). This analysis identifies key information from the document. Based on this key information, the server generates questions and answers and adds these pairs to the information resource in real time. This process strengthens the question-and-answer knowledge base.

[0764] When a user makes an inquiry, the server uses an emotion engine to recognize the user's emotions from their input (voice or text). This emotion analysis uses natural language processing tools (e.g., IBM Watson NLU). Taking the emotion information into account, the server selects the most appropriate answer from its information resources and responds in a tone that matches the user's emotional state.

[0765] As a concrete example, consider a scenario where a user uploads an instruction manual for a complex piece of equipment. The server extracts key procedures and troubleshooting information from the manual and generates questions such as, "What causes the equipment to not work?" Furthermore, if the emotion engine detects that the user is dissatisfied, the server provides a more detailed and considerate response.

[0766] Examples of prompts include, "What important information is included in this manual? Generate a Q&A based on it," and "How should you respond if a user expresses dissatisfaction? Use the sentiment engine to provide an appropriate response."

[0767] In this way, users can receive timely and accurate information from the server, along with emotionally sensitive responses, thereby improving their content usage experience.

[0768] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0769] Step 1:

[0770] The user uploads a document file to the server using their device. The input is the document file selected by the user, and the output is the document file saved on the server. The user opens a web browser, uses the upload form, clicks the "Select File" button, selects the desired file, and submits it.

[0771] Step 2:

[0772] The server extracts text information from the uploaded document file. The input is the document file received in step 1, and the output is the extracted text information. The server uses OCR software to recognize text information from image and PDF files and obtains text data.

[0773] Step 3:

[0774] The server uses a generative AI model to analyze textual information and identify important information. The input is the textual information extracted in step 2, and the output is the identified important information. The server uses an NLP model (e.g., BERT, GPT-3) to analyze the document content and extract keywords and key data points.

[0775] Step 4:

[0776] The server generates question-and-answer pairs based on the identified key information. The input is the key information identified in step 3, and the output is the generated question-and-answer pairs. The server leverages the generative AI model to create a Q&A that includes answers to potential user inquiries.

[0777] Step 5:

[0778] The server adds the generated question-and-answer pairs to the information resource. The input is the question-and-answer pairs generated in step 4, and the output is the updated information resource. This enhances the knowledge base in real time.

[0779] Step 6:

[0780] The user makes a query, and the server uses an emotion engine to recognize the user's emotional state. The input is the user's voice or text input, and the output is data about the emotional state. The server uses analysis tools to evaluate the emotion and extract emotional information.

[0781] Step 7:

[0782] The server provides the optimal response based on the emotional state. The input consists of the emotional state recognized in step 6 and candidate responses from the information resources, while the output is the adjusted response presented to the user. The tone of the response is adjusted based on the emotional state to improve user satisfaction.
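Step 7 combines the recognized emotional state with candidate responses from the information resource. The sketch below shows one way to do that, following the idea in [0724] of using more polite language and additional explanation for a frustrated user: a small bonus favors candidates that carry extra explanation when the user seems frustrated or anxious, and a tone-setting opener is prepended. The `Candidate` type, the opener phrases, and the 0.05 bonus are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    answer: str
    relevance: float               # e.g. a retrieval score from the knowledge-base search
    details: Optional[str] = None  # hypothetical extra explanation stored with the pair


# Hypothetical tone settings per recognized emotional state.
OPENERS = {
    "frustrated": "I'm sorry for the trouble. ",
    "anxious": "Don't worry, let's go through this together. ",
}
NEEDS_DETAIL = {"frustrated", "anxious"}
DETAIL_BONUS = 0.05  # illustrative weight, not taken from the text


def respond(emotion: str, candidates: list[Candidate]) -> Optional[str]:
    """Pick a candidate with emotion-aware scoring, then adjust its tone."""
    if not candidates:
        return None

    def score(c: Candidate) -> float:
        bonus = DETAIL_BONUS if emotion in NEEDS_DETAIL and c.details else 0.0
        return c.relevance + bonus

    best = max(candidates, key=score)
    text = OPENERS.get(emotion, "") + best.answer
    if emotion in NEEDS_DETAIL and best.details:
        text += " " + best.details
    return text
```

With a neutral user the most relevant candidate wins unchanged; with a frustrated user a nearly as relevant candidate that includes an explanation can win instead, and it is delivered in a softer tone.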

[0783] (Application Example 2)

[0784] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0785] In addition to extracting important information from documents and generating effective inquiry responses, there is a need to improve the user experience by providing appropriate responses that respond to the user's emotions. However, conventional systems often fail to adequately consider user emotions and can only provide fixed responses. Therefore, a system capable of flexible and personalized responses based on user emotions is necessary.

[0786] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0787] In this invention, the server includes means for uploading documents, means for extracting information, means for analyzing information to identify important data, means for generating queries and responses, means for adding the generated queries and responses to a knowledge base, means for providing the optimal response using the knowledge base, means for analyzing emotions, and means for selecting and providing the optimal response based on emotions. This makes it possible to provide the optimal answer according to the user's emotions.

[0788] "Means for uploading documents" refers to functions that allow users to transfer text and image files to the system.

[0789] "Means for extracting information" refers to functions for extracting important text data from uploaded documents and images.

[0790] "Means for analyzing information to identify important data" refers to a function that processes the extracted information and identifies highly relevant data.

[0791] "Means for generating queries and responses" refers to a function that creates questions and answers for user inquiries based on the identified key data.

[0792] "Means for adding generated queries and responses to a knowledge base" refers to a function that saves the created question-and-answer pairs to a database for later use.
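Such a database can be sketched with Python's standard `sqlite3` module, assuming SQLite as the storage backend. The table name `qa_pairs`, its columns, and the helper functions are hypothetical, since the disclosure does not specify a schema.

```python
import sqlite3
from typing import Optional


def open_knowledge_base(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a knowledge base holding question-and-answer pairs."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS qa_pairs (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               question TEXT NOT NULL,
               answer TEXT NOT NULL,
               source_document TEXT
           )"""
    )
    return conn


def add_qa_pairs(
    conn: sqlite3.Connection,
    pairs: list[tuple[str, str]],
    source_document: Optional[str] = None,
) -> int:
    """Insert (question, answer) pairs and return how many rows were added."""
    with conn:  # commits on success, rolls back on error
        cur = conn.executemany(
            "INSERT INTO qa_pairs (question, answer, source_document) VALUES (?, ?, ?)",
            [(q, a, source_document) for q, a in pairs],
        )
    return cur.rowcount
```

Recording the source document alongside each pair makes it possible to trace an answer back to the upload it came from.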

[0793] "Means for providing the optimal response using a knowledge base" refers to a function that consults the stored data and presents the most appropriate answer to a user's question.
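The disclosure does not state how the most appropriate answer is chosen. As a stand-in only, the sketch below scores each stored question by the Jaccard similarity between its word set and that of the user's question, and returns the best-scoring answer. The `threshold` value is a hypothetical tuning parameter, and a practical system might instead use embedding-based search.

```python
import re
from typing import Optional


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def best_answer(query: str, qa_pairs: list[tuple[str, str]], threshold: float = 0.2) -> Optional[str]:
    """Return the answer whose stored question is most similar to the query.

    Similarity is the Jaccard index of the word sets. None is returned when no
    stored question reaches the threshold.
    """
    query_tokens = tokenize(query)
    best, best_score = None, 0.0
    for question, answer in qa_pairs:
        q_tokens = tokenize(question)
        union = query_tokens | q_tokens
        score = len(query_tokens & q_tokens) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = answer, score
    return best if best_score >= threshold else None
```

Because `\w+` splits only on non-word characters, Japanese text written without spaces would need a separate word segmenter before this kind of matching is useful.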

[0794] "Means for analyzing emotions" refers to a function that evaluates emotions from the user's text or voice input.

[0795] "Means for selecting and providing the optimal response based on emotions" refers to a function that takes the user's emotional state into account and determines and provides a response with an appropriate tone and content.

[0796] This invention provides a system in which a user uploads a document via a terminal, and the system extracts important information from that document and generates queries and responses. Through document upload, information extraction, and information analysis, the server identifies important data and adds the generated queries and responses to a knowledge base. When responding to user queries, the system analyzes the user's sentiment and selects and provides the most appropriate response based on that sentiment.

[0797] The hardware includes smartphones and personal computers, which users use to upload documents. The software uses Tesseract OCR for character recognition and Transformer-based models (such as BERT) for sentiment analysis. For data processing and computation, the software extracts text information from image data, analyzes the extracted text to identify important information, and then uses generative AI models to generate appropriate questions and answers.

[0798] As a concrete example, suppose a user photographs an invoice with a smartphone and uploads the picture to the server. Using character recognition technology, the server extracts the invoice amount and payment deadline from the picture, and analyzes this information to generate questions the user is likely to ask, together with their answers. At the same time, sentiment analysis is performed on the user's text or voice input, and the tone of the response is adjusted according to the user's emotions. An example prompt input to the generative AI model is: "Please give me advice about this expense. I am worried."

[0799] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0800] Step 1:

[0801] The user uploads a document or image to the server using their terminal. The input is, for example, an image of an invoice or receipt. The server receives this input and prepares for the next processing step.
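Before OCR can run, the server must know whether an upload is an image. Example 1 in the claims likewise recites determining whether the document is in image format. A minimal sketch of that routing, assuming detection by file signature ("magic bytes"): the PNG and JPEG signatures are the standard ones, while the route names `"ocr"`, `"text"`, and `"unsupported"` are hypothetical labels.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def route_upload(data: bytes) -> str:
    """Decide how an uploaded file should be processed.

    Returns "ocr" for PNG or JPEG images, "text" for UTF-8 text, and
    "unsupported" otherwise. The route names are hypothetical labels.
    """
    if data.startswith(PNG_SIGNATURE) or data.startswith(JPEG_SIGNATURE):
        return "ocr"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unsupported"
    return "text"
```

Checking the leading bytes rather than the file extension avoids trusting a name supplied by the client.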

[0802] Step 2:

[0803] The server extracts text information from the received image data. This process uses Tesseract OCR, a character recognition technology. It receives image data as input and generates text data as output.

[0804] Step 3:

[0805] The server analyzes the extracted text data to identify important information. This step uses an analysis algorithm to extract information important to the user (e.g., billing amount, payment deadline). The input is text data, and the output is a dataset containing the important information.
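The disclosure leaves the analysis algorithm open. As a stand-in only, the sketch below pulls a billed amount and a payment deadline out of OCR text with regular expressions. The patterns are hypothetical: they assume English labels such as "Total" or "Payment deadline" and a year-month-day date, which real invoices will not always follow.

```python
import re
from datetime import date

# Hypothetical patterns; real invoices vary widely in layout and wording.
AMOUNT_RE = re.compile(
    r"\b(?:amount due|total)\s*[:：]?\s*[¥$]?\s*([\d,]+(?:\.\d+)?)", re.IGNORECASE
)
DEADLINE_RE = re.compile(
    r"\b(?:due date|payment deadline)\s*[:：]?\s*(\d{4})[-/](\d{1,2})[-/](\d{1,2})",
    re.IGNORECASE,
)


def extract_key_info(text: str) -> dict:
    """Return the billed amount and payment deadline found in the text, if any."""
    info = {}
    if m := AMOUNT_RE.search(text):
        info["amount"] = float(m.group(1).replace(",", ""))
    if m := DEADLINE_RE.search(text):
        year, month, day = map(int, m.groups())
        info["deadline"] = date(year, month, day)
    return info
```

Fields that are not found are simply omitted, so later steps can generate questions only for information that is actually present.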

[0806] Step 4:

[0807] Based on the identified key information, the server automatically generates queries and responses using a generative AI model. This process takes the key information as input and uses the generative AI model to output appropriate question-and-answer pairs.
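In the disclosure this step is performed by a generative AI model. The sketch below substitutes fixed templates so that the example stays deterministic and self-contained. It assumes the key information arrives as a dictionary with hypothetical keys `amount` (a number) and `deadline` (a `datetime.date`).

```python
from datetime import date


def generate_qa_pairs(info: dict) -> list[tuple[str, str]]:
    """Turn extracted key information into question-and-answer pairs.

    Fixed templates stand in for the generative AI model described in the text.
    """
    pairs = []
    if "amount" in info:
        pairs.append(("How much do I need to pay?", f"The billed amount is {info['amount']:,.0f}."))
    if "deadline" in info:
        pairs.append(("When is the payment due?", f"Payment is due by {info['deadline'].isoformat()}."))
    return pairs


if __name__ == "__main__":
    for question, answer in generate_qa_pairs({"amount": 12000.0, "deadline": date(2025, 1, 31)}):
        print(question, answer)
```

Swapping the templates for a model call would change only the body of `generate_qa_pairs`; the list of pairs it returns can be stored in the knowledge base unchanged.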

[0808] Step 5:

[0809] Through their terminal, the user enters a prompt message to the server by text or voice. The prompt message may express the user's emotions, for example, "Please give me advice about this expense. I am worried." The server receives this prompt message.

[0810] Step 6:

[0811] The server analyzes the user's emotions from the received prompt text. This process uses a Transformer-based emotion analysis model. The input is the prompt text, and the output is data indicating the user's emotional state.
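The disclosure uses a Transformer-based emotion analysis model for this step. To keep the example self-contained, the sketch below substitutes a small keyword lexicon. The emotion labels and keywords are hypothetical, and a trained classifier (for example a fine-tuned BERT model, as mentioned in [0797]) would take the lexicon's place in a real deployment.

```python
import re

# Hypothetical keyword lexicon standing in for a Transformer-based classifier.
EMOTION_KEYWORDS = {
    "anxious": {"worried", "worry", "anxious", "nervous", "afraid"},
    "angry": {"angry", "annoyed", "unacceptable", "furious"},
    "happy": {"thanks", "great", "glad", "happy"},
}


def analyze_emotion(prompt: str) -> dict:
    """Return a keyword count per emotion plus the dominant label ("neutral" if none match)."""
    words = re.findall(r"[a-z']+", prompt.lower())
    scores = {label: sum(w in keys for w in words) for label, keys in EMOTION_KEYWORDS.items()}
    dominant = max(scores, key=scores.get)
    return {"scores": scores, "label": dominant if scores[dominant] > 0 else "neutral"}
```

Returning the per-emotion scores as well as the label mirrors the idea of an output that describes the user's emotional state rather than a single category.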

[0812] Step 7:

[0813] The server selects and provides the optimal response to the user based on their emotions, referencing a knowledge base. This step uses emotion data and data from the knowledge base as input and outputs a response with an adjusted tone.
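One simple way to adjust tone is to prepend an emotion-appropriate phrase to the answer retrieved from the knowledge base. This sketch assumes the emotional state is a label such as a classifier might output. The phrases and labels in `TONE_PREFIXES` are hypothetical. The disclosure does not specify how the tone is adjusted, and a generative model could instead rewrite the whole answer.

```python
# Hypothetical opening phrases per emotional state; "neutral" adds nothing.
TONE_PREFIXES = {
    "anxious": "I understand this may be worrying. ",
    "angry": "I'm sorry for the trouble. ",
    "happy": "Glad to help! ",
    "neutral": "",
}


def adjust_tone(answer: str, emotion: str) -> str:
    """Prepend an emotion-appropriate phrase to an answer from the knowledge base.

    Unknown emotion labels fall back to the answer unchanged.
    """
    return TONE_PREFIXES.get(emotion, "") + answer
```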

[0814] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0815] The data generation model 58 is a type of so-called generative AI (artificial intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions in the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0816] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0817] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0818] Figure 9 shows an emotion map 400 on which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer an emotion lies to the center, the more primitive it is. Toward the outside of the concentric circles lie emotions representing states and actions that arise from mental states. Here, emotion is a concept that includes both feelings and mental states. On the left side of the concentric circles lie emotions that are generally generated from reactions occurring in the brain. On the right side lie emotions that are generally induced by situational judgment. Above and below the concentric circles lie emotions that are generally both generated from reactions occurring in the brain and induced by situational judgment. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" on the lower side. Thus, in the emotion map 400, emotions are mapped according to the structure by which they arise, and emotions that are likely to occur simultaneously are mapped close together.

[0819] These emotions are distributed around the 3 o'clock position of the emotion map 400 and usually fluctuate between security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, giving a calm impression.

[0820] The inside of the emotion map 400 represents inner thoughts, and the outside represents actions. Therefore, the farther outward an emotion lies on the emotion map 400, the more visible it becomes (that is, the more it is expressed in actions).
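To make the layout concrete, the sketch below places a few emotions on polar coordinates: a clock position (pleasure toward 12 o'clock, displeasure toward 6 o'clock, security and anxiety near 3 o'clock) and a radius (inner = more primitive, outer = more expressed in action). The specific positions are invented for illustration and do not reproduce Figure 9. The sketch only shows that "mapped close together" can be measured as a Euclidean distance on the map.

```python
import math

# Hypothetical placements: (clock position in hours, radius from 0 = center to 1 = rim).
# Only the overall layout follows the text; exact values are invented.
EMOTION_POSITIONS = {
    "pleasure": (12.0, 0.2),
    "displeasure": (6.0, 0.2),
    "security": (2.5, 0.5),
    "anxiety": (3.5, 0.5),
}


def to_xy(clock: float, radius: float) -> tuple[float, float]:
    """Convert a clock position (12 o'clock = up, clockwise) to Cartesian x, y."""
    angle = math.radians(90 - clock * 30)  # 30 degrees per hour, counter-clockwise from +x
    return radius * math.cos(angle), radius * math.sin(angle)


def map_distance(a: str, b: str) -> float:
    """Euclidean distance between two emotions on the map; smaller means closer."""
    xa, ya = to_xy(*EMOTION_POSITIONS[a])
    xb, yb = to_xy(*EMOTION_POSITIONS[b])
    return math.hypot(xa - xb, ya - yb)
```

With these placements, security and anxiety (neighbors near 3 o'clock) come out closer to each other than pleasure and displeasure (opposite sides of the map).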

[0821] Human emotions are based on various balances, such as posture and blood sugar level. When these balances deviate from the ideal, discomfort results, and when they approach the ideal, pleasure results. Similarly, emotions can be created for robots, cars, motorcycles, and the like based on various balances, such as posture and remaining battery level: deviation from the ideal produces discomfort, and approaching the ideal produces pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, doctoral dissertation, Tokushima University: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half contains emotions belonging to a region called "situation," where situational awareness is dominant.
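The balance principle in this paragraph can be expressed as a simple score. The sketch below assumes a linear scale and a hypothetical tolerance per quantity. It returns +1 when a measured balance (for example a robot's battery level) is at its ideal and falls toward -1 as the deviation grows. The linear form and the reading names are assumptions, not taken from the disclosure.

```python
def comfort_score(value: float, ideal: float, tolerance: float) -> float:
    """Map a deviation from the ideal to a score in [-1, 1].

    +1 means the balance is at its ideal (pleasure); -1 means the deviation has
    reached or exceeded the tolerance (discomfort). The linear form is an assumption.
    """
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    deviation = min(abs(value - ideal) / tolerance, 1.0)
    return 1.0 - 2.0 * deviation


def overall_comfort(readings: dict) -> float:
    """Average the comfort scores of several (value, ideal, tolerance) readings."""
    scores = [comfort_score(*reading) for reading in readings.values()]
    return sum(scores) / len(scores)
```

For instance, a battery at 75% against an ideal of 100% with a tolerance of 50 points scores 0.0, halfway between pleasure and discomfort.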

[0822] The emotion map defines two emotions that promote learning. One lies around the middle of the negative emotions "repentance" and "reflection" on the situation side, that is, when the robot experiences negative feelings such as "I never want to feel this way again" or "I don't want to be scolded again." The other lies around the positive emotion "desire" on the response side, that is, when the robot has positive feelings such as "I want more" or "I want to know more."

[0823] The emotion identification model 59 feeds the user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and thereby determines the user's emotion. This neural network is pre-trained on multiple training data sets, each of which combines a user input with emotion values representing each emotion shown in the emotion map 400. Furthermore, the network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example in which multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
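The final step of this determination can be sketched as follows, assuming the network emits one raw score (logit) per emotion on the map. A softmax turns the scores into emotion values that sum to 1, and the user's emotion is taken as the highest-valued one. The softmax and argmax choice is an assumption for illustration; the disclosure does not specify the output layer.

```python
import math


def emotion_values(logits: dict[str, float]) -> dict[str, float]:
    """Convert raw per-emotion scores into values that sum to 1 (softmax)."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - peak) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def dominant_emotion(logits: dict[str, float]) -> str:
    """Return the emotion with the highest value."""
    values = emotion_values(logits)
    return max(values, key=values.get)
```

With hypothetical scores such as `{"reassured": 2.0, "calm": 1.9, "anxiety": -1.0}`, the neighboring emotions "reassured" and "calm" receive similar values, in the spirit of Figure 10, while "anxiety" receives a much smaller one.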

[0824] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0825] In the above embodiment, an example was given in which the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific processing may be distributed across multiple computers, including the computer 22. For example, the data generation model 58 may be provided in a device external to the data processing device 12, and the external device may generate data according to the input data.

[0826] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.

[0827] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0828] Furthermore, the entire specific processing program 56 need not be stored in a storage device such as a server connected to the data processing device 12 via the network 54, or in the storage 32; only a portion of the specific processing program 56 may be stored there.

[0829] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0830] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0831] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0832] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0833] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0834] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0835] The following is further disclosed regarding the embodiments described above.

[0836] (Claim 1)

[0837] Means for uploading documents,

[0838] A means for extracting textual information from the document,

[0839] A means for analyzing the textual information to identify important information,

[0840] A means for generating questions and answers based on the important information,

[0841] A means of adding the generated questions and answers to the knowledge base,

[0842] A means of providing answers to inquiries using the knowledge base,

[0843] A system that includes this.

[0844] (Claim 2)

[0845] The system according to claim 1, which extracts textual information in multiple languages and generates questions and answers.

[0846] (Claim 3)

[0847] The system according to claim 1, which extracts character information from an image using optical character recognition technology.

[0848] "Example 1"

[0849] (Claim 1)

[0850] A means for transferring a document to an information processing device via a communication device,

[0851] A means for determining whether the document is in image format and for extracting character information from the image using optical character recognition technology,

[0852] A means for analyzing the text information based on a generative AI model and identifying important information,

[0853] A means for generating questions that users are likely to frequently ask and their answers based on the relevant important information,

[0854] A means to add the generated questions and answers to a collection and make the information available in real time,

[0855] A means for providing appropriate answers to inquiries from terminals using said collection,

[0856] A system that includes this.

[0857] (Claim 2)

[0858] The system according to claim 1, which extracts textual information in multiple languages and generates questions and answers based on user prompts.

[0859] (Claim 3)

[0860] The system according to claim 1, which extracts character information from an image using optical character recognition technology and performs analysis using a generative AI model.

[0861] "Application Example 1"

[0862] (Claim 1)

[0863] Means of obtaining documents,

[0864] A means for extracting textual information from the document,

[0865] A means for analyzing the textual information to identify important information,

[0866] Means for creating questions and answers based on the important information,

[0867] A means of adding the generated questions and answers to the knowledge base,

[0868] A means of providing responses to inquiries using the knowledge base,

[0869] A means of converting acquired images into text information using optical technology,

[0870] Means by which consumer automated machinery transmits information to the user via voice or screen display,

[0871] A system that includes this.

[0872] (Claim 2)

[0873] The system according to claim 1, which extracts textual information in multiple languages and generates questions and answers.

[0874] (Claim 3)

[0875] The system according to claim 1, which analyzes a user's instructions in natural language and provides an appropriate response based on those instructions.

[0876] "Example 2 (combined with an emotion engine)"

[0877] (Claim 1)

[0878] Means for uploading documents,

[0879] A means for extracting textual information from the document,

[0880] A means for analyzing the textual information to identify important information,

[0881] A means for generating questions and answers based on the important information,

[0882] A means of adding the generated questions and answers to the information resource,

[0883] A means of providing answers to inquiries using the said information resources,

[0884] A means of recognizing the emotional state of user input,

[0885] A means of providing the optimal response according to the emotional state,

[0886] A system that includes this.

[0887] (Claim 2)

[0888] The system according to claim 1, which extracts textual information in multiple languages and generates questions and answers.

[0889] (Claim 3)

[0890] The system according to claim 1, which extracts character information from an image using optical character recognition technology.

[0891] "Application Example 2 (combined with an emotion engine)"

[0892] (Claim 1)

[0893] Means for uploading documents,

[0894] A means for extracting information from the document,

[0895] A means for analyzing the information to identify important data,

[0896] Means for generating queries and responses based on the important data,

[0897] A means of adding the generated queries and responses to the knowledge base,

[0898] A means for providing a response using the knowledge base,

[0899] Means for analyzing emotions,

[0900] A means of selecting and providing the optimal response based on the emotion,

[0901] A system that includes this.

[0902] (Claim 2)

[0903] The system according to claim 1, which extracts information in multiple languages and generates inquiries and responses.

[0904] (Claim 3)

[0905] The system according to claim 1, which extracts information from an image using character recognition technology.

[Explanation of Symbols]

[0906] 10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system including: means for obtaining documents; means for extracting textual information from the document; means for analyzing the textual information to identify important information; means for creating questions and answers based on the important information; means for adding the generated questions and answers to a knowledge base; means for providing responses to inquiries using the knowledge base; means for converting acquired images into text information using optical technology; and means by which a consumer automated machine transmits information to the user via voice or screen display.

2. The system according to claim 1, which extracts textual information in multiple languages and generates questions and answers.

3. The system according to claim 1, which analyzes a user's instructions in natural language and provides an appropriate response based on those instructions.