system
The information processing device with a generative model allows users to easily generate automation scenarios by analyzing screen interface elements and ensuring compatibility, addressing the complexity of existing automation tools and enhancing operational efficiency.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-16
- Publication Date
- 2026-06-26
AI Technical Summary
Existing automation tools require advanced programming skills, making it difficult for end-users to create automation scenarios.
An information processing device that utilizes a generative model to generate automation scenarios based on user input data, analyzing screen interface elements and comparing them with reference information to ensure compatibility with automation tools, allowing users to easily construct automation processes without specialized knowledge.
Enables end-users to create accurate and compatible automation scenarios efficiently, improving operational efficiency and reducing the need for programming skills.
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] When an end user automates a complex process using an automation tool, advanced programming skills are required, which makes it difficult to create an automation scenario. It is therefore desirable to provide a method by which an end user can easily create an automation scenario.
Means for Solving the Problems
[0005] This invention provides an information processing device that holds all functional reference information for a specific automation tool and has means for receiving image and text data as input. Furthermore, it utilizes a generative model that generates automation scenarios based on this input data, and outputs the generated scenarios, thereby enabling end users to easily construct automation processes. This system achieves accurate scenario generation by analyzing elements of the screen interface, extracting appropriate information, and comparing it with reference information for the automation tool.
[0006] An "information processing device" is an electronic device that has the ability to receive, analyze, process, and output data.
[0007] An "input device" is a means used by a user to provide image or text data to a system.
[0008] A "generative model" is an algorithm or program that generates automation scenarios based on input data.
[0009] An "output device" is a means of communicating the generated automation scenario externally or presenting it to the user.
[0010] A "screen interface" is a digital display environment used by users to perform operations, and includes elements such as buttons and forms.
[0011] An "analysis device" is a means of analyzing input data and extracting useful information.
[0012] A "control device" is a means of comparing the generated scenario with reference information in an automation tool and correcting any inconsistencies.
Brief Description of the Drawings
[0013]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map to which multiple emotions are mapped.
[Figure 10] An emotion map to which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.
Mode for Carrying Out the Invention
[0014] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described according to the accompanying drawings.
[0015] First, the terms used in the following description will be explained.
[0016] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).
[0017] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which the processor uses as work memory.
[0018] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.
[0019] In the following embodiments, a communication I/F (Interface) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication I/F controls communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).
[0020] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or."
[0021] [First Embodiment]
[0022] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0023] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0024] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0025] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0026] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0027] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0028] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0029] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0030] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0031] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0032] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0033] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0034] The system of this invention aims to allow users to easily input information about the processes they wish to automate and generate automation scenarios based on this input. This system functions through the collaboration of a server, a terminal, and the user.
[0035] First, the user uses a terminal to input screenshots of the system to be automated, HTML configuration information, and specific work procedures. This allows the user to provide detailed work information to the system as text and images, thereby consolidating the fundamental information needed for automation.
[0036] Next, the terminal functions as an interface for sending data received from the user to the server. The terminal performs data format conversion and transmission processing to ensure that the input data is reliably transmitted to the server.
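The format conversion performed by the terminal can be illustrated with a minimal sketch. The specification does not state a wire format, so the JSON layout, the field names, and the function names `build_request` and `parse_request` below are assumptions made only for illustration.

```python
import base64
import json


def build_request(screenshot: bytes, html: str, steps: list[str]) -> str:
    """Terminal side: serialize the user's input into one JSON string."""
    payload = {
        # Binary image data is base64-encoded so it can travel inside JSON.
        "screenshot_b64": base64.b64encode(screenshot).decode("ascii"),
        "html": html,
        "steps": steps,
    }
    return json.dumps(payload, ensure_ascii=False)


def parse_request(body: str) -> tuple[bytes, str, list[str]]:
    """Server side: recover the screenshot, HTML, and work procedure."""
    payload = json.loads(body)
    screenshot = base64.b64decode(payload["screenshot_b64"])
    return screenshot, payload["html"], payload["steps"]
```

Encoding the image as text keeps the three inputs in a single request body, at the cost of roughly one third more bytes than the raw image.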
[0037] The server performs information analysis on the received data. First, it analyzes the elements of the screen interface contained in the image data to understand the structure of the user interface and identify which parts are the target of user interaction. Furthermore, it identifies specific operation procedures from the text data and determines the action to take based on them.
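For the HTML configuration information, one possible way to identify the parts that are targets of user interaction is sketched below using Python's standard `html.parser` module. The set of tags treated as interactive and the shape of the output records are assumptions; the specification does not say how the analysis is implemented, and screenshot analysis would need a separate image recognition step that is not shown.

```python
from html.parser import HTMLParser

# Tags treated as interaction targets. This set is an assumption.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveElementCollector(HTMLParser):
    """Collects interactive tags together with their identifying attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: list[dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            # Attribute values may be None (e.g. "disabled"), so default to "".
            attr_map = {name: value or "" for name, value in attrs}
            self.elements.append(
                {
                    "tag": tag,
                    "id": attr_map.get("id", ""),
                    "name": attr_map.get("name", ""),
                    "type": attr_map.get("type", ""),
                }
            )


def find_interactive_elements(html: str) -> list[dict[str, str]]:
    """Return the interactive elements of a page in document order."""
    collector = InteractiveElementCollector()
    collector.feed(html)
    collector.close()
    return collector.elements
```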
[0038] Next, the server leverages the full functional reference information of the specific automation tool and applies a generative model to generate automation scenarios. This defines appropriate automation procedures on the system based on user input. The system then verifies whether the generated scenarios are compatible with the functionality of the specific automation tool and automatically makes any necessary corrections.
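The compatibility check and automatic correction can be pictured as follows. This is a sketch under stated assumptions: the scenario is modeled as a list of dictionaries, and the contents of `REFERENCE` and `ALIASES` are hypothetical stand-ins for the functional reference information of an automation tool, which the specification does not enumerate.

```python
# Hypothetical function reference: supported actions mapped to the
# parameters each action requires.
REFERENCE = {
    "open_url": {"url"},
    "click": {"target"},
    "type_text": {"target", "text"},
}

# Hypothetical near-miss names a generative model might emit.
ALIASES = {"press": "click", "input_text": "type_text", "goto": "open_url"}


def check_and_correct(scenario: list[dict]) -> tuple[list[dict], list[str]]:
    """Return the corrected steps and a list of problems that remain."""
    corrected, problems = [], []
    for index, step in enumerate(scenario):
        raw_action = step.get("action")
        # Correction: map a known alias onto the supported action name.
        action = ALIASES.get(raw_action, raw_action)
        if action not in REFERENCE:
            problems.append(f"step {index}: unsupported action {action!r}")
            continue
        missing = REFERENCE[action] - set(step)
        if missing:
            problems.append(f"step {index}: missing {sorted(missing)}")
            continue
        corrected.append({**step, "action": action})
    return corrected, problems
```

Steps that cannot be repaired by renaming are reported rather than silently dropped, so they can be regenerated or shown to the user for review.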
[0039] Finally, the generated automation scenario is presented to the user via a terminal. The user can review it, make modifications as needed, and then execute the final scenario. As a specific example, when automating a process such as order processing in accounting, the user inputs the operating procedures and form input flow of the order system into the system. Based on this information, the system automatically generates a scenario from displaying the form to confirming and completing the order, thereby improving operational efficiency.
[0040] Thus, the present invention aims to support end users in building automated processes, and its implementation enables the easy realization of complex automation without requiring programming skills.
[0041] The following describes the processing flow.
[0042] Step 1:
[0043] The user uses a terminal to input screenshots of the system operations to be automated, HTML configuration information, and specific work procedures. This provides the system with detailed data about the processes to be automated.
[0044] Step 2:
[0045] The terminal converts the input data received from the user into an appropriate format and prepares it for transmission to the server. In particular, the data is formatted to conform to the standards required by the system.
[0046] Step 3:
[0047] A request containing screenshots, HTML information, and data related to work procedures is sent from the terminal to the server. This transmission process ensures that the data is received without any loss or errors.
[0048] Step 4:
[0049] The server analyzes the received data and first uses image data to identify elements displayed on the screen. It then utilizes image recognition technology to identify UI components such as buttons and form fields.
[0050] Step 5:
[0051] The server analyzes the text data and extracts the specific operational steps required for the automation process. This procedural information becomes the foundational data used later to generate automation scenarios.
[0052] Step 6:
[0053] The server generates automation scenarios by applying a generative model while referencing functional reference data for specific automation tools. In this process, detected UI components and operation procedures are reflected in the automation scenarios.
[0054] Step 7:
[0055] The server verifies the suitability of the generated automation scenarios and confirms their compatibility with specific automation tools. If any inconsistencies are found, the scenario is automatically corrected.
[0056] Step 8:
[0057] The server returns the completed automation scenario to the terminal. This output data is used by the user to review and modify the scenario.
[0058] Step 9:
[0059] Users review the automation scenarios provided through their terminals, make adjustments as needed, and then actually execute the automated process. This enables the automation of business processes.
[0060] (Example 1)
[0061] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0062] Creating automation scenarios requires a great deal of expertise, making them difficult for the average user to use efficiently. Furthermore, ensuring the proper extraction of user interfaces and operating procedures, and their compatibility with automation tools, is not easy. Therefore, there is a need to generate automation scenarios easily and accurately to improve operational efficiency.
[0063] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0064] In this invention, the server includes a terminal for collecting process information from the user, communication means for transferring data to the server via the terminal, analysis means for analyzing the received data and identifying UI elements and operating procedures, and generation means for creating automation scenarios from the analyzed data using a generative AI model. This makes it possible for users to easily generate automation scenarios without specialized knowledge, improving operational efficiency while maintaining system compatibility.
[0065] A "user" is the entity that provides the system with the process information necessary to generate automation scenarios.
[0066] A "terminal" is a device or platform that receives input data from a user and transmits it to a server via a communication method.
[0067] A "server" is a central device that analyzes data received from terminals and creates and manages automation scenarios using generative AI models.
[0068] "Communication means" refers to protocols and interfaces used to transfer data from a terminal to a server, and possesses the function of sending and receiving data accurately and securely.
[0069] "Analysis means" refers to the processes and algorithms necessary to examine the received data in detail and identify user interface elements and operating procedures.
[0070] "Generation means" refers to the processes and functions used to construct automation scenarios using a generative AI model based on the analysis results.
[0071] A "generative AI model" is a program or system based on artificial intelligence technology that creates automation scenarios based on given prompts and information.
[0072] A "prompt" is a sentence in the form of an instruction or question that is input into a generative AI model, and it contains the information necessary to generate an automation scenario.
[0073] The system of this invention receives from the user the information necessary to automate a specific process, and generates a scenario based on that information. The system works in cooperation with the user, terminal, and server.
[0074] First, the user operates a terminal to supply the system with information related to the process they want to automate. This information mainly consists of screenshots, HTML configuration information, and manual operation instructions. This information is important for identifying the detailed process flow.
[0075] Next, the terminal receives information from the user, converts the data format, and then securely transmits it to the server. The data, now in the appropriate format, is used for analysis on the server.
[0076] The server receives the data sent from the terminal and begins analysis.
[0077] The server uses image analysis algorithms and natural language processing techniques to identify the elements of the screen interface and the operating procedures. Specifically, it uses image processing libraries to extract UI elements from screenshots and a text analysis engine to extract operating procedures from the text data.
[0078] Next, the server generates an appropriate automation scenario using a specific generative AI model. This AI model interprets the information provided by the user as prompts and outputs the optimal scenario for the automation tool. Possible prompts include, for example, "Generate a scenario to automate the following process. Please refer to the screenshots and HTML information for instructions and complete the process."
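A prompt of this kind could be assembled from the analysis results along the following lines. The function name `build_prompt`, the section labels, and the `tool_name` parameter are illustrative assumptions; only the opening instruction follows the example wording given in the text.

```python
def build_prompt(steps: list[str], ui_elements: list[str], tool_name: str) -> str:
    """Assemble a text prompt from the extracted procedure and UI elements."""
    lines = [
        f"Generate a scenario for {tool_name} to automate the following process.",
        "",
        "Operating procedure:",
    ]
    # Number the steps so the model can preserve their order.
    lines += [f"{n}. {step}" for n, step in enumerate(steps, start=1)]
    lines += ["", "UI elements found in the screenshots and HTML:"]
    lines += [f"- {element}" for element in ui_elements]
    lines += ["", "Use only functions listed in the tool's reference."]
    return "\n".join(lines)
```

Listing the detected UI elements explicitly gives the model concrete targets to refer to, which should make the later compatibility check easier to pass.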
[0079] The generated scenarios are evaluated for suitability and modified on the system if necessary. Finally, they are presented to the user via a terminal, allowing the user to review the content and make changes as needed. This entire process enables users to create automation scenarios tailored to their needs and improve work efficiency without requiring specialized programming knowledge.
[0080] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0081] Step 1:
[0082] The user enters screenshots of the process they want to automate, HTML configuration information, and specific operating instructions. This input data serves as foundational information to communicate the specific automation needs to the system. The screenshots show the UI layout, the HTML information identifies the page structure, and the operating instructions text represents the overall flow of the process.
[0083] Step 2:
[0084] The terminal processes various types of data collected from the user. Specifically, image data is converted to the appropriate resolution and format, and text data is encoded and converted into a format compatible with the server. The converted data is then sent to the server via the terminal's communication functions. The system is designed to maintain data integrity and security during this process.
[0085] Step 3:
[0086] The server analyzes screenshots and HTML information received from the terminal using an image analysis engine and an HTML analysis engine. Here, UI elements are extracted through image analysis, and the logical structure of the page is identified through HTML analysis. The output is a list indicating which elements are targeted by user interaction.
[0087] Step 4:
[0088] Next, the server extracts the operation steps from the text data and uses a natural language processing engine to identify each step. The extracted steps are then compared with a list of UI elements to determine the corresponding operation for each action. This is then converted into a prompt and used as input for the generative AI model.
[0089] Step 5:
[0090] The server inputs the constructed prompt statements into the AI model to generate an automation scenario. The AI model outputs the optimal scenario based on the input information. At this time, the suitability of the generated scenario is evaluated, taking into account all the functional reference information of the specific automation tool. If a mismatch is found, the scenario is modified and evaluated again.
[0091] Step 6:
[0092] The generated automation scenario is presented to the user via the terminal. The terminal provides a visual interface that shows the flow of the scenario, allowing the user to review its contents. The user accepts the scenario or requests modifications as needed, and the optimal automation scenario ultimately becomes executable.
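The comparison in Step 4 between the extracted steps and the list of UI elements can be sketched with a simple word-overlap heuristic. This heuristic is a stand-in chosen for brevity, not the natural language processing engine the specification refers to, and the `elements` mapping from identifier to visible label is an assumed data shape.

```python
import re
from typing import Optional


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric words in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_steps(
    steps: list[str], elements: dict[str, str]
) -> list[tuple[str, Optional[str]]]:
    """Pair each step with the element whose label shares the most words.

    A step that shares no word with any label is paired with None.
    """
    pairs = []
    for step in steps:
        step_words = tokens(step)
        best_id, best_score = None, 0
        for element_id, label in elements.items():
            score = len(step_words & tokens(label))
            if score > best_score:
                best_id, best_score = element_id, score
        pairs.append((step, best_id))
    return pairs
```

Steps paired with `None` have no obvious on-screen target and are natural candidates for the user to clarify during review in Step 6.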
[0093] (Application Example 1)
[0094] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0095] In data management and operations, there is a need for operational support in specific environments and for the automation of anomaly detection and response actions. However, existing systems make it difficult to efficiently manage and rapidly automate these processes. Furthermore, operators often require programming skills, and without specialized knowledge, achieving complex operations is challenging.
[0096] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0097] In this invention, the server includes a device that holds all functional reference information for a specific information processing device, input means for receiving image and text data as input, generation means for generating automation scenarios based on the received data, output means for outputting the generated scenarios, support means for providing operational support in a specific environment, and management means for detecting abnormalities in operational procedures and automating corresponding actions. This enables the automation of complex data management and operational processes, and allows for efficient operation without requiring operators to have programming skills.
[0098] An "information processing device" is a device that performs calculations and analyses using various types of data and executes programs to carry out specific functions.
[0099] "Input means" refers to devices or methods for acquiring information from users or external devices and processing it internally.
[0100] "Generation means" refers to methods or devices for creating new data or scenarios based on input information.
[0101] "Output means" refers to methods or devices for sending data generated within a system to an external source.
[0102] "Support means" refers to methods or devices used to support specific tasks or processes so that they can be carried out efficiently.
[0103] "Management means" refers to methods and devices for monitoring operational status, detecting abnormalities and problems, and taking appropriate action.
[0104] A system for implementing this invention includes an information processing device, an input device, a generation means, an output means, a support means, and a management means. The program of this system operates to effectively automate and support the operation of a data center.
[0105] The server receives, through a terminal, the image and text data that the user provides via an input device. The input device converts this data into an analyzable format and transfers it to the server. On the server, the information processing device runs image and text analysis engines, and the generation device generates operational scenarios from various sensor information and system logs.
[0106] Furthermore, the generated operational scenarios are presented to the user through an output device. The user can review the presented scenarios and make adjustments as needed. In this process, a generative AI model is applied and prompt statements are used to generate efficient scenarios. This generative AI model is particularly used for anomaly detection and optimization of intranet system compatibility.
[0107] As a concrete example, consider monitoring a cooling system within a data center and generating automated response scenarios. When a user inputs screenshots of the cooling system's dashboard and operation logs into the system, the server analyzes this data, detects anomalies, and automatically generates specific actions for the response process. This enables immediate system response and efficient operational management.
[0108] An example of a prompt message is: "Generate a scenario to detect an anomaly in the data center's cooling system and automatically respond to it. Based on the screenshots and instructions, define the corrective actions and make them executable."
[0109] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0110] Step 1:
[0111] The user inputs screenshots of the data center's cooling system status taken with their smartphone, along with related text data (system logs and procedure manuals). The input data is formatted on the terminal and prepared for transmission to the server.
[0112] Step 2:
[0113] The terminal sends the formatted data to the server. The server receives it and uses an image data analysis engine to extract important elements contained in the screenshot. For example, the set temperature and operating status of the cooling system may be identified.
[0114] Step 3:
[0115] The text data received by the server is analyzed by a text analysis engine. The analysis identifies signs of anomalies and necessary operational procedures from the system logs. Based on this information, a generative AI model creates appropriate countermeasure scenarios.
[0116] Step 4:
[0117] The server uses a generative AI model to generate response scenarios based on specific prompt messages. These generated scenarios include specific actions necessary to immediately address cooling system anomalies. This generation process leverages machine learning techniques to define the optimal response strategy.
[0118] Step 5:
[0119] The server outputs the generated scenario and presents it to the user via a terminal. If the user reviews the scenario and determines it is feasible, they apply its contents to the data center's control system. This process efficiently manages any abnormalities in the cooling system.
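The log analysis in Step 3 can be illustrated with a small threshold check. Everything specific here is an assumption: the log line format, the temperature limit, and the wording of the drafted response step are hypothetical, and a fixed rule is used in place of the generative AI model purely to keep the sketch self-contained.

```python
import re

# Hypothetical log line format: "2024-12-16T10:00:00 unit=CRAC-1 temp_c=24.5"
LOG_PATTERN = re.compile(r"unit=(?P<unit>\S+)\s+temp_c=(?P<temp>-?\d+(?:\.\d+)?)")


def find_overheating_units(log_lines: list[str], limit_c: float) -> dict[str, float]:
    """Return each unit whose highest logged temperature exceeds limit_c."""
    peaks: dict[str, float] = {}
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # Skip lines that carry no temperature reading.
        unit, temp = match["unit"], float(match["temp"])
        peaks[unit] = max(temp, peaks.get(unit, temp))
    return {unit: temp for unit, temp in peaks.items() if temp > limit_c}


def draft_response(overheating: dict[str, float]) -> list[str]:
    """Turn detected anomalies into response steps for the user to review."""
    return [
        f"Check {unit}: peak {temp:.1f} C exceeded the limit; lower its setpoint."
        for unit, temp in sorted(overheating.items())
    ]
```

The drafted steps are only proposals. As in Step 5, the user decides whether they are feasible before anything is applied to the control system.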
[0120] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0121] This invention provides an automated system that recognizes user emotions and reflects that information in the automated process. The system functions through the collaborative efforts of a server, a terminal, and the user.
[0122] First, the user inputs information about the task they want to automate using the terminal. This includes data such as screenshots, HTML configuration information, and specific operating procedures. The terminal also incorporates an emotion engine to capture the user's emotional state, recognizing their emotions in real time.
[0123] The terminal functions as an interface for sending information received from the user to the server. Here, the terminal transmits the input data to the server in a properly formatted form, including emotional information obtained from the emotion engine.
[0124] The server analyzes the transmitted data and recognizes elements of the screen interface from the image data. Furthermore, it analyzes the text data to identify the steps of the automation process. During the analysis, the server considers the user's emotional information provided by the emotion engine and generates automation scenarios using a generative model. The system is designed so that the user's emotions influence the selection and flow of the scenarios.
[0125] For example, if a user exhibits an unpleasant emotion during a task, the system can select a specific scenario flow to mitigate that emotion. Conversely, if a positive emotion is detected, the system can adjust the process to select steps that allow for more efficient progress.
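One way to picture this selection is a rule that maps an estimated emotion onto a scenario flow variant. The specification does not define how emotions are represented, so the single valence score in the range -1.0 to 1.0, the threshold, and the three flow names below are all assumptions.

```python
def select_flow(valence: float, threshold: float = 0.3) -> str:
    """Pick a scenario flow variant from an estimated emotion.

    valence is assumed to lie in [-1.0, 1.0]; negative means unpleasant.
    """
    if valence <= -threshold:
        # Unpleasant emotion: smaller steps and more confirmations.
        return "guided"
    if valence >= threshold:
        # Positive emotion: fewer confirmations so work proceeds faster.
        return "streamlined"
    return "standard"
```

A neutral band around zero avoids switching flows on small fluctuations in the estimate.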
[0126] Finally, the server sends the generated automation scenario back to the terminal. The user can review this scenario, make any necessary modifications, and then execute it. This ensures that the automation process unfolds in a way that takes the user's feelings into consideration, resulting in a more comfortable user experience.
[0127] The following describes the processing flow.
[0128] Step 1:
[0129] The user uses the terminal to input screenshots, HTML configuration information, and specific work steps for the tasks they wish to automate. Additionally, the terminal's built-in emotion engine performs facial recognition and voice analysis to detect emotions in real time.
[0130] Step 2:
[0131] The terminal consolidates all user input into a single data package. This package includes screenshot data, HTML information, work procedure text, and emotion information obtained from the emotion engine.
[0132] Step 3:
[0133] The terminal prepares to send the created data package to the server. It verifies that the data format is correct and sends the request to the server according to the transmission protocol.
[0134] Step 4:
[0135] The server analyzes the data received from the terminal and uses image recognition technology to extract screen interface elements from the screenshot. In this process, buttons, input fields, and menus are identified.
[0136] Step 5:
[0137] The server performs text analysis to identify the specific operations required for automation from the provided work procedures. It also infers the user's intent from the analyzed data.
[0138] Step 6:
[0139] The server aggregates input from the emotion engine to identify the user's emotional state. This takes into account the user's stress level, concentration level, and emotional tone.
[0140] Step 7:
[0141] The server generates automation scenarios by applying a generative model to the functional reference of a specific automation tool and the user's emotion information. During this process, it adjusts specific scenario flows and actions based on the user's emotional state.
[0142] Step 8:
[0143] The server verifies the integrity of the generated automation scenarios and automatically makes corrections if necessary. In particular, it ensures that the corrected scenarios remain appropriate to the user's emotions.
[0144] Step 9:
[0145] The server sends the generated automation scenario back to the terminal. The user reviews this scenario on the terminal, evaluates and modifies the actions, and then executes them. This enables an automation process that reflects the user's emotional state.
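Steps 2 and 3 of the flow above (consolidating the input into one data package and verifying its format before transmission) could be sketched as follows. The class `DataPackage`, its field names, the PNG-only check, and the score range of 0 to 1 are assumptions made for illustration; the disclosure does not specify a data format or a transmission protocol.

```python
import base64
import json
from dataclasses import dataclass, field


@dataclass
class DataPackage:
    """Hypothetical container for one automation request."""
    screenshot_png: bytes
    html: str
    work_steps: list[str]
    # Assumed representation: emotion label -> score in [0, 1].
    emotion: dict[str, float] = field(default_factory=dict)

    def validate(self) -> list[str]:
        """Return a list of format problems; an empty list means valid."""
        problems = []
        if not self.screenshot_png.startswith(b"\x89PNG\r\n\x1a\n"):
            problems.append("screenshot is not a PNG image")
        if not self.html.strip():
            problems.append("HTML configuration information is empty")
        if not self.work_steps:
            problems.append("no work steps were provided")
        if any(not 0.0 <= score <= 1.0 for score in self.emotion.values()):
            problems.append("emotion scores must lie in [0, 1]")
        return problems

    def to_json(self) -> str:
        """Serialize the package; binary image data is base64-encoded."""
        return json.dumps({
            "screenshot_png_b64": base64.b64encode(self.screenshot_png).decode("ascii"),
            "html": self.html,
            "work_steps": self.work_steps,
            "emotion": self.emotion,
        })
```

In this sketch the terminal would call `validate()` first and transmit the output of `to_json()` only when no problems are reported.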
[0146] (Example 2)
[0147] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0148] Traditional automation systems often suffer from decreased user satisfaction and efficiency because they proceed with processes without considering the emotional state of the user. Furthermore, they frequently adopt a uniform approach, failing to adequately account for variations in automation requirements among different users.
[0149] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0150] In this invention, the server includes emotion analysis means for recognizing the user's emotional state and reflecting that information in processing, means for adjusting the operation flow based on the emotional state, and information recording means for holding all functional reference information of a specific information processing tool. This makes it possible to provide a flexible automated process that takes the user's emotions into consideration.
[0151] An "emotion analysis means" is a technical device that recognizes the emotional state of a user and incorporates that information into its processing.
[0152] A "means for adjusting the operation flow" refers to a technical device for modifying and optimizing the progress of a process based on the user's emotional state.
[0153] An "information recording means" is a technical device that holds and manages all functional reference information for a specific information processing tool, making it accessible as needed.
[0154] A "generation means" is a technical device for creating an automated process based on received video and text data.
[0155] "Output means" refers to a technological device that presents or provides the generated automated process to an external party.
[0156] "Analysis means" refers to a technical device that analyzes screen display elements contained in input video data and extracts information necessary for generating an automated process.
[0157] A "correction means" is a technical device that corrects unsuitable parts of an automated process based on the analysis results and improves it into an appropriate process.
[0158] This invention is a system that generates and executes automated processes while taking user emotions into consideration. The system functions through the cooperation of a server, terminals, and users.
[0159] The user first uses the terminal to input information about the task they want to automate. This input includes screenshots, HTML structure information, and specific operating procedures. Furthermore, the terminal is equipped with emotion analysis software to recognize the user's emotions in real time. This allows for the analysis of the user's facial expressions and voice.
[0160] The terminal functions as an interface for sending information received from the user to the server. The terminal appropriately formats the input data and sends it to the server along with the emotion information obtained from the emotion analysis engine. Standard communication protocols are used for this transmission.
[0161] The server analyzes the received data and generates automated processes using a generative AI model. It recognizes screen interface elements from image data and identifies automation steps from text data. Machine learning algorithms and image recognition technologies are used for these analyses. Furthermore, the generated scenarios are adjusted so that the user's emotional state is reflected in the process selection and progression. For example, if the user is stressed, the system will suggest intuitive and easy-to-understand steps.
[0162] The generated automation scenario is sent back to the user via their device. The user can review the scenario, make any necessary modifications, and then execute it. This enables a comfortable and efficient automation experience for the user.
[0163] For example, to automate a photo editing process, the user inputs the editing steps. A prompt such as "Please suggest steps that will allow the user to relax while automating the photo editing process" could then be used. This prompt is input into the generative AI model, which generates an automated procedure suited to the user's needs.
[0164] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0165] Step 1:
[0166] The user inputs information about the task they want to automate using a terminal. This input data includes screenshots, HTML structure information, and detailed operating instructions. This prepares the basic data for the user's desired output. The terminal also has an emotion analysis engine that acquires the user's emotional state in real time from their facial expressions and voice.
[0167] Step 2:
[0168] The terminal organizes the information received from the user and sends the data to the server. The user's emotion information is sent along with the data. The input data is formatted before transmission so that it arrives in the correct format. This prepares the server for analysis.
[0169] Step 3:
[0170] The server receives data transmitted from the terminal. For image data, image analysis techniques are used to recognize elements of the screen interface. For text data, natural language processing is used to identify the operation procedures required by the automation engine. Through these analyses, the user's input data is prepared as detailed information for the next processing stage.
[0171] Step 4:
[0172] The server generates automated processes using a generative AI model. Based on the analyzed data and the user's emotion information, it generates the most suitable automation scenario. For example, if the user's emotion information is negative, the system will suggest steps to reduce the user's burden. This ensures that the scenario reflects a flow appropriate to the user's state.
[0173] Step 5:
[0174] The server sends the generated automation scenario to the terminal. The terminal then displays this scenario to the user in preparation for the next step.
[0175] Step 6:
[0176] The user reviews the automation scenario displayed on the terminal and makes modifications as needed. The final automation process is confirmed by accepting the user's input and modifications. The user can then execute the confirmed process, thereby achieving the desired results.
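Steps 4 to 6 of the flow above (emotion-aware generation followed by user review) could be sketched as follows. The generative AI model is represented here as a plain function from prompt text to scenario text, which is an assumption for illustration; the prompt wording and the names `build_prompt`, `generate_scenario`, and `apply_user_edits` are likewise hypothetical.

```python
from typing import Callable

# Assumed interface: a generative model maps prompt text to scenario text.
GenerativeModel = Callable[[str], str]


def build_prompt(ui_elements: list[str], steps: list[str], emotion: str) -> str:
    """Compose a prompt from the analyzed data and the emotion information."""
    lines = ["Generate an automation scenario for the following task."]
    lines.append("UI elements: " + ", ".join(ui_elements))
    lines.append("Steps:")
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    if emotion == "negative":
        # Hypothetical hint that asks the model to reduce the user's burden.
        lines.append("The user appears burdened; prefer fewer, simpler steps.")
    return "\n".join(lines)


def generate_scenario(model: GenerativeModel, ui_elements: list[str],
                      steps: list[str], emotion: str) -> list[str]:
    """Step 4: call the model and split its output into scenario lines."""
    raw = model(build_prompt(ui_elements, steps, emotion))
    return [line.strip() for line in raw.splitlines() if line.strip()]


def apply_user_edits(scenario: list[str], edits: dict[int, str]) -> list[str]:
    """Step 6: replace scenario lines by index with the user's modifications."""
    return [edits.get(i, line) for i, line in enumerate(scenario)]
```

Passing the model in as a function keeps the sketch independent of any particular generative AI service and allows a stub to be substituted during testing.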
[0177] (Application Example 2)
[0178] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".
[0179] In the field of online shopping, presenting products without considering user emotions can damage the user experience and reduce their desire to purchase. Furthermore, the inability to flexibly respond to the diverse emotions and needs of users makes it difficult to improve user satisfaction, which can ultimately lead to lost sales opportunities.
[0180] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0181] In this invention, the server includes a device for recognizing the user's mental state and reflecting it in an automated process, a generative model for optimizing product presentation based on the user's emotions and generating specific product information, and an output device for providing the generated product information. This enables optimal product suggestions based on the user's emotions and improves the user experience.
[0182] "User's mental state" refers to the user's emotions and psychological state, and is the information necessary to adjust processes and product presentations based on that state.
[0183] A "device for reflecting information in an automated process" is a device that has the necessary functions to recognize the user's mental state and reflect it in the process.
[0184] A "generative model" is an AI-powered model that optimizes product presentations based on user emotions and generates specific product information.
[0185] An "output device" is a device necessary to present specific product information that has been generated to the customer.
[0186] An "analysis device" is a device that analyzes input data, extracts user emotions, and uses that information to help present products.
[0187] A "control device" is a device that has the function of suggesting alternative products to match the generated product information to the user's purchasing behavior, thereby improving user satisfaction.
[0188] This invention provides a system that recognizes the user's mental state and optimizes product presentation during online shopping.
[0189] First, users access the e-commerce platform through an application installed on their smartphone or smart glasses. The user's device incorporates an emotion recognition engine that analyzes the user's facial expressions and voice in real time to determine their emotional state. This emotion data is then transmitted to a server via the internet.
[0190] Based on the received emotion data, the server uses a generative AI model to present the most suitable products to the user. This model makes product recommendations that correspond to the user's emotions and generates specific product information. In this process, the server processes the data using technologies such as Python and Node.js.
[0191] The generated product information is sent to the user's device, allowing the user to view product details and make a purchase decision as needed. In particular, if the user expresses negative emotions, the server will provide relevant alternative products or supplementary information to improve the user experience.
[0192] For example, if a user expresses frustration while browsing a specific product page, the system automatically provides reviews and FAQs for that product to resolve their questions. If the user expresses positive emotions, the system suggests bundled purchase options for similar products to encourage further purchases.
[0193] An example of a prompt message to achieve this is: "The user expressed emotions (A, C, G) while viewing a product page. Since the emotion is primarily (A), generate a program and description that suggests related products."
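The prompt above can be produced by filling a template with the detected emotions. The following is a minimal sketch in which the emotion scores, the emotion labels, and the function `build_product_prompt` are assumptions for illustration; only the prompt wording is taken from the example above.

```python
PROMPT_TEMPLATE = (
    "The user expressed emotions ({emotions}) while viewing a product page. "
    "Since the emotion is primarily ({primary}), generate a program and "
    "description that suggests related products."
)


def build_product_prompt(scores: dict[str, float]) -> str:
    """Fill the template, treating the highest-scoring emotion as primary."""
    if not scores:
        raise ValueError("at least one emotion score is required")
    # Sort by descending score; ties are broken alphabetically so the
    # result is deterministic.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return PROMPT_TEMPLATE.format(emotions=", ".join(ranked), primary=ranked[0])
```

For instance, scores in which "frustration" is highest would yield a prompt that names frustration as the primary emotion.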
[0194] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0195] Step 1:
[0196] The user launches a shopping application via their smartphone or smart glasses. The user's device has an emotion recognition engine installed, which uses the camera and microphone to collect emotional data from the user's facial expressions and voice. Based on this sensor input, the device analyzes the user's mental state in real time and outputs the emotions in digital format.
[0197] Step 2:
[0198] The device transmits collected emotional data and product information viewed by the user to the server via the internet. Input includes data about the user's mental state and identifying information such as product IDs. The device formats this data appropriately and transmits it in a format easily processed by the server.
[0199] Step 3:
[0200] The server analyzes the received emotion data and product information, and uses a generative AI model to suggest the most suitable products to the user. Based on the input emotion data, the generative AI model executes a recommendation algorithm and generates a list of highly relevant products. Product information is then generated as output to be presented to the user.
[0201] Step 4:
[0202] The server sends the generated product information back to the terminal. In this step, the product information is packaged in a format that is immediately usable by the user and sent quickly.
[0203] Step 5:
[0204] The device displays the received product information on the application screen. Through the screen, the user can view detailed product information and related reviews to make a purchase decision. The application then analyzes the user's response again using an emotion recognition engine and, if necessary, provides further product suggestions or information.
[0205] This series of processes enables a personalized shopping experience based on the user's emotions.
[0206] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0207] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0208] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0209] [Second Embodiment]
[0210] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0211] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0212] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0213] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0214] The microphone 238 receives instructions from the user 20 by capturing the voice of the user 20. The microphone 238 converts the captured voice into audio data and outputs the audio data to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0215] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0216] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0217] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0218] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0219] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0220] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0221] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0222] The system of this invention aims to allow users to easily input information about the processes they wish to automate and generate automation scenarios based on this input. This system functions through the collaboration of a server, a terminal, and the user.
[0223] First, the user uses a terminal to input screenshots of the system to be automated, HTML configuration information, and specific work procedures. This allows the user to provide detailed work information to the system as text and images, thereby consolidating the fundamental information needed for automation.
[0224] Next, the terminal functions as an interface for sending data received from the user to the server. The terminal performs data format conversion and transmission processing to ensure that the input data is reliably transmitted to the server.
[0225] The server performs information analysis on the received data. First, it analyzes the elements of the screen interface contained in the image data to understand the structure of the user interface and identify which parts are the target of user interaction. Furthermore, it identifies specific operation procedures from the text data and determines the action to take based on them.
[0226] Next, the server leverages the full functional reference information of the specific automation tool and applies a generative model to generate automation scenarios. This defines appropriate automation procedures on the system based on user input. The system then verifies whether the generated scenarios are compatible with the functionality of the specific automation tool and automatically makes any necessary corrections.
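The compatibility check and automatic correction described above could be sketched as follows. The functional reference is modeled as a table of supported actions and their required parameters, and correction is limited to replacing known aliases with supported action names. The table contents, the alias list, and the function `verify_and_correct` are hypothetical; the disclosure does not specify how the reference information is represented or how corrections are made.

```python
# Hypothetical functional reference: action name -> required parameter names.
FUNCTION_REFERENCE = {
    "click": {"target"},
    "type_text": {"target", "text"},
    "wait": {"seconds"},
}

# Hypothetical aliases a generative model might emit for supported actions.
ALIASES = {"press": "click", "input": "type_text", "sleep": "wait"}


def verify_and_correct(scenario: list[dict]) -> tuple[list[dict], list[str]]:
    """Return (corrected_steps, unresolved_problems) for a scenario."""
    corrected, problems = [], []
    for index, step in enumerate(scenario):
        # Automatic correction: map a known alias to the supported name.
        action = ALIASES.get(step["action"], step["action"])
        required = FUNCTION_REFERENCE.get(action)
        if required is None:
            problems.append(f"step {index}: unsupported action {step['action']!r}")
            continue
        missing = required - set(step)
        if missing:
            problems.append(f"step {index}: missing {sorted(missing)}")
            continue
        corrected.append({**step, "action": action})
    return corrected, problems
```

Steps that cannot be corrected are reported rather than silently dropped, so that they can be regenerated or shown to the user for review.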
[0227] Finally, the generated automation scenario is presented to the user via a terminal. The user can review it, make modifications as needed, and then execute the final scenario. As a specific example, when automating a process such as order processing in accounting, the user inputs the operating procedures and form input flow of the order system into the system. Based on this information, the system automatically generates a scenario from displaying the form to confirming and completing the order, thereby improving operational efficiency.
[0228] Thus, the present invention aims to support end users in building automated processes, and its implementation enables the easy realization of complex automation without requiring programming skills.
[0229] The following describes the processing flow.
[0230] Step 1:
[0231] The user uses a terminal to input screenshots of the system operations to be automated, HTML configuration information, and specific work procedures. This provides the system with detailed data about the processes to be automated.
[0232] Step 2:
[0233] The terminal converts the input data received from the user into an appropriate format and prepares it for transmission to the server. In particular, the data is formatted to conform to the standards required by the system.
[0234] Step 3:
[0235] A request containing screenshots, HTML information, and data related to work procedures is sent from the terminal to the server. This transmission process ensures that the data is received without any loss or errors.
[0236] Step 4:
[0237] The server analyzes the received data and first uses image data to identify elements displayed on the screen. It then utilizes image recognition technology to identify UI components such as buttons and form fields.
[0238] Step 5:
[0239] The server analyzes the text data and extracts the specific operational steps required for the automation process. This procedural information becomes the foundational data used later to generate automation scenarios.
[0240] Step 6:
[0241] The server generates automation scenarios by applying a generative model while referencing functional reference data for specific automation tools. In this process, detected UI components and operation procedures are reflected in the automation scenarios.
[0242] Step 7:
[0243] The server verifies the suitability of the generated automation scenarios and confirms their compatibility with specific automation tools. If any inconsistencies are found, the scenario is automatically corrected.
[0244] Step 8:
[0245] The server returns the completed automation scenario to the terminal. This output data is used by the user to review and modify the scenario.
[0246] Step 9:
[0247] Users review the automation scenarios provided through their terminals, make adjustments as needed, and then actually execute the automated process. This enables the automation of business processes.
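The extraction of operational steps from the work procedure text in Step 5 above could be sketched as follows. The sketch assumes that the user writes one step per line with an optional leading marker such as "1.", "Step 2:", or "-". This input convention and the function `extract_steps` are assumptions for illustration, and a simple regular expression stands in for whatever text analysis the server actually performs.

```python
import re

# Matches leading markers such as "1.", "2)", "Step 3:", "-" or "*".
_MARKER = re.compile(
    r"^\s*(?:step\s*\d+\s*[:.)]|\d+\s*[.)]|[-*])\s*",
    re.IGNORECASE,
)


def extract_steps(procedure_text: str) -> list[str]:
    """Split procedure text into steps, dropping markers and blank lines."""
    steps = []
    for line in procedure_text.splitlines():
        cleaned = _MARKER.sub("", line).strip()
        if cleaned:
            steps.append(cleaned)
    return steps
```

The resulting list preserves the order in which the user wrote the steps, which is the order the generated scenario would follow.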
[0248] (Example 1)
[0249] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0250] Creating automation scenarios requires a great deal of expertise, making them difficult for the average user to use efficiently. Furthermore, ensuring the proper extraction of user interfaces and operating procedures, and their compatibility with automation tools, is not easy. Therefore, there is a need to generate automation scenarios easily and accurately to improve operational efficiency.
[0251] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0252] In this invention, the system includes a terminal for collecting process information from the user, communication means for transferring data from the terminal to the server, analysis means for analyzing the received data and identifying UI elements and operating procedures, and generation means for creating automation scenarios from the analyzed data using a generative AI model. This makes it possible for users to easily generate automation scenarios without specialized knowledge, improving operational efficiency while maintaining system compatibility.
[0253] A "user" is the entity that provides the system with the process information necessary to generate automation scenarios.
[0254] A "terminal" is a device or platform that receives input data from a user and transmits it to a server via a communication method.
[0255] A "server" is a central device that analyzes data received from terminals and creates and manages automation scenarios using generative AI models.
[0256] "Communication means" refers to protocols and interfaces used to transfer data from a terminal to a server, and possesses the function of sending and receiving data accurately and securely.
[0257] "Analysis means" refers to the processes and algorithms necessary to examine the received data in detail and identify user interface elements and operating procedures.
[0258] "Generation means" refers to the processes and functions used to construct automation scenarios using a generative AI model based on the analysis results.
[0259] A "generative AI model" is a program or system based on artificial intelligence technology that creates automation scenarios based on given prompts and information.
[0260] A "prompt" is a sentence in the form of an instruction or question that is input into a generative AI model, and it contains the information necessary to generate an automation scenario.
[0261] The system of this invention receives from the user the information necessary to automate a specific process, and generates a scenario based on that information. The system works through the cooperation of the user, the terminal, and the server.
[0262] First, the user operates a terminal to supply the system with information related to the process they want to automate. This information mainly consists of screenshots, HTML configuration information, and manual operation instructions. This information is important for identifying the detailed process flow.
[0263] Next, the terminal receives information from the user, converts the data format, and then securely transmits it to the server. The data, now in the appropriate format, is used for analysis on the server.
[0264] The server receives the data sent from the terminal and begins analysis.
[0265] The server uses image analysis algorithms and natural language processing techniques to identify the elements of the screen interface and the operating procedures. Specifically, it uses image processing libraries to extract UI elements from screenshots and a text analysis engine to extract operating procedures from linguistic data.
[0266] Next, the server generates an appropriate automation scenario using a specific generative AI model. This AI model interprets the information provided by the user as prompts and outputs the optimal scenario for the automation tool. Possible prompts include, for example, "Generate a scenario to automate the following process. Please refer to the screenshots and HTML information for instructions and complete the process."
[0267] The generated scenarios are evaluated for suitability and modified on the system if necessary. Finally, they are presented to the user via a terminal, allowing the user to review the content and make changes as needed. This entire process enables users to create automation scenarios tailored to their needs and improve work efficiency without requiring specialized programming knowledge.
[0268] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0269] Step 1:
[0270] The user enters screenshots of the process they want to automate, HTML configuration information, and specific operating instructions. This input data serves as foundational information to communicate the specific automation needs to the system. The screenshots show the UI layout, the HTML information identifies the page structure, and the operating instructions text represents the overall flow of the process.
[0271] Step 2:
[0272] The terminal processes various types of data collected from the user. Specifically, image data is converted to the appropriate resolution and format, and text data is encoded and converted into a format compatible with the server. The converted data is then sent to the server via the terminal's communication functions. The system is designed to maintain data integrity and security during this process.
[0273] Step 3:
[0274] The server analyzes screenshots and HTML information received from the terminal using an image analysis engine and an HTML analysis engine. Here, UI elements are extracted through image analysis, and the logical structure of the page is identified through HTML analysis. The output is a list indicating which elements are targeted by user interaction.
[0275] Step 4:
[0276] Next, the server extracts the operation steps from the text data and uses a natural language processing engine to identify each step. The extracted steps are then compared with a list of UI elements to determine the corresponding operation for each action. This is then converted into a prompt and used as input for the generative AI model.
[0277] Step 5:
[0278] The server inputs the constructed prompt statements into the AI model to generate an automation scenario. The AI model outputs the optimal scenario based on the input information. At this time, the suitability of the generated scenario is evaluated, taking into account all the functional reference information of the specific automation tool. If a mismatch is found, the scenario is modified and evaluated again.
[0279] Step 6:
[0280] The generated automation scenario is presented to the user via the terminal. The terminal provides a visual interface that shows the flow of the scenario, allowing the user to review its contents. The user accepts the scenario or requests modifications as needed, and the optimal automation scenario ultimately becomes executable.
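The HTML analysis of Step 3 and the comparison of extracted steps with the list of UI elements in Step 4 above could be sketched as follows. The sketch uses the `html.parser` module of the Python standard library; image analysis of screenshots is outside its scope. The set of tags treated as interactive, the substring matching rule, and the names `UIElementParser` and `match_steps` are assumptions for illustration.

```python
from html.parser import HTMLParser

# Assumed set of tags that can be targets of user interaction.
INTERACTIVE_TAGS = {"button", "input", "select", "textarea", "a"}


class UIElementParser(HTMLParser):
    """Collect interactive elements and a human-readable label for each."""

    def __init__(self):
        super().__init__()
        self.elements = []   # list of {"tag", "id", "label"} dictionaries
        self._open = None    # element whose text content is being read

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        a = dict(attrs)
        label = a.get("value") or a.get("placeholder") or a.get("name") or ""
        element = {"tag": tag, "id": a.get("id"), "label": label}
        self.elements.append(element)
        if tag in {"button", "a"}:
            self._open = element  # label comes from the enclosed text

    def handle_data(self, data):
        if self._open is not None and data.strip():
            self._open["label"] = data.strip()

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open["tag"]:
            self._open = None


def match_steps(steps, elements):
    """Pair each step with the id of the first element whose label it mentions."""
    pairs = []
    for step in steps:
        hit = next((e for e in elements
                    if e["label"] and e["label"].lower() in step.lower()), None)
        pairs.append((step, hit["id"] if hit else None))
    return pairs
```

A step with no matching element is paired with `None`, which marks it as needing clarification before it is converted into a prompt.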
[0281] (Application Example 1)
[0282] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0283] In data management and operations, there is a need for operational support in specific environments and for the automation of anomaly detection and response actions. However, existing systems make it difficult to efficiently manage and rapidly automate these processes. Furthermore, operators often require programming skills, and without specialized knowledge, achieving complex operations is challenging.
[0284] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0285] In this invention, the server includes: a device that holds all-function reference information of a specific information processing apparatus; an input means for receiving images and text data as inputs; a generation means for generating an automation scenario based on the received data; an output means for outputting the generated scenario; a support means for providing operation support in a specific environment; and a management means for detecting anomalies in the operation procedure and automating corresponding actions. Thereby, complex data management and automation of operation processes become possible, enabling efficient operation without the need for operators to have programming skills.
[0286] An "information processing apparatus" is a device that performs calculations and analyses using various data and executes a program for accomplishing a specific function.
[0287] An "input means" is a device or method for acquiring information from a user or an external device and processing it internally.
[0288] A "generation means" is a method or device for creating new data or scenarios based on the input information.
[0289] An "output means" is a method or device for sending out data generated within the system to the outside.
[0290] A "support means" is a method or device used to support a specific task or process and perform it efficiently.
[0291] A "management means" is a method or device for monitoring the operation status, detecting anomalies and problems, and taking appropriate actions.
[0292] A system for implementing this invention includes an information processing device, an input device, a generation means, an output means, a support means, and a management means. The program of this system operates to effectively automate and support the operation of a data center.
[0293] The server receives image and text data provided by the user via an input device through a terminal. The input device converts this data into an analyzable format and transfers it to the server. The server uses an information processing device to utilize image and text analysis engines, and generates operational scenarios from various sensor information and system logs using a generation device.
[0294] Furthermore, the generated operational scenarios are presented to the user through an output device. The user can review the presented scenarios and make adjustments as needed. In this process, a generative AI model is applied and prompt statements are used to generate efficient scenarios. This generative AI model is particularly used for anomaly detection and optimization of intranet system compatibility.
[0295] As a concrete example, consider monitoring a cooling system within a data center and generating automated response scenarios. When a user inputs screenshots of the cooling system's dashboard and operation logs into the system, the server analyzes this data, detects anomalies, and automatically generates specific actions for the response process. This enables immediate system response and efficient operational management.
[0296] An example of a prompt message is: "Generate a scenario to detect an anomaly in the data center's cooling system and automatically respond to it. Based on the screenshots and instructions, define the corrective actions and make them executable."
[0297] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0298] Step 1:
[0299] The user inputs a screenshot, taken with a smartphone, showing the status of the cooling system in the data center, together with related text data (system logs, procedure manuals). The terminal converts the input data into a suitable format and prepares it for transmission to the server.
[0300] Step 2:
[0301] The terminal sends the format-converted data to the server. The server receives this and uses an image data analysis engine to extract important elements contained in the screenshot. For example, the set temperature and operating status of the cooling system are identified.
[0302] Step 3:
[0303] The text data received by the server is analyzed by a text analysis engine. As a result of the analysis, signs of abnormalities and the operation procedures to be carried out are identified from the system logs. Based on this information, the generative AI model creates appropriate countermeasure scenarios.
[0304] Step 4:
[0305] The server uses the generative AI model to generate a countermeasure scenario based on a specific prompt sentence. The generated scenario includes the specific actions necessary to respond immediately to the abnormality of the cooling system. In this generation process, machine learning techniques are utilized to define an optimal response strategy.
[0306] Step 5:
[0307] The server outputs the generated scenario and presents it to the user through the terminal. If the user confirms the scenario and determines it to be feasible, the content is applied to the control system of the data center. By this operation, the abnormality of the cooling system is efficiently managed.
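The flow of Steps 2 to 4 can be sketched as follows. The sketch assumes that the image analysis has already reduced the dashboard screenshot to a set temperature, a measured temperature, and an operating flag, and that log lines are flagged by the keywords `ERROR` and `WARN`. The class `CoolingStatus`, the 3.0 degree tolerance, and the log format are hypothetical; only the prompt wording comes from the example prompt in the description.

```python
from dataclasses import dataclass


@dataclass
class CoolingStatus:
    """Values assumed to have been extracted from the dashboard screenshot."""
    set_temp_c: float
    measured_temp_c: float
    operating: bool


def detect_anomalies(status, log_lines, tolerance_c=3.0):
    """Returns readable anomaly signs; the rules here are illustrative only."""
    signs = []
    if not status.operating:
        signs.append("cooling unit is not operating")
    if status.measured_temp_c - status.set_temp_c > tolerance_c:
        signs.append(
            f"measured temperature {status.measured_temp_c:.1f} C exceeds "
            f"set temperature {status.set_temp_c:.1f} C by more than {tolerance_c:.1f} C"
        )
    for line in log_lines:
        if "ERROR" in line or "WARN" in line:
            signs.append(f"log entry: {line.strip()}")
    return signs


def build_countermeasure_prompt(signs):
    """Appends the detected signs to the prompt wording given in the description."""
    header = (
        "Generate a scenario to detect an anomaly in the data center's cooling "
        "system and automatically respond to it. Based on the screenshots and "
        "instructions, define the corrective actions and make them executable."
    )
    if not signs:
        return header + "\nNo anomaly signs were detected."
    return header + "\nDetected anomaly signs:\n" + "\n".join(f"- {s}" for s in signs)


status = CoolingStatus(set_temp_c=22.0, measured_temp_c=27.5, operating=True)
logs = ["10:02 INFO fan speed nominal", "10:05 WARN coolant pressure low"]
signs = detect_anomalies(status, logs)
prompt = build_countermeasure_prompt(signs)
print(prompt)
```

The resulting prompt would then be passed to the generative AI model; that call is omitted because the disclosure does not specify a particular model interface.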
[0308] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0309] This invention provides an automated system that recognizes user emotions and reflects that information in the automated process. The system functions through the collaborative efforts of a server, a terminal, and the user.
[0310] First, the user inputs information about the task they want to automate using the device. This includes data such as screenshots, HTML configuration information, and specific operating procedures. The device also incorporates an emotion engine to capture the user's emotional state, recognizing their emotions in real time.
[0311] The terminal functions as an interface for sending information received from the user to the server. Here, the terminal transmits the input data to the server in a properly formatted form, including emotional information obtained from the emotion engine.
[0312] The server analyzes the transmitted data and recognizes elements of the screen interface from the image data. Furthermore, it analyzes the text data to identify the steps of the automation process. During the analysis, the server considers the user's emotional information provided by the emotion engine and generates automation scenarios using a generative model. The system is designed so that the user's emotions influence the selection and flow of the scenarios.
[0313] For example, if a user exhibits an unpleasant emotion during a task, the system can select a specific scenario flow to mitigate that emotion. Conversely, if a positive emotion is detected, the system can adjust the process to select steps that allow for more efficient progress.
[0314] Finally, the server sends the generated automation scenario back to the terminal. The user can review this scenario, make any necessary modifications, and then execute it. This ensures that the automation process unfolds in a way that takes the user's feelings into consideration, resulting in a more comfortable user experience.
[0315] The following describes the processing flow.
[0316] Step 1:
[0317] The user uses the device to input screenshots, HTML configuration information, and specific work steps for the tasks they wish to automate. Additionally, the device's built-in emotion engine performs facial recognition and voice analysis to detect emotions in real time.
[0318] Step 2:
[0319] The terminal consolidates all user input into a single data package. This package includes screenshot data, HTML information, work procedure text, and emotion information obtained from the emotion engine.
[0320] Step 3:
[0321] The terminal prepares to send the created data package to the server. It verifies that the data format is correct and sends the request to the server according to the transmission protocol.
[0322] Step 4:
[0323] The server analyzes the data received from the terminal and uses image recognition technology to extract screen interface elements from the screenshot. In this process, buttons, input fields, and menus are identified.
[0324] Step 5:
[0325] The server performs text analysis to identify the specific operations required for automation from the provided work procedures. It also understands the user's intent based on the analyzed data.
[0326] Step 6:
[0327] The server aggregates input from the emotion engine to identify the user's emotional state. This takes into account the user's stress level, concentration level, and emotional tone.
[0328] Step 7:
[0329] The server generates automation scenarios by applying a generative model using the functional reference of a specific automation tool and the user's emotion information. During this process, it adjusts specific scenario flows and actions based on the user's emotions.
[0330] Step 8:
[0331] The server verifies the integrity of the generated automation scenarios and automatically makes corrections if necessary. In particular, it ensures that the corrected scenarios remain appropriate to the user's emotions.
[0332] Step 9:
[0333] The server sends the generated automation scenario back to the terminal. The user reviews this scenario on the terminal, evaluates and modifies the actions, and then executes them. This enables an automation process that reflects the user's emotional state.
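Steps 2 and 3 of this flow (consolidating the input into one data package and checking its format before transmission) can be sketched as follows. The sketch assumes a JSON payload with the screenshot encoded as Base64. The field names, the `EmotionInfo` scales, and the functions `build_package` and `validate_package` are hypothetical; the disclosure does not specify a package format or transmission protocol.

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass
class EmotionInfo:
    """Emotion engine output; the scales are assumptions for this sketch."""
    stress: float         # 0.0 (calm) to 1.0 (highly stressed)
    concentration: float  # 0.0 to 1.0
    tone: str             # e.g. "positive", "neutral", "negative"


REQUIRED_KEYS = {"screenshot_b64", "html", "procedure", "emotion"}


def build_package(screenshot: bytes, html: str, procedure: str, emotion: EmotionInfo) -> str:
    """Consolidates all user input into a single JSON data package."""
    package = {
        "screenshot_b64": base64.b64encode(screenshot).decode("ascii"),
        "html": html,
        "procedure": procedure,
        "emotion": asdict(emotion),
    }
    return json.dumps(package)


def validate_package(payload: str) -> dict:
    """Checks that the package is well-formed before it is sent or analyzed."""
    package = json.loads(payload)
    missing = REQUIRED_KEYS - set(package)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Raises an error if the screenshot field is not valid Base64.
    base64.b64decode(package["screenshot_b64"], validate=True)
    return package


payload = build_package(
    b"\x89PNG",
    "<button id='ok'>OK</button>",
    "Click OK",
    EmotionInfo(stress=0.7, concentration=0.4, tone="negative"),
)
package = validate_package(payload)
print(package["emotion"])
```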
[0334] (Example 2)
[0335] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0336] Traditional automation systems often suffer from decreased user satisfaction and efficiency because they proceed with processes without considering the emotional state of the user. Furthermore, they frequently adopt a uniform approach, failing to adequately account for variations in automation requirements among different users.
[0337] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0338] In this invention, the server includes emotion analysis means for recognizing the user's emotional state and reflecting that information in processing, means for adjusting the operation flow based on the emotional state, and information recording means for holding all functional reference information of a specific information processing tool. This makes it possible to provide a flexible automated process that takes the user's emotions into consideration.
[0339] An "emotion analysis means" is a technical device that recognizes the emotional state of a user and incorporates that information into its processing.
[0340] A "means for adjusting the operation flow" refers to a technical device for modifying and optimizing the progress of a process based on the user's emotional state.
[0341] An "information recording means" is a technical device that holds and manages all functional reference information for a specific information processing tool, making it accessible as needed.
[0342] A "generation means" is a technical device for creating an automated process based on received video and text data.
[0343] "Output means" refers to a technological device that presents or provides the generated automated process to an external party.
[0344] "Analysis means" refers to a technical device that analyzes screen display elements contained in input video data and extracts information necessary for generating an automated process.
[0345] A "correction tool" is a technical device that corrects unsuitable parts of an automated process based on the analysis results and improves it into an appropriate process.
[0346] This invention is a system that generates and executes automated processes while taking user emotions into consideration. The system functions through the cooperation of a server, terminals, and users.
[0347] The user first uses the terminal to input information about the task they want to automate. This input includes screenshots, HTML structure information, and specific operating procedures. Furthermore, the terminal is equipped with emotion analysis software to recognize the user's emotions in real time. This allows for the analysis of the user's facial expressions and voice.
[0348] The terminal functions as an interface for sending information received from the user to the server. The terminal appropriately formats the input data and sends it to the server along with the emotion information obtained from the emotion analysis software. Standard communication protocols are used for this transmission.
[0349] The server analyzes the received data and generates automated processes using a generative AI model. It recognizes screen interface elements from image data and identifies automation steps from text data. Machine learning algorithms and image recognition technologies are used for these analyses. Furthermore, the generated scenarios are adjusted so that the user's emotional state is reflected in the process selection and progression. For example, if the user is stressed, the system will suggest intuitive and easy-to-understand steps.
[0350] The generated automation scenario is sent back to the user via their device. The user can review the scenario, make any necessary modifications, and then execute it. This enables a comfortable and efficient automation experience for the user.
[0351] For example, to automate a photo editing process, the user inputs the editing steps. A prompt such as "Please suggest steps that will allow the user to relax while automating the photo editing process" could then be used. This prompt is input into a generative AI model, which generates an automated procedure that suits the user's needs.
[0352] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0353] Step 1:
[0354] The user inputs information about the task they want to automate using a terminal. This input data includes screenshots, HTML structure information, and detailed operating instructions. This prepares the basic data for the user's desired output. The terminal also has an emotion analysis engine that acquires the user's emotional state in real time from their facial expressions and voice.
[0355] Step 2:
[0356] The terminal organizes the information received from the user and sends the data to the server. The user's emotion information is sent along with the data. The data is formatted beforehand so that the input arrives in the correct format. This prepares the server for analysis.
[0357] Step 3:
[0358] The server receives data transmitted from the terminal. For image data, image analysis techniques are used to recognize elements of the screen interface. For text data, natural language processing is used to identify the operation procedures required by the automation engine. Through these analyses, the user's input data is prepared as detailed information for the next processing stage.
[0359] Step 4:
[0360] The server generates automated processes using a generative AI model. Based on the analyzed data and the user's emotion information, it generates the most suitable automation scenario. For example, if the user's emotion information is negative, the system will suggest steps to reduce the user's burden. This ensures that the scenario reflects a flow appropriate to the user's state.
[0361] Step 5:
[0362] The server sends the generated automation scenario to the terminal. The terminal is then ready to proceed to the next step by displaying this scenario to the user.
[0363] Step 6:
[0364] The user reviews the automation scenario displayed on the terminal and makes modifications as needed. The final automation process is confirmed by accepting the user's input and modifications. The user can then execute the confirmed process, thereby achieving the desired results.
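The adjustment described in Step 4 can be sketched as a simple rule applied to a candidate scenario. The sketch assumes that each step carries an `optional` flag and that a negative tone or a stress value of 0.6 or more means the flow should be shortened. The `Step` class, the threshold, and the photo editing steps are hypothetical; in the disclosure the adjustment is performed by the generative AI model, and this rule only stands in for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str
    optional: bool = False


def adjust_for_emotion(steps, tone, stress, stress_limit=0.6):
    """Returns a flow adapted to the user's state (illustrative rule only)."""
    if tone == "negative" or stress >= stress_limit:
        # Reduce the user's burden: keep only the essential steps.
        return [s for s in steps if not s.optional]
    return list(steps)


photo_editing = [
    Step("Open the photo"),
    Step("Crop to 4:3"),
    Step("Apply a color filter", optional=True),
    Step("Export as JPEG"),
]

relaxed_flow = adjust_for_emotion(photo_editing, tone="negative", stress=0.2)
full_flow = adjust_for_emotion(photo_editing, tone="positive", stress=0.1)
print([s.action for s in relaxed_flow])
```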
[0365] (Application Example 2)
[0366] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0367] In the field of online shopping, presenting products without considering user emotions can damage the user experience and reduce their desire to purchase. Furthermore, the inability to flexibly respond to the diverse emotions and needs of users makes it difficult to improve user satisfaction, which can ultimately lead to lost sales opportunities.
[0368] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0369] In this invention, the server includes a device for recognizing the user's mental state and reflecting it in an automated process, a generative model for optimizing product presentation based on the user's emotions, and an output device for providing the specific product information that is generated. This enables optimal product suggestions based on the user's emotions and improves the user experience.
[0370] "User's mental state" refers to the user's emotions and psychological state, and is the information necessary to adjust processes and product presentations based on that state.
[0371] A "device for reflecting information in an automated process" is a device that has the necessary functions to recognize the user's mental state and reflect it in the process.
[0372] A "generative model" is an AI-powered model that optimizes product presentations based on user emotions and generates specific product information.
[0373] An "output device" is a device necessary to present specific product information that has been generated to the customer.
[0374] An "analysis device" is a device that analyzes input data, extracts user emotions, and uses that information to help present products.
[0375] A "control device" is a device that has the function of suggesting alternative products to match the generated product information to the user's purchasing behavior, thereby improving user satisfaction.
[0376] This invention provides a system that recognizes the user's mental state and optimizes product presentation during online shopping.
[0377] First, users access the e-commerce platform through an application installed on their smartphone or smart glasses. The user's device incorporates an emotion recognition engine that analyzes the user's facial expressions and voice in real time to determine their emotional state. This emotion data is then transmitted to a server via the internet.
[0378] The server uses a generative AI model based on the received emotional data to present the most suitable products to the user. This model performs product recommendations that correspond to the user's emotions and generates specific product information. In this process, the server processes the data using technologies such as Python and Node.js.
[0379] The generated product information is sent to the user's device, allowing the user to view product details and make a purchase decision as needed. In particular, if the user expresses negative emotions, the server will provide relevant alternative products or supplementary information to improve the user experience.
[0380] For example, if a user expresses frustration while browsing a specific product page, the system automatically provides reviews and FAQs for that product to resolve their questions. If the user expresses positive emotions, the system suggests bundled purchase options for similar products to encourage further purchases.
[0381] An example of a prompt message to achieve this is: "The user expressed emotions (A, C, G) while viewing a product page. Since the emotion is primarily (A), generate a program and description that suggests related products."
[0382] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0383] Step 1:
[0384] The user launches a shopping application via their smartphone or smart glasses. The user's device has an emotion recognition engine installed, which uses the camera and microphone to collect emotional data from the user's facial expressions and voice. Based on this sensor input, the device analyzes the user's mental state in real time and outputs the emotions in digital format.
[0385] Step 2:
[0386] The device transmits collected emotional data and product information viewed by the user to the server via the internet. Input includes data about the user's mental state and identifying information such as product IDs. The device formats this data appropriately and transmits it in a format easily processed by the server.
[0387] Step 3:
[0388] The server analyzes the received emotion data and product information, and uses a generative AI model to suggest the most suitable products to the user. Based on the emotion data input, the generative AI model executes a recommendation algorithm and generates a list of highly relevant products. Product information is then generated as output to be presented to the user.
[0389] Step 4:
[0390] The server sends the generated product information back to the terminal. In this step, the product information is packaged in a format that is immediately usable by the user and sent quickly.
[0391] Step 5:
[0392] The device displays the received product information on the application screen. Through the screen, the user can view detailed product information and related reviews to make a purchase decision. The application then analyzes the user's response again using an emotion recognition engine and, if necessary, provides further product suggestions or information.
[0393] This series of processes enables a personalized shopping experience based on the user's emotions.
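The server-side handling in Step 3 can be sketched as follows. The sketch assumes that the emotion recognition engine reports a score per emotion label and that the label with the highest score is treated as the primary emotion. The label set, the score values, the product ID, and the function names are hypothetical; the follow-up actions and the prompt wording follow the example in the description.

```python
# Labels treated as negative emotions (an assumption for this sketch).
NEGATIVE_EMOTIONS = {"frustration", "confusion", "anxiety"}


def primary_emotion(scores):
    """Returns the emotion label with the highest score."""
    if not scores:
        raise ValueError("no emotion scores supplied")
    return max(scores, key=scores.get)


def choose_followup(emotion):
    """Maps the primary emotion to the kind of response described in the text."""
    if emotion in NEGATIVE_EMOTIONS:
        return "show reviews and FAQs for the viewed product"
    return "suggest bundled purchase options for similar products"


def build_prompt(scores, product_id):
    """Fills the example prompt with the observed and primary emotions."""
    main = primary_emotion(scores)
    observed = ", ".join(sorted(scores))
    return (
        f"The user expressed emotions ({observed}) while viewing product {product_id}. "
        f"Since the emotion is primarily ({main}), generate a program and description "
        f"that suggests related products."
    )


scores = {"frustration": 0.7, "interest": 0.2, "joy": 0.1}
main = primary_emotion(scores)
prompt = build_prompt(scores, product_id="P-1001")
print(choose_followup(main))
print(prompt)
```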
[0394] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0395] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0396] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0397] [Third Embodiment]
[0398] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0399] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0400] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0401] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0402] The microphone 238 receives instructions from the user 20 by capturing the voice uttered by the user 20. The microphone 238 converts the captured voice into audio data and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0403] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0404] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0405] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0406] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0407] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0408] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0409] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0410] The system of this invention aims to allow users to easily input information about the processes they wish to automate and generate automation scenarios based on this input. This system functions through the collaboration of a server, a terminal, and the user.
[0411] First, the user uses a terminal to input screenshots of the system to be automated, HTML configuration information, and specific work procedures. This allows the user to provide detailed work information to the system as text and images, thereby consolidating the fundamental information needed for automation.
[0412] Next, the terminal functions as an interface for sending data received from the user to the server. The terminal performs data format conversion and transmission processing to ensure that the input data is reliably transmitted to the server.
[0413] The server performs information analysis on the received data. First, it analyzes the elements of the screen interface contained in the image data to understand the structure of the user interface and identify which parts are the target of user interaction. Furthermore, it identifies specific operation procedures from the text data and determines the action to take based on them.
[0414] Next, the server leverages the full functional reference information of the specific automation tool and applies a generative model to generate automation scenarios. This defines appropriate automation procedures on the system based on user input. The system then verifies whether the generated scenarios are compatible with the functionality of the specific automation tool and automatically makes any necessary corrections.
[0415] Finally, the generated automation scenario is presented to the user via a terminal. The user can review it, make modifications as needed, and then execute the final scenario. As a specific example, when automating a process such as order processing in accounting, the user inputs the operating procedures and form input flow of the order system into the system. Based on this information, the system automatically generates a scenario from displaying the form to confirming and completing the order, thereby improving operational efficiency.
[0416] Thus, the present invention aims to support end users in building automated processes, and its implementation enables the easy realization of complex automation without requiring programming skills.
[0417] The following describes the processing flow.
[0418] Step 1:
[0419] The user uses a terminal to input screenshots of the system operations to be automated, HTML configuration information, and specific work procedures. This provides the system with detailed data about the processes to be automated.
[0420] Step 2:
[0421] The terminal converts the input data received from the user into an appropriate format and prepares it for transmission to the server. In particular, the data is formatted to conform to the standards required by the system.
[0422] Step 3:
[0423] A request containing screenshots, HTML information, and data related to work procedures is sent from the terminal to the server. This transmission process ensures that the data is received without any loss or errors.
[0424] Step 4:
[0425] The server analyzes the received data and first uses image data to identify elements displayed on the screen. It then utilizes image recognition technology to identify UI components such as buttons and form fields.
[0426] Step 5:
[0427] The server analyzes the text data and extracts the specific operational steps required for the automation process. This procedural information becomes the foundational data used later to generate automation scenarios.
[0428] Step 6:
[0429] The server generates automation scenarios by applying a generative model while referencing functional reference data for specific automation tools. In this process, detected UI components and operation procedures are reflected in the automation scenarios.
[0430] Step 7:
[0431] The server verifies the suitability of the generated automation scenarios and confirms their compatibility with specific automation tools. If any inconsistencies are found, the scenario is automatically corrected.
[0432] Step 8:
[0433] The server returns the completed automation scenario to the terminal. This output data is used by the user to review and modify the scenario.
[0434] Step 9:
[0435] Users review the automation scenarios provided through their terminals, make adjustments as needed, and then actually execute the automated process. This enables the automation of business processes.
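As a non-limiting illustration, the verification and correction of Steps 6 and 7 can be pictured as a check-and-correct loop. The sketch below is not the disclosed implementation: the scenario format (a list of dictionaries with an "action" key), the SUPPORTED_ACTIONS table standing in for the functional reference data of an automation tool, and the ALIASES correction table are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical functional reference: actions the target tool supports,
# each with the parameter names it requires.
SUPPORTED_ACTIONS: Dict[str, List[str]] = {
    "click": ["target"],
    "type_text": ["target", "text"],
    "wait": ["seconds"],
}

# Hypothetical corrections from unsupported action names to supported ones.
ALIASES: Dict[str, str] = {"press": "click", "input": "type_text"}


def find_issues(scenario: List[dict]) -> List[str]:
    """Return readable issues for steps the tool could not run."""
    issues = []
    for i, step in enumerate(scenario):
        action = step.get("action")
        if action not in SUPPORTED_ACTIONS:
            issues.append(f"step {i}: unsupported action {action!r}")
            continue
        for param in SUPPORTED_ACTIONS[action]:
            if param not in step:
                issues.append(f"step {i}: missing parameter {param!r}")
    return issues


def correct(scenario: List[dict]) -> List[dict]:
    """Rewrite aliased action names; leave everything else unchanged."""
    fixed = []
    for step in scenario:
        action = step.get("action")
        fixed.append({**step, "action": ALIASES.get(action, action)})
    return fixed


def verify_and_correct(scenario: List[dict], max_rounds: int = 3) -> Tuple[List[dict], List[str]]:
    """Check, correct, and re-check until clean or the round limit is reached."""
    for _ in range(max_rounds):
        if not find_issues(scenario):
            return scenario, []
        scenario = correct(scenario)
    return scenario, find_issues(scenario)


draft = [
    {"action": "press", "target": "#order-form-submit"},
    {"action": "type_text", "target": "#quantity", "text": "3"},
]
fixed, remaining = verify_and_correct(draft)
```

Any issues that the loop cannot resolve are returned rather than hidden, so they can be shown to the user during the review in Step 9.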
[0436] (Example 1)
[0437] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0438] Creating automation scenarios requires a great deal of expertise, making them difficult for the average user to use efficiently. Furthermore, ensuring the proper extraction of user interfaces and operating procedures, and their compatibility with automation tools, is not easy. Therefore, there is a need to generate automation scenarios easily and accurately to improve operational efficiency.
[0439] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0440] In this invention, the system includes a terminal for collecting process information from the user, communication means for transferring data from the terminal to the server, analysis means for analyzing the received data and identifying UI elements and operating procedures, and generation means for creating automation scenarios from the analysis results using a generative AI model. This makes it possible for users to easily generate automation scenarios without specialized knowledge, improving operational efficiency while maintaining system compatibility.
[0441] A "user" is the entity that provides the system with the process information necessary to generate automation scenarios.
[0442] A "terminal" is a device or platform that receives input data from a user and transmits it to a server via a communication method.
[0443] A "server" is a central device that analyzes data received from terminals and creates and manages automation scenarios using generative AI models.
[0444] "Communication means" refers to protocols and interfaces used to transfer data from a terminal to a server, and possesses the function of sending and receiving data accurately and securely.
[0445] "Analysis means" refers to the processes and algorithms necessary to examine the received data in detail and identify user interface elements and operating procedures.
[0446] "Generation means" refers to the processes and functions used to construct automation scenarios using a generative AI model based on the analysis results.
[0447] A "generative AI model" is a program or system based on artificial intelligence technology that creates automation scenarios based on given prompts and information.
[0448] A "prompt" is a sentence in the form of an instruction or question that is input into a generative AI model, and it contains the information necessary to generate an automation scenario.
[0449] The system of this invention receives from the user the information necessary to automate a specific process, and generates a scenario based on that information. The system operates through cooperation among the user, the terminal, and the server.
[0450] First, the user operates a terminal to supply the system with information related to the process they want to automate. This information mainly consists of screenshots, HTML configuration information, and manual operation instructions. This information is important for identifying the detailed process flow.
[0451] Next, the terminal receives information from the user, converts the data format, and then securely transmits it to the server. The data, now in the appropriate format, is used for analysis on the server.
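As a non-limiting illustration of the format conversion on the terminal, the sketch below packs a screenshot, HTML, and procedure text into one JSON message with a SHA-256 checksum that the server can verify on receipt. The field names and the envelope layout are assumptions made for this example. The checksum only detects alteration; confidentiality in transit is assumed to be handled separately by the transport (for example, TLS).

```python
import base64
import hashlib
import json


def build_request(screenshot_png: bytes, html: str, procedure: str) -> str:
    """Terminal side: pack the three inputs into one JSON message."""
    payload = {
        "screenshot_b64": base64.b64encode(screenshot_png).decode("ascii"),
        "html": html,
        "procedure": procedure,
    }
    body = json.dumps(payload, ensure_ascii=False, sort_keys=True)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return json.dumps({"body": body, "sha256": digest})


def parse_request(message: str) -> dict:
    """Server side: verify the checksum, then decode the payload."""
    envelope = json.loads(message)
    body = envelope["body"]
    if hashlib.sha256(body.encode("utf-8")).hexdigest() != envelope["sha256"]:
        raise ValueError("payload was altered in transit")
    payload = json.loads(body)
    # Restore the raw image bytes from their base64 text form.
    payload["screenshot"] = base64.b64decode(payload.pop("screenshot_b64"))
    return payload


message = build_request(b"\x89PNG", "<button id='ok'>OK</button>", "Click OK")
received = parse_request(message)
```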
[0452] The server receives the data sent from the terminal and begins analysis.
[0453] The server uses image analysis algorithms and natural language processing techniques to identify the elements of the screen interface and the operating procedures. Specifically, it uses image processing libraries to extract UI elements from screenshots and a text analysis engine to extract operating procedures from linguistic data.
[0454] Next, the server generates an appropriate automation scenario using a specific generative AI model. This AI model interprets the information provided by the user as prompts and outputs the optimal scenario for the automation tool. Possible prompts include, for example, "Generate a scenario to automate the following process. Please refer to the screenshots and HTML information for instructions and complete the process."
[0455] The generated scenarios are evaluated for suitability and modified on the system if necessary. Finally, they are presented to the user via a terminal, allowing the user to review the content and make changes as needed. This entire process enables users to create automation scenarios tailored to their needs and improve work efficiency without requiring specialized programming knowledge.
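As a non-limiting illustration, a prompt of the kind quoted above could be assembled from the analysis results as follows. The function name build_prompt, its parameters, and the layout of the prompt text are assumptions made for this sketch; the call to the generative AI model itself is deliberately omitted.

```python
from typing import List


def build_prompt(ui_elements: List[str], steps: List[str], tool_name: str) -> str:
    """Combine analysis results into a single instruction text."""
    lines = [
        "Generate a scenario to automate the following process.",
        f"Target automation tool: {tool_name}",
        "",
        "UI elements identified from the screenshots and HTML:",
    ]
    lines += [f"- {element}" for element in ui_elements]
    lines += ["", "Operating procedure:"]
    # Number the steps so the model can refer to them unambiguously.
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines)


prompt = build_prompt(
    ui_elements=["input#quantity", "button#submit"],
    steps=["Enter the order quantity", "Click Submit"],
    tool_name="ExampleTool",  # hypothetical tool name
)
```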
[0456] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0457] Step 1:
[0458] The user enters screenshots of the process they want to automate, HTML configuration information, and specific operating instructions. This input data serves as foundational information to communicate the specific automation needs to the system. The screenshots show the UI layout, the HTML information identifies the page structure, and the operating instructions text represents the overall flow of the process.
[0459] Step 2:
[0460] The terminal processes various types of data collected from the user. Specifically, image data is converted to the appropriate resolution and format, and text data is encoded and converted into a format compatible with the server. The converted data is then sent to the server via the terminal's communication functions. The system is designed to maintain data integrity and security during this process.
[0461] Step 3:
[0462] The server analyzes screenshots and HTML information received from the terminal using an image analysis engine and an HTML analysis engine. Here, UI elements are extracted through image analysis, and the logical structure of the page is identified through HTML analysis. The output is a list indicating which elements are targeted by user interaction.
[0463] Step 4:
[0464] Next, the server extracts the operation steps from the text data and uses a natural language processing engine to identify each step. The extracted steps are then compared with a list of UI elements to determine the corresponding operation for each action. This is then converted into a prompt and used as input for the generative AI model.
[0465] Step 5:
[0466] The server inputs the constructed prompt statements into the AI model to generate an automation scenario. The AI model outputs the optimal scenario based on the input information. At this time, the suitability of the generated scenario is evaluated, taking into account all the functional reference information of the specific automation tool. If a mismatch is found, the scenario is modified and evaluated again.
[0467] Step 6:
[0468] The generated automation scenario is presented to the user via the terminal. The terminal provides a visual interface that shows the flow of the scenario, allowing the user to review its contents. The user accepts the scenario or requests modifications as needed, and the optimal automation scenario ultimately becomes executable.
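As a non-limiting illustration of Steps 3 and 4, the sketch below lists interactive elements found in HTML and pairs each operating step with an element it mentions. It uses Python's standard html.parser module; the set of tags treated as interactive and the substring-based matching rule are simplifying assumptions, standing in for the HTML analysis engine and natural language processing engine described above.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class UIElementCollector(HTMLParser):
    """Collect tags that a user can interact with (assumed tag set)."""

    INTERACTIVE = {"button", "input", "select", "textarea", "a"}

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INTERACTIVE:
            attr = dict(attrs)
            self.elements.append(
                {"tag": tag, "id": attr.get("id"), "name": attr.get("name")}
            )


def extract_elements(html: str) -> List[dict]:
    collector = UIElementCollector()
    collector.feed(html)
    collector.close()
    return collector.elements


def match_steps(steps: List[str], elements: List[dict]) -> List[Tuple[str, Optional[dict]]]:
    """Pair each step with the first element whose id or name it mentions."""
    pairs = []
    for step in steps:
        lowered = step.lower()
        hit = None
        for element in elements:
            keys = [k for k in (element["id"], element["name"]) if k]
            if any(k.lower() in lowered for k in keys):
                hit = element
                break
        pairs.append((step, hit))
    return pairs


page = '<form><input id="quantity" name="qty"><button id="submit">Order</button></form>'
elements = extract_elements(page)
pairs = match_steps(
    ["Enter 3 in the quantity field", "Press the submit button", "Close the window"],
    elements,
)
```

Steps left without a matching element (the third step here) are natural candidates to flag for the user during the review in Step 6.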
[0469] (Application Example 1)
[0470] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0471] In data management and operations, there is a need for operational support in specific environments and for the automation of anomaly detection and response actions. However, existing systems make it difficult to efficiently manage and rapidly automate these processes. Furthermore, operators often require programming skills, and without specialized knowledge, achieving complex operations is challenging.
[0472] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0473] In this invention, the server includes a device that holds all functional reference information for a specific information processing device, input means for receiving image and text data as input, generation means for generating automation scenarios based on the received data, output means for outputting the generated scenarios, support means for providing operational support in a specific environment, and management means for detecting abnormalities in operational procedures and automating corresponding actions. This enables the automation of complex data management and operational processes, and allows for efficient operation without requiring operators to have programming skills.
[0474] An "information processing device" is a device that performs calculations and analyses using various types of data and executes programs to carry out specific functions.
[0475] "Input means" refers to devices or methods for acquiring information from users or external devices and processing it internally.
[0476] "Generation means" refers to methods or devices for creating new data or scenarios based on input information.
[0477] "Output means" refers to methods or devices for sending data generated within a system to an external source.
[0478] "Support means" refers to methods or devices used to support and efficiently carry out specific tasks or processes.
[0479] "Management means" refers to methods and devices for monitoring operational status, detecting abnormalities and problems, and taking appropriate action.
[0480] A system for implementing this invention includes an information processing device, an input device, a generation means, an output means, a support means, and a management means. The program of this system operates to effectively automate and support the operation of a data center.
[0481] The server receives image and text data provided by the user via an input device through a terminal. The input device converts this data into an analyzable format and transfers it to the server. On the server, the information processing device runs image and text analysis engines, and the generation means generates operational scenarios from sensor information and system logs.
[0482] Furthermore, the generated operational scenarios are presented to the user through an output device. The user can review the presented scenarios and make adjustments as needed. In this process, a generative AI model is applied and prompt statements are used to generate efficient scenarios. This generative AI model is particularly used for anomaly detection and optimization of intranet system compatibility.
[0483] As a concrete example, consider monitoring a cooling system within a data center and generating automated response scenarios. When a user inputs screenshots of the cooling system's dashboard and operation logs into the system, the server analyzes this data, detects anomalies, and automatically generates specific actions for the response process. This enables immediate system response and efficient operational management.
[0484] An example of a prompt message is: "Generate a scenario to detect an anomaly in the data center's cooling system and automatically respond to it. Based on the screenshots and instructions, define the corrective actions and make them executable."
[0485] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0486] Step 1:
[0487] The user inputs screenshots of the data center's cooling system status taken with their smartphone, along with related text data (system logs and procedure manuals). The input data is formatted on the device and prepared for transmission to the server.
[0488] Step 2:
[0489] The terminal sends the formatted data to the server. The server receives it and uses an image data analysis engine to extract important elements contained in the screenshot. For example, the set temperature and operating status of the cooling system may be identified.
[0490] Step 3:
[0491] The text data received by the server is analyzed by a text analysis engine. The analysis identifies signs of anomalies and necessary operational procedures from the system logs. Based on this information, a generative AI model creates appropriate countermeasure scenarios.
[0492] Step 4:
[0493] The server uses a generative AI model to generate response scenarios based on specific prompt messages. These generated scenarios include specific actions necessary to immediately address cooling system anomalies. This generation process leverages machine learning techniques to define the optimal response strategy.
[0494] Step 5:
[0495] The server outputs the generated scenario and presents it to the user via a terminal. If the user reviews the scenario and determines it is feasible, they apply its contents to the data center's control system. This process efficiently manages any abnormalities in the cooling system.
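As a non-limiting illustration of Steps 2 to 4, the sketch below flags log entries whose temperature exceeds the set temperature by more than a tolerance and places them into a prompt based on the example prompt message given above. The log line format, the tolerance value, and all function names are assumptions made for this example; the set temperature is assumed to have been obtained from the screenshot analysis.

```python
import re
from typing import List, NamedTuple

# Assumed log format: "<timestamp> unit=<name> temp_c=<number>"
LOG_LINE = re.compile(
    r"^(?P<ts>\S+) unit=(?P<unit>\S+) temp_c=(?P<temp>-?\d+(?:\.\d+)?)$"
)


class Anomaly(NamedTuple):
    timestamp: str
    unit: str
    temp_c: float


def find_anomalies(log_lines: List[str], setpoint_c: float, tolerance_c: float = 2.0) -> List[Anomaly]:
    """Return entries hotter than the set temperature plus the tolerance."""
    anomalies = []
    for line in log_lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # ignore lines that are not in the assumed format
        temp = float(match.group("temp"))
        if temp > setpoint_c + tolerance_c:
            anomalies.append(Anomaly(match.group("ts"), match.group("unit"), temp))
    return anomalies


def build_response_prompt(anomalies: List[Anomaly]) -> str:
    """Embed the detected anomalies in a prompt for the generative AI model."""
    header = (
        "Generate a scenario to detect an anomaly in the data center's "
        "cooling system and automatically respond to it."
    )
    details = [f"- {a.timestamp} {a.unit}: {a.temp_c:.1f} C" for a in anomalies]
    return "\n".join([header, "Observed anomalies:"] + details)


logs = [
    "2024-12-16T10:00:00 unit=CRAC-1 temp_c=24.5",
    "2024-12-16T10:05:00 unit=CRAC-1 temp_c=29.1",
    "operator note: filter replaced",
]
found = find_anomalies(logs, setpoint_c=25.0)
prompt = build_response_prompt(found)
```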
[0496] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing using the user's emotions.
[0497] This invention provides an automated system that recognizes user emotions and reflects that information in the automated process. The system functions through the collaborative efforts of a server, a terminal, and the user.
[0498] First, the user inputs information about the task they want to automate using the device. This includes data such as screenshots, HTML configuration information, and specific operating procedures. The device also incorporates an emotion engine to capture the user's emotional state, recognizing their emotions in real time.
[0499] The terminal functions as an interface for sending information received from the user to the server. Here, the terminal transmits the input data to the server in a properly formatted form, including emotional information obtained from the emotion engine.
[0500] The server analyzes the transmitted data and recognizes elements of the screen interface from the image data. Furthermore, it analyzes the text data to identify the steps of the automation process. During the analysis, the server considers the user's emotional information provided by the emotion engine and generates automation scenarios using a generative model. The system is designed so that the user's emotions influence the selection and flow of the scenarios.
[0501] For example, if a user exhibits an unpleasant emotion during a task, the system can select a specific scenario flow to mitigate that emotion. Conversely, if a positive emotion is detected, the system can adjust the process to select steps that allow for more efficient progress.
[0502] Finally, the server sends the generated automation scenario back to the terminal. The user can review this scenario, make any necessary modifications, and then execute it. This ensures that the automation process unfolds in a way that takes the user's feelings into consideration, resulting in a more comfortable user experience.
[0503] The following describes the processing flow.
[0504] Step 1:
[0505] The user uses the device to input screenshots, HTML configuration information, and specific work steps for the tasks they wish to automate. Additionally, the device's built-in emotion engine performs facial recognition and voice analysis to detect emotions in real time.
[0506] Step 2:
[0507] The terminal consolidates all user input into a single data package. This package includes screenshot data, HTML information, work procedure text, and emotion information obtained from the emotion engine.
[0508] Step 3:
[0509] The terminal prepares to send the created data package to the server. It verifies that the data format is correct and sends the request to the server according to the transmission protocol.
[0510] Step 4:
[0511] The server analyzes the data received from the terminal and uses image recognition technology to extract screen interface elements from the screenshot. In this process, buttons, input fields, and menus are identified.
[0512] Step 5:
[0513] The server performs text analysis to identify the specific operations required for automation from the provided work procedures. It also understands the user's intent based on the analyzed data.
[0514] Step 6:
[0515] The server aggregates input from the emotion engine to identify the user's emotional state. This takes into account the user's stress level, concentration level, and emotional tone.
[0516] Step 7:
[0517] The server generates automation scenarios by applying a generative model using a functional reference of a specific automation tool and the user's emotion information. During this process, it adjusts specific scenario flows and actions based on the user's emotions.
[0518] Step 8:
[0519] The server verifies the integrity of the generated automation scenarios and automatically makes corrections if necessary. In particular, it ensures that the corrected scenarios remain appropriate to the user's emotions.
[0520] Step 9:
[0521] The server sends the generated automation scenario back to the terminal. The user reviews this scenario on the terminal, evaluates and modifies the actions, and then executes them. This enables an automation process that reflects the user's emotional state.
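As a non-limiting illustration of Step 6, the sketch below averages several emotion readings and reduces them to a coarse label. The numeric scales, the thresholds, and the three labels are assumptions made for this example and are not specified by the disclosure.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class EmotionReading:
    stress: float         # assumed scale: 0.0 (calm) to 1.0 (highly stressed)
    concentration: float  # assumed scale: 0.0 to 1.0
    tone: float           # assumed scale: -1.0 (negative) to 1.0 (positive)


def aggregate(readings: List[EmotionReading]) -> EmotionReading:
    """Average the readings collected during the session."""
    if not readings:
        raise ValueError("at least one reading is required")
    return EmotionReading(
        stress=mean(r.stress for r in readings),
        concentration=mean(r.concentration for r in readings),
        tone=mean(r.tone for r in readings),
    )


def classify(state: EmotionReading, stress_limit: float = 0.6) -> str:
    """Map the aggregated state to 'negative', 'positive', or 'neutral'."""
    if state.stress >= stress_limit or state.tone < 0:
        return "negative"
    if state.tone > 0 and state.concentration >= 0.5:
        return "positive"
    return "neutral"


session = [EmotionReading(0.8, 0.4, -0.2), EmotionReading(0.6, 0.6, -0.4)]
label = classify(aggregate(session))
```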
[0522] (Example 2)
[0523] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0524] Traditional automation systems often suffer from decreased user satisfaction and efficiency because they proceed with processes without considering the emotional state of the user. Furthermore, they frequently adopt a uniform approach, failing to adequately account for variations in automation requirements among different users.
[0525] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0526] In this invention, the server includes emotion analysis means for recognizing the user's emotional state and reflecting that information in processing, means for adjusting the operation flow based on the emotional state, and information recording means for holding all functional reference information of a specific information processing tool. This makes it possible to provide a flexible automated process that takes the user's emotions into consideration.
[0527] An "emotion analysis means" is a technical device that recognizes the emotional state of a user and incorporates that information into its processing.
[0528] A "means for adjusting the operation flow" refers to a technical device for modifying and optimizing the progress of a process based on the user's emotional state.
[0529] An "information recording means" is a technical device that holds and manages all functional reference information for a specific information processing tool, making it accessible as needed.
[0530] A "generation means" is a technical device for creating an automated process based on received video and text data.
[0531] "Output means" refers to a technological device that presents or provides the generated automated process to an external party.
[0532] "Analysis means" refers to a technical device that analyzes screen display elements contained in input video data and extracts information necessary for generating an automated process.
[0533] A "correction means" is a technical device that corrects unsuitable parts of an automated process based on the analysis results and improves it into an appropriate process.
[0534] This invention is a system that generates and executes automated processes while taking user emotions into consideration. The system functions through the cooperation of a server, terminals, and users.
[0535] The user first uses the terminal to input information about the task they want to automate. This input includes screenshots, HTML structure information, and specific operating procedures. Furthermore, the terminal is equipped with emotion analysis software to recognize the user's emotions in real time. This allows for the analysis of the user's facial expressions and voice.
[0536] The terminal functions as an interface for sending information received from the user to the server. The terminal appropriately formats the input data and sends it to the server along with the emotion information obtained from the emotion analysis engine. Standard communication protocols are used for this transmission.
[0537] The server analyzes the received data and generates automated processes using a generative AI model. It recognizes screen interface elements from image data and identifies automation steps from text data. Machine learning algorithms and image recognition technologies are used for these analyses. Furthermore, the generated scenarios are adjusted so that the user's emotional state is reflected in the process selection and progression. For example, if the user is stressed, the system will suggest intuitive and easy-to-understand steps.
[0538] The generated automation scenario is sent back to the user via their device. The user can review the scenario, make any necessary modifications, and then execute it. This enables a comfortable and efficient automation experience for the user.
[0539] For example, to automate a photo editing process, the user inputs the editing steps. A prompt such as "Please suggest steps that will allow the user to relax while automating the photo editing process" could then be used. This prompt is input into a generative AI model, which then generates an automated procedure that suits the user's needs.
[0540] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0541] Step 1:
[0542] The user inputs information about the task they want to automate using a terminal. This input data includes screenshots, HTML structure information, and detailed operating instructions. This prepares the basic data for the user's desired output. The terminal also has an emotion analysis engine that acquires the user's emotional state in real time from their facial expressions and voice.
[0543] Step 2:
[0544] The terminal organizes the information received from the user and sends the data to the server. The user's emotion information is also sent along with the data. Data formatting is performed to ensure the input data is in the correct format. This prepares the data for analysis on the server.
[0545] Step 3:
[0546] The server receives data transmitted from the terminal. For image data, image analysis techniques are used to recognize elements of the screen interface. For text data, natural language processing is used to identify the operation procedures required by the automation engine. Through these analyses, the user's input data is prepared as detailed information for the next processing stage.
[0547] Step 4:
[0548] The server generates automated processes using a generative AI model. Based on the analyzed data and the user's emotion information, it generates the most suitable automation scenario. For example, if the user's emotion information is negative, the system will suggest steps to reduce the user's burden. This ensures that the scenario reflects a flow appropriate to the user's state.
[0549] Step 5:
[0550] The server sends the generated automation scenario to the terminal. The terminal then displays this scenario to the user in preparation for the next step.
[0551] Step 6:
[0552] The user reviews the automation scenario displayed on the terminal and makes modifications as needed. The final automation process is confirmed by accepting the user's input and modifications. The user can then execute the confirmed process, thereby achieving the desired results.
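As a non-limiting illustration of the adjustment in Step 4, the sketch below modifies a scenario according to an emotion label. The step fields ("optional", "needs_input", "default") and the two adjustment rules are hypothetical policies chosen for this example: optional steps are dropped when the emotion is positive so that the process proceeds more efficiently, and manual inputs are replaced by default values when the emotion is negative so that the user's burden is reduced.

```python
from typing import List


def adapt_scenario(steps: List[dict], emotion: str) -> List[dict]:
    """Return an adjusted copy of the scenario; the input is not modified."""
    adapted = []
    for step in steps:
        if emotion == "positive" and step.get("optional"):
            continue  # streamline the flow by skipping optional steps
        if emotion == "negative" and step.get("needs_input"):
            # Reduce the burden: use the default instead of asking the user.
            step = {**step, "needs_input": False, "value": step.get("default")}
        adapted.append(dict(step))
    return adapted


photo_editing = [
    {"action": "open", "target": "photo.jpg"},
    {"action": "preview", "optional": True},
    {"action": "set_brightness", "needs_input": True, "default": 10},
    {"action": "save"},
]
relaxed_flow = adapt_scenario(photo_editing, "negative")
fast_flow = adapt_scenario(photo_editing, "positive")
```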
[0553] (Application Example 2)
[0554] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0555] In the field of online shopping, presenting products without considering user emotions can damage the user experience and reduce their desire to purchase. Furthermore, the inability to flexibly respond to the diverse emotions and needs of users makes it difficult to improve user satisfaction, which can ultimately lead to lost sales opportunities.
[0556] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0557] In this invention, the server includes a device for recognizing the user's mental state and reflecting it in an automated process, a generative model for optimizing product presentation based on the user's emotions, and an output device for providing the specific product information that is generated. This enables optimal product suggestions based on the user's emotions and improves the user experience.
[0558] "User's mental state" refers to the user's emotions and psychological state, and is the information necessary to adjust processes and product presentations based on that state.
[0559] A "device for reflecting information in an automated process" is a device that has the necessary functions to recognize the user's mental state and reflect it in the process.
[0560] A "generative model" is an AI-powered model that optimizes product presentations based on user emotions and generates specific product information.
[0561] An "output device" is a device necessary to present specific product information that has been generated to the customer.
[0562] An "analysis device" is a device that analyzes input data, extracts user emotions, and uses that information to help present products.
[0563] A "control device" is a device that has the function of suggesting alternative products to match the generated product information to the user's purchasing behavior, thereby improving user satisfaction.
[0564] This invention provides a system that recognizes the user's mental state and optimizes product presentation during online shopping.
[0565] First, users access the e-commerce platform through an application installed on their smartphone or smart glasses. The user's device incorporates an emotion recognition engine that analyzes the user's facial expressions and voice in real time to determine their emotional state. This emotion data is then transmitted to a server via the internet.
[0566] The server uses a generative AI model based on the received emotional data to present the most suitable products to the user. This model performs product recommendations that correspond to the user's emotions and generates specific product information. In this process, the server processes the data using a language or runtime such as Python or Node.js.
[0567] The generated product information is sent to the user's device, allowing the user to view product details and make a purchase decision as needed. In particular, if the user expresses negative emotions, the server will provide relevant alternative products or supplementary information to improve the user experience.
[0568] For example, if a user expresses frustration while browsing a specific product page, the system automatically provides reviews and FAQs for that product to resolve their questions. If the user expresses positive emotions, the system suggests bundled purchase options for similar products to encourage further purchases.
[0569] An example of a prompt message to achieve this is: "The user expressed emotions (A, C, G) while viewing a product page. Since the emotion is primarily (A), generate a program and description that suggests related products."
[0570] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0571] Step 1:
[0572] The user launches a shopping application via their smartphone or smart glasses. The user's device has an emotion recognition engine installed, which uses the camera and microphone to collect emotional data from the user's facial expressions and voice. Based on this sensor input, the device analyzes the user's mental state in real time and outputs the emotions in digital format.
[0573] Step 2:
[0574] The device transmits collected emotional data and product information viewed by the user to the server via the internet. Input includes data about the user's mental state and identifying information such as product IDs. The device formats this data appropriately and transmits it in a format easily processed by the server.
[0575] Step 3:
[0576] The server analyzes the received emotion data and product information, and uses a generative AI model to suggest the most suitable products to the user. Based on the emotion data input, the generative AI model executes a recommendation algorithm and generates a list of highly relevant products. Product information is then generated to be presented to the user as output.
[0577] Step 4:
[0578] The server sends the generated product information back to the terminal. In this step, the product information is packaged in a format that is immediately usable by the user and sent quickly.
[0579] Step 5:
[0580] The device displays the received product information on the application screen. Through the screen, the user can view detailed product information and related reviews to make a purchase decision. The application then analyzes the user's response again using an emotion recognition engine and, if necessary, provides further product suggestions or information.
[0581] This series of processes enables a personalized shopping experience based on the user's emotions.
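As a non-limiting illustration, the emotion-dependent presentation described in this application example (reviews, FAQs, and alternative products for negative emotions; bundled purchase options for positive emotions) can be sketched as a simple selection function. The catalog structure, the field names, and the emotion labels are assumptions made for this example, and the generative AI model that would produce the actual product text is omitted.

```python
from typing import Dict


def choose_presentation(emotion: str, product_id: str, catalog: Dict[str, dict]) -> dict:
    """Decide what to show next to a product for a given emotion label."""
    product = catalog[product_id]
    if emotion == "negative":
        # Help resolve doubts and offer a way out of the current product.
        return {
            "product_id": product_id,
            "show": ["reviews", "faq"],
            "alternatives": product.get("alternatives", []),
        }
    if emotion == "positive":
        # Encourage further purchases with similar items as a bundle.
        return {
            "product_id": product_id,
            "show": ["bundle_offer"],
            "bundle": product.get("similar", []),
        }
    return {"product_id": product_id, "show": ["details"]}


catalog = {
    "P-100": {"alternatives": ["P-200"], "similar": ["P-101", "P-102"]},
}
frustrated_view = choose_presentation("negative", "P-100", catalog)
happy_view = choose_presentation("positive", "P-100", catalog)
```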
[0582] The specific processing unit 290 transmits the result of the specific processing to the headset-type terminal 314. In the headset-type terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0583] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0584] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0585] [Fourth Embodiment]
[0586] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0587] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0588] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0589] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0590] The microphone 238 captures the voice of the user 20 and thereby receives instructions from the user 20. The microphone 238 converts the captured voice into audio data and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0591] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0592] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0593] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0594] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0595] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0596] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0597] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0598] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0599] The system of this invention aims to allow users to easily input information about the processes they wish to automate and generate automation scenarios based on this input. This system functions through the collaboration of a server, a terminal, and the user.
[0600] First, the user uses a terminal to input screenshots of the system to be automated, HTML configuration information, and specific work procedures. This allows the user to provide detailed work information to the system as text and images, thereby consolidating the fundamental information needed for automation.
[0601] Next, the terminal functions as an interface for sending data received from the user to the server. The terminal performs data format conversion and transmission processing to ensure that the input data is reliably transmitted to the server.
[0602] The server performs information analysis on the received data. First, it analyzes the elements of the screen interface contained in the image data to understand the structure of the user interface and identify which parts are the target of user interaction. Furthermore, it identifies specific operation procedures from the text data and determines the action to take based on them.
[0603] Next, the server leverages the full functional reference information of the specific automation tool and applies a generative model to generate automation scenarios. This defines appropriate automation procedures on the system based on user input. The system then verifies whether the generated scenarios are compatible with the functionality of the specific automation tool and automatically makes any necessary corrections.
[0604] Finally, the generated automation scenario is presented to the user via a terminal. The user can review it, make modifications as needed, and then execute the final scenario. As a specific example, when automating a process such as order processing in accounting, the user inputs the operating procedures and form input flow of the order system into the system. Based on this information, the system automatically generates a scenario from displaying the form to confirming and completing the order, thereby improving operational efficiency.
[0605] Thus, the present invention aims to support end users in building automated processes, and its implementation enables the easy realization of complex automation without requiring programming skills.
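The compatibility check and automatic correction described above can be illustrated with a minimal sketch. It assumes that the functional reference of the automation tool can be reduced to a set of supported action names plus a table of correctable aliases. The names `SUPPORTED_ACTIONS`, `ALIASES`, `Step`, and `verify_and_correct` are hypothetical and are introduced here only for illustration; they are not part of the disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical functional reference: actions the automation tool supports.
SUPPORTED_ACTIONS = {"open_page", "click", "type_text", "submit"}
# Hypothetical aliases that can be corrected automatically.
ALIASES = {"press": "click", "enter_text": "type_text"}


@dataclass
class Step:
    action: str
    target: str


def verify_and_correct(steps: List[Step]) -> Tuple[List[Step], List[Step]]:
    """Return (corrected_steps, unresolved_steps) for a generated scenario."""
    corrected, unresolved = [], []
    for step in steps:
        if step.action in SUPPORTED_ACTIONS:
            corrected.append(step)  # already compatible with the tool
        elif step.action in ALIASES:
            # Rewrite an unsupported action to its supported equivalent.
            corrected.append(Step(ALIASES[step.action], step.target))
        else:
            unresolved.append(step)  # needs regeneration or user review
    return corrected, unresolved
```

In this sketch, steps that cannot be corrected are returned separately so that they could be regenerated or shown to the user for review.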
[0606] The following describes the processing flow.
[0607] Step 1:
[0608] The user uses a terminal to input screenshots of the system operations to be automated, HTML configuration information, and specific work procedures. This provides the system with detailed data about the processes to be automated.
[0609] Step 2:
[0610] The terminal converts the input data received from the user into an appropriate format and prepares it for transmission to the server. In particular, the data is formatted to conform to the standards required by the system.
[0611] Step 3:
[0612] A request containing screenshots, HTML information, and data related to work procedures is sent from the terminal to the server. This transmission process ensures that the data is received without any loss or errors.
[0613] Step 4:
[0614] The server analyzes the received data, beginning with the image data. It applies image recognition technology to identify the elements displayed on the screen, including UI components such as buttons and form fields.
[0615] Step 5:
[0616] The server analyzes the text data and extracts the specific operational steps required for the automation process. This procedural information becomes the foundational data used later to generate automation scenarios.
[0617] Step 6:
[0618] The server generates automation scenarios by applying a generative model while referencing functional reference data for specific automation tools. In this process, detected UI components and operation procedures are reflected in the automation scenarios.
[0619] Step 7:
[0620] The server verifies the suitability of the generated automation scenarios and confirms their compatibility with specific automation tools. If any inconsistencies are found, the scenario is automatically corrected.
[0621] Step 8:
[0622] The server returns the completed automation scenario to the terminal. This output data is used by the user to review and modify the scenario.
[0623] Step 9:
[0624] Users review the automation scenarios provided through their terminals, make adjustments as needed, and then actually execute the automated process. This enables the automation of business processes.
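The server-side portion of the flow (Steps 4 to 8) can be sketched as a small pipeline. This is an illustration under stated assumptions, not the disclosed implementation: `AutomationRequest`, `extract_steps`, and `handle_request` are hypothetical names, and UI detection, scenario generation, and verification are passed in as callables because the disclosure does not fix a particular engine or model API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AutomationRequest:
    """Request sent from the terminal in Step 3 (field names are assumptions)."""
    screenshots: List[bytes]
    html: str
    procedure: str


def extract_steps(procedure: str) -> List[str]:
    """Step 5: split the work procedure text into non-empty steps, one per line."""
    return [line.strip() for line in procedure.splitlines() if line.strip()]


def handle_request(
    request: AutomationRequest,
    detect_ui: Callable[[AutomationRequest], List[str]],
    generate: Callable[[List[str], List[str]], List[str]],
    verify: Callable[[List[str]], List[str]],
) -> List[str]:
    """Run Steps 4 to 7 and return the scenario that Step 8 sends back."""
    ui_elements = detect_ui(request)          # Step 4: identify UI components
    steps = extract_steps(request.procedure)  # Step 5: extract operation steps
    scenario = generate(ui_elements, steps)   # Step 6: apply the generative model
    return verify(scenario)                   # Step 7: check and correct
```

Injecting the three stages as functions keeps the sketch independent of any specific image recognition library or generative model.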
[0625] (Example 1)
[0626] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0627] Creating automation scenarios requires a great deal of expertise, which makes it difficult for the average user to create them efficiently. Furthermore, it is not easy to extract user interfaces and operating procedures properly or to ensure their compatibility with automation tools. Therefore, there is a need to generate automation scenarios easily and accurately to improve operational efficiency.
[0628] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0629] In this invention, the server includes a terminal for collecting process information from the user, communication means for transferring data to the server via the terminal, analysis means for analyzing the received data and identifying UI elements and operating procedures, and generation means for creating automation scenarios from the analyzed data using a generative AI model. This makes it possible for users to easily generate automation scenarios without specialized knowledge, improving operational efficiency while maintaining system compatibility.
[0630] A "user" is the entity that provides the system with the process information necessary to generate automation scenarios.
[0631] A "terminal" is a device or platform that receives input data from a user and transmits it to a server via a communication method.
[0632] A "server" is a central device that analyzes data received from terminals and creates and manages automation scenarios using generative AI models.
[0633] "Communication means" refers to protocols and interfaces used to transfer data from a terminal to a server, and possesses the function of sending and receiving data accurately and securely.
[0634] "Analysis means" refers to the processes and algorithms necessary to examine the received data in detail and identify user interface elements and operating procedures.
[0635] "Generation means" refers to the processes and functions used to construct automation scenarios using a generative AI model based on the analysis results.
[0636] A "generative AI model" is a program or system based on artificial intelligence technology that creates automation scenarios based on given prompts and information.
[0637] A "prompt" is a sentence in the form of an instruction or question that is input into a generative AI model, and it contains the information necessary to generate an automation scenario.
[0638] The system of this invention receives from the user the information necessary to automate a specific process, and generates a scenario based on that information. The system works through the cooperation of the user, the terminal, and the server.
[0639] First, the user operates a terminal to supply the system with information related to the process they want to automate. This information mainly consists of screenshots, HTML configuration information, and manual operation instructions. This information is important for identifying the detailed process flow.
[0640] Next, the terminal receives information from the user, converts the data format, and then securely transmits it to the server. The data, now in the appropriate format, is used for analysis on the server.
[0641] The server receives the data sent from the terminal and begins analysis.
[0642] The server uses image analysis algorithms and natural language processing techniques to identify the elements of the screen interface and the operating procedures. Specifically, it uses image processing libraries to extract UI elements from screenshots and a text analysis engine to extract operating procedures from linguistic data.
[0643] Next, the server generates an appropriate automation scenario using a specific generative AI model. This AI model interprets the information provided by the user as prompts and outputs the optimal scenario for the automation tool. Possible prompts include, for example, "Generate a scenario to automate the following process. Please refer to the screenshots and HTML information for instructions and complete the process."
[0644] The generated scenarios are evaluated for suitability and modified on the system if necessary. Finally, they are presented to the user via a terminal, allowing the user to review the content and make changes as needed. This entire process enables users to create automation scenarios tailored to their needs and improve work efficiency without requiring specialized programming knowledge.
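For the HTML configuration information supplied by the user, the analysis stage could be sketched with the `html.parser` module of the Python standard library. This is an assumption made for illustration: the disclosure does not name a parser, and `INTERACTIVE_TAGS`, `UIElementCollector`, and `extract_ui_elements` are hypothetical names. The sketch collects only tags that a user typically interacts with.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Assumed set of tags treated as targets of user interaction.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class UIElementCollector(HTMLParser):
    """Collects interactive elements found while parsing an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Dict[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase and attrs as (name, value) pairs.
        if tag in INTERACTIVE_TAGS:
            attr_map = dict(attrs)
            self.elements.append({
                "tag": tag,
                "id": attr_map.get("id"),
                "name": attr_map.get("name"),
                "type": attr_map.get("type"),
            })


def extract_ui_elements(html: str) -> List[Dict[str, Optional[str]]]:
    """Return the interactive elements of the page in document order."""
    collector = UIElementCollector()
    collector.feed(html)
    collector.close()
    return collector.elements
```

The resulting list could then be combined with the elements recognized in the screenshots to decide which parts of the screen are targets of user interaction.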
[0645] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0646] Step 1:
[0647] The user enters screenshots of the process they want to automate, HTML configuration information, and specific operating instructions. This input data serves as foundational information to communicate the specific automation needs to the system. The screenshots show the UI layout, the HTML information identifies the page structure, and the operating instructions text represents the overall flow of the process.
[0648] Step 2:
[0649] The terminal processes various types of data collected from the user. Specifically, image data is converted to the appropriate resolution and format, and text data is encoded and converted into a format compatible with the server. The converted data is then sent to the server via the terminal's communication functions. The system is designed to maintain data integrity and security during this process.
[0650] Step 3:
[0651] The server analyzes screenshots and HTML information received from the terminal using an image analysis engine and an HTML analysis engine. Here, UI elements are extracted through image analysis, and the logical structure of the page is identified through HTML analysis. The output is a list indicating which elements are targeted by user interaction.
[0652] Step 4:
[0653] Next, the server extracts the operation steps from the text data and uses a natural language processing engine to identify each step. The extracted steps are then compared with a list of UI elements to determine the corresponding operation for each action. This is then converted into a prompt and used as input for the generative AI model.
[0654] Step 5:
[0655] The server inputs the constructed prompt statements into the AI model to generate an automation scenario. The AI model outputs the optimal scenario based on the input information. At this time, the suitability of the generated scenario is evaluated, taking into account all the functional reference information of the specific automation tool. If a mismatch is found, the scenario is modified and evaluated again.
[0656] Step 6:
[0657] The generated automation scenario is presented to the user via the terminal. The terminal provides a visual interface that shows the flow of the scenario, allowing the user to review its contents. The user accepts the scenario or requests modifications as needed, and the optimal automation scenario ultimately becomes executable.
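The comparison of extracted steps against the list of UI elements in Step 4, and the conversion of the result into a prompt, can be sketched as follows. The matching rule used here (a case-insensitive search for the element label inside the step text) is a deliberately simple stand-in for the natural language processing engine, and `match_steps_to_elements` and `build_prompt` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


def match_steps_to_elements(
    steps: List[str], elements: List[Dict[str, str]]
) -> List[Tuple[str, Optional[str]]]:
    """Pair each step with the selector of the first element whose label it mentions."""
    matched = []
    for step in steps:
        lowered = step.lower()
        hit = next((e for e in elements if e["label"].lower() in lowered), None)
        matched.append((step, hit["selector"] if hit else None))
    return matched


def build_prompt(matched: List[Tuple[str, Optional[str]]]) -> str:
    """Turn the matched steps into a numbered prompt for the generative AI model."""
    lines = ["Generate a scenario to automate the following process."]
    for number, (step, selector) in enumerate(matched, start=1):
        target = selector if selector is not None else "(no matching UI element)"
        lines.append(f"{number}. {step} [target: {target}]")
    return "\n".join(lines)
```

Steps without a matching element are kept in the prompt and marked explicitly, so that the model or the reviewing user can resolve them.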
[0658] (Application Example 1)
[0659] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0660] In data management and operations, there is a need for operational support in specific environments and for the automation of anomaly detection and response actions. However, existing systems make it difficult to efficiently manage and rapidly automate these processes. Furthermore, operators often require programming skills, and without specialized knowledge, achieving complex operations is challenging.
[0661] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0662] In this invention, the server includes a device that holds all functional reference information for a specific information processing device, input means for receiving image and text data as input, generation means for generating automation scenarios based on the received data, output means for outputting the generated scenarios, support means for providing operational support in a specific environment, and management means for detecting abnormalities in operational procedures and automating corresponding actions. This enables the automation of complex data management and operational processes, and allows for efficient operation without requiring operators to have programming skills.
[0663] An "information processing device" is a device that performs calculations and analyses using various types of data and executes programs to carry out specific functions.
[0664] "Input means" refers to devices or methods for acquiring information from users or external devices and processing it internally.
[0665] "Generation means" refers to methods or devices for creating new data or scenarios based on input information.
[0666] "Output means" refers to methods or devices for sending data generated within a system to an external source.
[0667] "Support means" refers to methods or devices used to support and efficiently carry out specific tasks or processes.
[0668] "Management means" refers to methods and devices for monitoring operational status, detecting abnormalities and problems, and taking appropriate action.
[0669] A system for implementing this invention includes an information processing device, input means, generation means, output means, support means, and management means. The program of this system operates to effectively automate and support the operation of a data center.
[0670] The server receives image and text data provided by the user through the terminal via the input means. The input means converts this data into an analyzable format and transfers it to the server. On the information processing device, the server applies image and text analysis engines, and the generation means generates operational scenarios from various sensor information and system logs.
[0671] Furthermore, the generated operational scenarios are presented to the user through the output means. The user can review the presented scenarios and make adjustments as needed. In this process, a generative AI model is applied and prompt statements are used to generate efficient scenarios. This generative AI model is particularly used for anomaly detection and optimization of intranet system compatibility.
[0672] As a concrete example, consider monitoring a cooling system within a data center and generating automated response scenarios. When a user inputs screenshots of the cooling system's dashboard and operation logs into the system, the server analyzes this data, detects anomalies, and automatically generates specific actions for the response process. This enables immediate system response and efficient operational management.
[0673] An example of a prompt message is: "Generate a scenario to detect an anomaly in the data center's cooling system and automatically respond to it. Based on the screenshots and instructions, define the corrective actions and make them executable."
[0674] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0675] Step 1:
[0676] The user inputs screenshots of the data center's cooling system status taken with their smartphone, along with related text data (system logs and procedure manuals). The input data is formatted on the terminal and prepared for transmission to the server.
[0677] Step 2:
[0678] The terminal sends the formatted data to the server. The server receives it and uses an image data analysis engine to extract important elements contained in the screenshot. For example, the set temperature and operating status of the cooling system may be identified.
[0679] Step 3:
[0680] The text data received by the server is analyzed by a text analysis engine. The analysis identifies signs of anomalies and necessary operational procedures from the system logs. Based on this information, a generative AI model creates appropriate countermeasure scenarios.
[0681] Step 4:
[0682] The server uses a generative AI model to generate response scenarios based on specific prompt messages. These generated scenarios include specific actions necessary to immediately address cooling system anomalies. This generation process leverages machine learning techniques to define the optimal response strategy.
[0683] Step 5:
[0684] The server outputs the generated scenario and presents it to the user via a terminal. If the user reviews the scenario and determines it is feasible, they apply its contents to the data center's control system. This process efficiently manages any abnormalities in the cooling system.
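The log analysis in Step 3 and the prompt construction in Step 4 can be illustrated with a minimal sketch. It assumes a hypothetical log format of one reading per line, such as `2024-12-16T10:00:00 temp=31.5`, and a temperature threshold chosen by the operator; neither is specified in the disclosure. `LOG_PATTERN`, `find_anomalies`, and `build_response_prompt` are hypothetical names.

```python
import re
from typing import List, Tuple

# Assumed log line format: "<timestamp> temp=<degrees Celsius>".
LOG_PATTERN = re.compile(r"^(?P<ts>\S+)\s+temp=(?P<temp>-?\d+(?:\.\d+)?)$")


def find_anomalies(log_text: str, max_temp_c: float) -> List[Tuple[str, float]]:
    """Return (timestamp, temperature) pairs whose reading exceeds the threshold."""
    anomalies = []
    for line in log_text.splitlines():
        match = LOG_PATTERN.match(line.strip())
        if match and float(match.group("temp")) > max_temp_c:
            anomalies.append((match.group("ts"), float(match.group("temp"))))
    return anomalies


def build_response_prompt(anomalies: List[Tuple[str, float]]) -> str:
    """Compose a prompt that asks the generative AI model for a response scenario."""
    lines = [
        "Generate a scenario to detect an anomaly in the data center's "
        "cooling system and automatically respond to it.",
        "Readings above the threshold:",
    ]
    lines += [f"- {ts}: {temp} C" for ts, temp in anomalies]
    return "\n".join(lines)
```

Lines that do not follow the assumed format are skipped rather than treated as anomalies.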
[0685] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing using the user's emotions.
[0686] This invention provides an automated system that recognizes user emotions and reflects that information in the automated process. The system functions through the collaborative efforts of a server, a terminal, and the user.
[0687] First, the user inputs information about the task they want to automate using the terminal. This includes data such as screenshots, HTML configuration information, and specific operating procedures. The terminal also incorporates an emotion engine that recognizes the user's emotional state in real time.
[0688] The terminal functions as an interface for sending information received from the user to the server. Here, the terminal transmits the input data to the server in a properly formatted form, including emotional information obtained from the emotion engine.
[0689] The server analyzes the transmitted data and recognizes elements of the screen interface from the image data. Furthermore, it analyzes the text data to identify the steps of the automation process. During the analysis, the server considers the user's emotional information provided by the emotion engine and generates automation scenarios using a generative model. The system is designed so that the user's emotions influence the selection and flow of the scenarios.
[0690] For example, if a user exhibits an unpleasant emotion during a task, the system can select a specific scenario flow to mitigate that emotion. Conversely, if a positive emotion is detected, the system can adjust the process to select steps that allow for more efficient progress.
[0691] Finally, the server sends the generated automation scenario back to the terminal. The user can review this scenario, make any necessary modifications, and then execute it. This ensures that the automation process unfolds in a way that takes the user's feelings into consideration, resulting in a more comfortable user experience.
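The way the estimated emotion influences the scenario flow can be sketched as a simple selection rule. The emotion labels, the flow names `guided`, `streamlined`, and `standard`, and the confirmation step are all assumptions made for illustration; the disclosure only states that an unpleasant emotion leads to a mitigating flow and a positive emotion to a more efficient one.

```python
from typing import List

# Hypothetical mapping from an estimated emotion label to a flow variant.
FLOW_BY_EMOTION = {"unpleasant": "guided", "positive": "streamlined"}
CONFIRM_STEP = "confirm with user"


def select_flow(emotion: str) -> str:
    """Pick a flow variant; unknown emotions fall back to the standard flow."""
    return FLOW_BY_EMOTION.get(emotion, "standard")


def apply_flow(steps: List[str], flow: str) -> List[str]:
    """Adjust the scenario steps according to the selected flow variant."""
    if flow == "guided":
        # Add a confirmation after every step to reduce pressure on the user.
        adjusted = []
        for step in steps:
            adjusted += [step, CONFIRM_STEP]
        return adjusted
    if flow == "streamlined":
        # Drop confirmations so that the process progresses more efficiently.
        return [step for step in steps if step != CONFIRM_STEP]
    return list(steps)
```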
[0692] The following describes the processing flow.
[0693] Step 1:
[0694] The user uses the terminal to input screenshots, HTML configuration information, and specific work steps for the tasks they wish to automate. Additionally, the terminal's built-in emotion engine performs facial recognition and voice analysis to detect emotions in real time.
[0695] Step 2:
[0696] The terminal consolidates all user input into a single data package. This package includes screenshot data, HTML information, work procedure text, and emotion information obtained from the emotion engine.
[0697] Step 3:
[0698] The terminal prepares to send the created data package to the server. It verifies that the data format is correct and sends the request to the server according to the transmission protocol.
[0699] Step 4:
[0700] The server analyzes the data received from the terminal and uses image recognition technology to extract screen interface elements from the screenshot. In this process, buttons, input fields, and menus are identified.
[0701] Step 5:
[0702] The server performs text analysis to identify the specific operations required for automation from the provided work procedures. It also understands the user's intent based on the analyzed data.
[0703] Step 6:
[0704] The server aggregates input from the emotion engine to identify the user's emotional state. This takes into account the user's stress level, concentration level, and emotional tone.
[0705] Step 7:
[0706] The server generates automation scenarios by applying a generative model using a functional reference of a specific automation tool and the user's emotion information. During this process, it adjusts specific scenario flows and actions based on the user's emotions.
[0707] Step 8:
[0708] The server verifies the integrity of the generated automation scenarios and automatically makes corrections if necessary. In particular, it ensures that the scenarios remain appropriate to the user's emotions.
[0709] Step 9:
[0710] The server sends the generated automation scenario back to the terminal. The user reviews this scenario on the terminal, evaluates and modifies the actions, and then executes them. This enables an automation process that reflects the user's emotional state.
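The single data package assembled in Step 2 and checked in Step 3 could, for example, be serialized as JSON with the screenshot encoded in Base64. The choice of JSON, the field names, and the functions `build_package` and `parse_package` are assumptions made for this sketch; the disclosure does not define a wire format.

```python
import base64
import json
from typing import Any, Dict

# Assumed set of fields that every package must carry.
REQUIRED_FIELDS = {"screenshot_b64", "html", "procedure", "emotion"}


def build_package(screenshot: bytes, html: str, procedure: str,
                  emotion: Dict[str, Any]) -> str:
    """Terminal side: bundle all user input and emotion information into one package."""
    return json.dumps({
        "screenshot_b64": base64.b64encode(screenshot).decode("ascii"),
        "html": html,
        "procedure": procedure,
        "emotion": emotion,
    })


def parse_package(payload: str) -> Dict[str, Any]:
    """Server side: verify the package format and decode the screenshot."""
    data = json.loads(payload)
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    data["screenshot"] = base64.b64decode(data.pop("screenshot_b64"))
    return data
```

Rejecting incomplete packages early corresponds to the format verification that the terminal and server perform before analysis begins.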
[0711] (Example 2)
[0712] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0713] Traditional automation systems often suffer from decreased user satisfaction and efficiency because they proceed with processes without considering the emotional state of the user. Furthermore, they frequently adopt a uniform approach, failing to adequately account for variations in automation requirements among different users.
[0714] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0715] In this invention, the server includes emotion analysis means for recognizing the user's emotional state and reflecting that information in processing, means for adjusting the operation flow based on the emotional state, and information recording means for holding all functional reference information of a specific information processing tool. This makes it possible to provide a flexible automated process that takes the user's emotions into consideration.
[0716] An "emotion analysis means" is a technical device that recognizes the emotional state of a user and incorporates that information into its processing.
[0717] A "means for adjusting the operation flow" refers to a technical device for modifying and optimizing the progress of a process based on the user's emotional state.
[0718] An "information recording means" is a technical device that holds and manages all functional reference information for a specific information processing tool, making it accessible as needed.
[0719] A "generation means" is a technical device for creating an automated process based on received image and text data.
[0720] "Output means" refers to a technological device that presents or provides the generated automated process to an external party.
[0721] "Analysis means" refers to a technical device that analyzes screen display elements contained in input image data and extracts information necessary for generating an automated process.
[0722] A "correction means" is a technical device that corrects unsuitable parts of an automated process based on the analysis results and improves it into an appropriate process.
[0723] This invention is a system that generates and executes automated processes while taking user emotions into consideration. The system functions through the cooperation of a server, terminals, and users.
[0724] The user first uses the terminal to input information about the task they want to automate. This input includes screenshots, HTML structure information, and specific operating procedures. Furthermore, the terminal is equipped with emotion analysis software to recognize the user's emotions in real time. This allows for the analysis of the user's facial expressions and voice.
[0725] The terminal functions as an interface for sending information received from the user to the server. The terminal appropriately formats the input data and sends it to the server along with the emotion information obtained from the emotion analysis engine. Standard communication protocols are used for this transmission.
[0726] The server analyzes the received data and generates automated processes using a generative AI model. It recognizes screen interface elements from image data and identifies automation steps from text data. Machine learning algorithms and image recognition technologies are used for these analyses. Furthermore, the generated scenarios are adjusted so that the user's emotional state is reflected in the process selection and progression. For example, if the user is stressed, the system will suggest intuitive and easy-to-understand steps.
[0727] The generated automation scenario is sent back to the user via the terminal. The user can review the scenario, make any necessary modifications, and then execute it. This enables a comfortable and efficient automation experience for the user.
[0728] For example, when automating a photo editing process, the user inputs the editing steps. A prompt such as "Please suggest steps that will allow the user to relax while automating the photo editing process" could be used. This prompt is input into a generative AI model, which then generates an automated procedure that suits the user's needs.
[0729] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0730] Step 1:
[0731] The user inputs information about the task they want to automate using a terminal. This input data includes screenshots, HTML structure information, and detailed operating instructions. This prepares the basic data for the user's desired output. The terminal also has an emotion analysis engine that acquires the user's emotional state in real time from their facial expressions and voice.
[0732] Step 2:
[0733] The terminal organizes the information received from the user, formats the input data into the correct format, and sends it to the server together with the user's emotion information. This prepares the data for analysis on the server.
[0734] Step 3:
[0735] The server receives data transmitted from the terminal. For image data, image analysis techniques are used to recognize elements of the screen interface. For text data, natural language processing is used to identify the operation procedures required by the automation engine. Through these analyses, the user's input data is prepared as detailed information for the next processing stage.
[0736] Step 4:
[0737] The server generates automated processes using a generative AI model. Based on the analyzed data and user sentiment information, it generates the most suitable automation scenario. For example, if the user's sentiment information is negative, the system will suggest steps to reduce the user's burden. This ensures that the scenario reflects a flow appropriate to the user's state.
[0738] Step 5:
[0739] The server sends the generated automation scenario to the terminal. The terminal is then ready to proceed to the next step by displaying this scenario to the user.
[0740] Step 6:
[0741] The user reviews the automation scenario displayed on the terminal and makes modifications as needed. The final automation process is confirmed by accepting the user's input and modifications. The user can then execute the confirmed process, thereby achieving the desired results.
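As a non-limiting illustration, Steps 2 to 4 above can be sketched as follows. The class name `AutomationRequest`, the sentiment labels, the prompt wording, and the `analyze_ui` and `model` callables are all assumptions introduced for this sketch; the disclosure does not specify them, and any generative AI model could be substituted for the `model` callable.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AutomationRequest:
    """Data the terminal sends to the server in Step 2 (field names are illustrative)."""
    screenshots: List[bytes]
    html_structure: str
    instructions: str
    sentiment: str  # assumed labels: "negative", "neutral", "positive"


def build_prompt(request: AutomationRequest, ui_elements: List[str]) -> str:
    """Step 4: combine the analysis results and the sentiment into one prompt."""
    lines = [
        "Generate an automation scenario for the following task.",
        f"Task instructions: {request.instructions}",
        f"Recognized screen elements: {', '.join(ui_elements) or 'none'}",
    ]
    if request.sentiment == "negative":
        # Mirrors the example in the text: reduce the burden on a stressed user.
        lines.append("The user appears stressed; prefer few, simple, intuitive steps.")
    return "\n".join(lines)


def generate_scenario(
    request: AutomationRequest,
    analyze_ui: Callable[[AutomationRequest], List[str]],
    model: Callable[[str], str],
) -> List[str]:
    """Steps 3 and 4: analyze the input, prompt the model, split the reply into steps."""
    ui_elements = analyze_ui(request)            # Step 3: image / text analysis (stubbed)
    prompt = build_prompt(request, ui_elements)  # Step 4: prompt construction
    reply = model(prompt)                        # Step 4: generative model call (stubbed)
    return [line.strip() for line in reply.splitlines() if line.strip()]
```

The returned list of steps corresponds to the scenario that is sent to the terminal in Step 5 and reviewed by the user in Step 6.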
[0742] (Application Example 2)
[0743] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0744] In the field of online shopping, presenting products without considering user emotions can damage the user experience and reduce their desire to purchase. Furthermore, the inability to flexibly respond to the diverse emotions and needs of users makes it difficult to improve user satisfaction, which can ultimately lead to lost sales opportunities.
[0745] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0746] In this invention, the server includes a device for recognizing the user's mental state and reflecting it in an automated process, a generative model for optimizing product presentation based on the user's emotions, and an output device for providing the generated specific product information. This enables optimal product suggestions based on the user's emotions and improves the user experience.
[0747] "User's mental state" refers to the user's emotions and psychological state, and is the information necessary to adjust processes and product presentations based on that state.
[0748] A "device for reflecting information in an automated process" is a device that has the necessary functions to recognize the user's mental state and reflect it in the process.
[0749] A "generative model" is an AI-powered model that optimizes product presentations based on user emotions and generates specific product information.
[0750] An "output device" is a device necessary to present specific product information that has been generated to the customer.
[0751] An "analysis device" is a device that analyzes input data, extracts user emotions, and uses that information to help present products.
[0752] A "control device" is a device that has the function of suggesting alternative products to match the generated product information to the user's purchasing behavior, thereby improving user satisfaction.
[0753] This invention provides a system that recognizes the user's mental state and optimizes product presentation during online shopping.
[0754] First, users access the e-commerce platform through an application installed on their smartphone or smart glasses. The user's device incorporates an emotion recognition engine that analyzes the user's facial expressions and voice in real time to determine their emotional state. This emotion data is then transmitted to a server via the internet.
[0755] Based on the received emotion data, the server uses a generative AI model to present the most suitable products to the user. This model makes product recommendations that correspond to the user's emotions and generates specific product information. In this process, the server may process the data using, for example, Python or Node.js.
[0756] The generated product information is sent to the user's device, allowing the user to view product details and make a purchase decision as needed. In particular, if the user expresses negative emotions, the server will provide relevant alternative products or supplementary information to improve the user experience.
[0757] For example, if a user expresses frustration while browsing a specific product page, the system automatically provides reviews and FAQs for that product to resolve their questions. If the user expresses positive emotions, the system suggests bundled purchase options for similar products to encourage further purchases.
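As a non-limiting illustration, the branching described above can be sketched as a simple rule. The emotion labels in `NEGATIVE` and `POSITIVE` and the content keys such as `"reviews"` and `"bundle_options"` are assumptions made for this sketch and do not appear in the disclosure.

```python
from typing import Dict, List

# Assumed emotion labels; the disclosure does not enumerate them.
NEGATIVE = {"frustration", "anxiety", "anger"}
POSITIVE = {"joy", "interest", "satisfaction"}


def choose_supplements(emotion: str, product_id: str) -> Dict[str, object]:
    """Pick supplementary content for a product page from the detected emotion."""
    show: List[str]
    if emotion in NEGATIVE:
        # Negative emotion: resolve questions with reviews and FAQs.
        show = ["reviews", "faq"]
    elif emotion in POSITIVE:
        # Positive emotion: suggest bundled purchase options for similar products.
        show = ["bundle_options"]
    else:
        # Unknown or neutral emotion: leave the page unchanged.
        show = []
    return {"product_id": product_id, "show": show}
```

In practice the selection could instead be delegated to the generative AI model; the fixed rule here only makes the two example cases concrete.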
[0758] An example of a prompt message to achieve this is: "The user expressed emotions (A, C, G) while viewing a product page. Since the emotion is primarily (A), generate a program and description that suggests related products."
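As a non-limiting illustration, the prompt above could be assembled from per-emotion scores as follows. The function name `build_recommendation_prompt` and the use of numeric scores to decide the primary emotion are assumptions for this sketch; A, C, and G are the placeholder emotion names used in the example prompt.

```python
from typing import Dict


def build_recommendation_prompt(emotion_scores: Dict[str, float]) -> str:
    """Fill the example prompt with the detected emotions and the primary one."""
    if not emotion_scores:
        raise ValueError("at least one emotion is required")
    # Rank by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(emotion_scores, key=lambda name: (-emotion_scores[name], name))
    primary = ranked[0]
    listed = ", ".join(ranked)
    return (
        f"The user expressed emotions ({listed}) while viewing a product page. "
        f"Since the emotion is primarily ({primary}), generate a program and "
        "description that suggests related products."
    )
```

For example, `build_recommendation_prompt({"A": 0.6, "C": 0.3, "G": 0.1})` reproduces the prompt message given above.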
[0759] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0760] Step 1:
[0761] The user launches a shopping application via their smartphone or smart glasses. The user's device has an emotion recognition engine installed, which uses the camera and microphone to collect emotional data from the user's facial expressions and voice. Based on this sensor input, the device analyzes the user's mental state in real time and outputs the emotions in digital format.
[0762] Step 2:
[0763] The device transmits collected emotional data and product information viewed by the user to the server via the internet. Input includes data about the user's mental state and identifying information such as product IDs. The device formats this data appropriately and transmits it in a format easily processed by the server.
[0764] Step 3:
[0765] The server analyzes the received sentiment data and product information, and uses a generative AI model to suggest the most suitable products to the user. Based on the sentiment data input, the generative AI model executes a recommendation algorithm and generates a list of highly relevant products. Product information is then generated to be presented to the user as output.
[0766] Step 4:
[0767] The server sends the generated product information back to the terminal. In this step, the product information is packaged in a format that is immediately usable by the user and sent quickly.
[0768] Step 5:
[0769] The device displays the received product information on the application screen. Through the screen, the user can view detailed product information and related reviews to make a purchase decision. The application then analyzes the user's response again using an emotion recognition engine and, if necessary, provides further product suggestions or information.
[0770] This series of processes enables a personalized shopping experience based on the user's emotions.
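As a non-limiting illustration, the data formatting in Step 2 and the corresponding check on the server in Step 3 can be sketched as follows. The JSON layout, the field names `user_id`, `product_id`, and `emotions`, and the 0 to 1 score range are assumptions for this sketch; the disclosure only states that the data is sent in a format easily processed by the server.

```python
import json
from typing import Dict

REQUIRED_FIELDS = {"user_id", "product_id", "emotions"}  # assumed schema


def format_payload(user_id: str, product_id: str, emotions: Dict[str, float]) -> str:
    """Device side (Step 2): serialize emotion data and the viewed product ID."""
    for name, value in emotions.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"emotion score for {name!r} must be in [0, 1]")
    payload = {
        "user_id": user_id,
        "product_id": product_id,
        "emotions": {name: round(value, 3) for name, value in emotions.items()},
    }
    return json.dumps(payload, sort_keys=True)


def parse_payload(body: str) -> Dict[str, object]:
    """Server side (Step 3): decode the payload and check that no field is missing."""
    data = json.loads(body)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data
```

Validating on both sides keeps malformed emotion data from reaching the recommendation step.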
[0771] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0772] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0773] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0774] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0775] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0776] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0777] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the Emotion Map 400 an emotion lies, the more visible (expressed in actions) it becomes.
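As a non-limiting illustration, a position on the emotion map 400 can be represented by a radius and a clock-face direction, from which the properties described above follow. The class `MapPosition`, the normalized radius, and the threshold `EXPRESSED_RADIUS` are assumptions for this sketch; the disclosure gives no numeric coordinates.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple

EPS = 1e-9              # tolerance for positions lying on an axis
EXPRESSED_RADIUS = 0.5  # assumed threshold beyond which an emotion shows in actions


@dataclass(frozen=True)
class MapPosition:
    radius: float      # 0.0 = center (primitive emotions), 1.0 = outermost ring
    clock_hour: float  # 12 = top, 3 = right, 6 = bottom, 9 = left

    def to_xy(self) -> Tuple[float, float]:
        """Convert the clock-face position to Cartesian coordinates."""
        angle = math.radians(90.0 - 30.0 * self.clock_hour)
        return self.radius * math.cos(angle), self.radius * math.sin(angle)


def describe(position: MapPosition) -> Dict[str, object]:
    """Derive the side, valence, and visibility implied by a map position."""
    x, y = position.to_xy()
    if x > EPS:
        side = "situation"   # right half: induced by situational judgment
    elif x < -EPS:
        side = "reaction"    # left half: generated from reactions in the brain
    else:
        side = "both"        # vertical axis: both origins apply
    if y > EPS:
        valence = "pleasure"     # upper side
    elif y < -EPS:
        valence = "displeasure"  # lower side
    else:
        valence = "neutral"
    return {
        "side": side,
        "valence": valence,
        "expressed": position.radius >= EXPRESSED_RADIUS,
    }
```

For instance, a position at the 3 o'clock direction falls on the situation side with neutral valence, which matches the description of emotions fluctuating between security and anxiety.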
[0778] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
[0779] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."
[0780] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
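As a non-limiting illustration, one way to obtain training targets in which emotions located close together have similar values is to smooth a label over the map with a distance-based kernel. The Gaussian kernel, the coordinates in `POSITIONS`, and the parameter `sigma` are assumptions for this sketch; the disclosure states only that nearby emotions are trained to have similar values and does not describe how.

```python
import math
from typing import Dict, Tuple

# Hypothetical 2D positions on the emotion map; real coordinates are not given.
POSITIONS: Dict[str, Tuple[float, float]] = {
    "reassured": (0.50, 0.10),
    "calm": (0.55, 0.05),
    "confident": (0.60, 0.15),
    "anxious": (0.50, -0.60),
}


def soft_targets(
    label: str,
    positions: Dict[str, Tuple[float, float]],
    sigma: float = 0.2,
) -> Dict[str, float]:
    """Spread a labeled emotion over its neighbors on the map (Gaussian kernel)."""
    cx, cy = positions[label]
    return {
        name: math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
        for name, (x, y) in positions.items()
    }


def dominant_emotion(values: Dict[str, float]) -> str:
    """Determine the emotion as the one with the largest emotion value."""
    return max(values, key=lambda name: values[name])
```

With these assumed coordinates, a sample labeled "calm" yields high target values for "reassured" and "confident" and a value near zero for "anxious", which mirrors the example in Figure 10.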
[0781] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0782] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.
[0783] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.
[0784] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0785] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0786] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.
[0787] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0788] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0789] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.
[0790] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.
[0791] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0792] The following is further disclosed regarding the embodiments described above.
[0793] (Claim 1)
[0794] A system comprising: an information processing device that holds all functional reference information for a specific automation tool;
[0795] an input device for receiving image and text data as input;
[0796] a generative model that generates automation scenarios based on the received image and text data; and
[0797] an output device that outputs the generated automation scenarios.
[0798] (Claim 2)
[0799] The system according to claim 1,
[0800] further comprising an analysis device that analyzes the elements of the screen interface contained in the input image data
[0801] and extracts the information necessary for the generative model to generate automation scenarios.
[0802] (Claim 3)
[0803] The system according to claim 1,
[0804] further comprising a control device that, based on the analysis results, compares the automation scenario against the functional reference information of the specific automation tool to check its suitability
[0805] and corrects non-conforming parts.
[0806] "Example 1"
[0807] (Claim 1)
[0808] A system comprising: a terminal that collects process information from users;
[0809] communication means for transferring data to a server via the terminal;
[0810] analysis means that analyzes the received data and identifies UI elements and operating procedures;
[0811] generation means for creating automation scenarios from the analyzed data using a generative AI model;
[0812] verification means that checks and corrects the suitability of the generated scenario; and
[0813] means for presenting the generated automation scenarios to the user.
[0814] (Claim 2)
[0815] The system according to claim 1, comprising means for optimizing a scenario by utilizing functional reference information of an automation tool based on input process information.
[0816] (Claim 3)
[0817] The system according to claim 1, comprising means for constructing a prompt sentence using the analyzed data and inputting it into a generation AI model.
[0818] "Application Example 1"
[0819] (Claim 1)
[0820] A device that holds all functional reference information for a specific information processing device,
[0821] An input means for receiving image and text data as input,
[0822] A generation means for generating an automated scenario based on received image and text data,
[0823] An output means for outputting the generated automation scenario,
[0824] Support tools for providing operational support in a specific environment,
[0825] A management system for detecting abnormalities in operational procedures and automating corresponding actions,
[0826] A system comprising all of the above.
[0827] (Claim 2)
[0828] The system according to claim 1, further comprising an analysis means for analyzing the components contained in the input image data and for extracting information necessary for the generation means to generate an automated scenario.
[0829] (Claim 3)
[0830] The system according to claim 1, further comprising control means for comparing the suitability of an automation scenario with functional reference information of a specific information processing device based on the analysis results, and correcting any non-suitable parts.
[0831] "Example 2 of combining an emotion engine"
[0832] (Claim 1)
[0833] An emotion analysis means that recognizes the user's emotional state and reflects that information in processing,
[0834] Means for adjusting the action flow based on the aforementioned emotional state,
[0835] Information recording means for holding reference information for all functions of a specific information processing tool,
[0836] A receiving means for receiving video and text data as input,
[0837] A generation means that generates an automated process based on the received video and text data,
[0838] An output means for outputting the generated automation process,
[0839] A system comprising all of the above.
[0840] (Claim 2)
[0841] The system according to claim 1, further comprising an analysis means that analyzes the screen display elements contained in the input video data
[0842] and extracts information necessary for generating an automated process.
[0843] (Claim 3)
[0844] The system according to claim 1, further comprising a correction means that, based on the analysis results, compares the automation process against the functional reference information of the specific information processing tool to check its suitability
[0845] and corrects non-conforming parts.
[0846] "Application example 2 when combining with an emotional engine"
[0847] (Claim 1)
[0848] A system comprising: a device for recognizing the user's mental state and reflecting it in an automated process;
[0849] a generative model for optimizing product presentation based on user emotions; and
[0850] an output device for providing the generated specific product information.
[0851] (Claim 2)
[0852] The system according to claim 1,
[0853] further comprising an analysis device that analyzes the user's input facial expression data, extracts emotions, and uses them to help present products.
[0854] (Claim 3)
[0855] The system according to claim 1,
[0856] further comprising a control device that proposes alternative products in a timely manner to match the generated product information to the user's purchasing behavior, thereby improving user satisfaction.
[Explanation of Symbols]
[0857]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot
Claims
1. A system comprising: a device that holds all functional reference information for a specific information processing device; an input means for receiving image and text data as input; a generation means for generating an automation scenario based on the received image and text data; an output means for outputting the generated automation scenario; support tools for providing operational support in a specific environment; and a management system for detecting abnormalities in operational procedures and automating corresponding actions.
2. The system according to claim 1, further comprising an analysis means for analyzing the components contained in the input image data and for extracting information necessary for the generation means to generate an automated scenario.
3. The system according to claim 1, further comprising control means for comparing the suitability of an automation scenario with functional reference information of a specific information processing device based on the analysis results, and correcting any non-suitable parts.