System and method for automated learning and task execution by digital workers through video-based shadowing
The system addresses the limitations of existing automation systems by using video-based shadowing and machine learning to generate contextually adapted instructions, allowing non-technical users to create reliable and secure automation workflows.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- FASTAUTOMATE INC
- Filing Date
- 2025-12-19
- Publication Date
- 2026-06-25
AI Technical Summary
Existing automation systems lack adaptive learning capabilities, struggle with cross-environment compatibility, and fail to create reliable automation solutions for non-technical users due to the absence of video-based shadowing frameworks and comprehensive machine learning modules, leading to inefficiencies and security concerns.
A system and method for automated learning and task execution by digital workers through video-based shadowing, utilizing a machine learning module to process video recordings, detect user interface elements, and generate contextually adapted instructions, enabling autonomous task execution.
Enables non-technical users to create automation workflows by recording video demonstrations, generating executable instructions through machine learning analysis, ensuring reliable and secure task execution across varying environments.
Abstract
Description
SYSTEM AND METHOD FOR AUTOMATED LEARNING AND TASK EXECUTION BY DIGITAL WORKERS THROUGH VIDEO-BASED SHADOWING
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Application No. 63/736,623, titled "SYSTEMS AND METHODS OF SHADOWING AND TRAINING MECHANISMS FOR DIGITAL WORKERS", filed on December 20th, 2024, which is hereby incorporated by reference in its entirety.
FIELD OF INVENTION
[0002] The present disclosure relates to automated learning techniques for digital workers and agents, and more particularly to systems and methods for training digital workers to autonomously execute tasks through video-based job shadowing and machine learning processing of user demonstrations.
BACKGROUND OF THE INVENTION
[0003] Traditional automation systems and robotic process automation tools rely on rigid programming interfaces and predefined templates that provide limited adaptability to changing workflow requirements, dynamic user interface elements, and evolving organizational processes. These conventional systems typically operate through predetermined scripts and fixed interaction patterns that lack the capability to learn from user demonstrations, develop contextual understanding, or adapt to varying execution environments through machine learning processes. The absence of video-based shadowing frameworks in existing systems constrains their ability to provide both intuitive task recording and intelligent instruction generation simultaneously.
[0004] Current task automation platforms often struggle to create accessible automation solutions for non-technical users while maintaining reliability and cross-environment compatibility. Many existing systems focus primarily on document object model parsing and screenshot-based element matching, without incorporating sophisticated machine learning modules that can process video recordings including frame-by-frame analysis, user input event monitoring, and computer vision techniques alongside contextual inputs comprising user interface element detection and workflow pattern recognition. This approach results in brittle automation scripts that fail to execute reliably across different environments with varying user interface elements through adaptive instruction processing.
[0005] Robotic process automation infrastructures frequently operate through complex flowchart builders where task recording, instruction generation, scheduling systems, and execution monitoring function independently without comprehensive integration layers. This fragmentation leads to inefficiencies in workflow automation, missed opportunities for coordinated task management across multiple organizational domains, and an inability to provide unified task execution through intelligent digital workers. The lack of processed instructions files adapted to specific task contexts for maintaining reliable records of task steps and execution parameters further limits flexibility and usability in automation systems.
[0006] Existing artificial intelligence applications in task automation typically focus on narrow use cases such as basic script generation or simple command interpretation that lack contextual understanding capabilities and cross-environment adaptability. These implementations cannot develop meaningful instructions through video analysis processing, maintain consistent execution across multiple interface variations including web applications and desktop environments, or coordinate between task recorder modules capturing user demonstrations and machine learning modules generating processed instructions files. The absence of comprehensive video-based shadowing frameworks that can transform user demonstrations into executable instructions with developing contextual awareness and adaptive learning capabilities represents a significant gap in current technology offerings.
[0007] Large language model applications in task automation present security concerns despite the potential for creating natural language-based automation interfaces. Current systems utilizing large language models risk exposing user data to model training processes and produce unpredictable outputs that cannot be reliably validated before execution. The combination of security vulnerabilities with output instability and deployment restrictions presents challenges for creating trustworthy automation systems that enable user verification of generated scripts and execution compliance without compromising data security.
[0008] The growing demand for accessible automation solutions and intuitive task recording interfaces has created new possibilities for enabling non-technical users to automate complex workflows through demonstration-based learning. However, current implementations lack sophisticated video-based shadowing frameworks that could enable digital workers to function as autonomous task executors with contextual understanding, cross-environment adaptability, and coordinated scheduling oversight. The absence of machine learning modules capable of processing multimodal data streams including video frames, mouse activity, voice commands, and keyboard inputs limits the potential for creating comprehensive automation systems that transcend traditional programming interfaces and establish digital workers as intelligent assistants capable of learning from human demonstrations.
[0009] Therefore, there exists a need for improved systems and methods for automated learning and task execution by digital workers that can capture video demonstrations of task steps, process recordings through machine learning to generate contextually adapted instructions, and enable autonomous task execution through video-based job shadowing frameworks.
SUMMARY OF THE INVENTION
[0010] It is an object of the present invention to provide a system for automated learning for a digital worker comprising one or more processors and a non-transitory machine-readable medium storing instructions that, when executed by the one or more processors, cause the one or more processors to receive a video recording via a network interface device, wherein the video recording is captured by a task recorder executing on a client device and comprises screen capture data demonstrating steps for executing a specific task performed by a user.
[0011] It is another object of the present invention to provide a system that generates a raw instructions file comprising recorded steps, user input events captured during the video recording, coordinate parameters specifying screen positions of user interactions, and timing information capturing intervals between consecutive actions.
[0012] It is another object of the present invention to provide a machine learning module that processes the video recording and the raw instructions file to generate a processed instructions file adapted to a context of the specific task, wherein processing comprises analyzing individual frames of the video recording using computer vision techniques to detect and classify user interface elements, applying optical character recognition to extract text content from the detected user interface elements, generating bounding box coordinates defining rectangular regions encompassing each detected user interface element, determining click type parameters indicating types of mouse interactions performed at each interaction step, and correlating the detected user interface elements with the user input events based on temporal alignment between video frames and captured interaction timestamps.
[0013] It is another object of the present invention to provide validation of the processed instructions file by programmatically comparing generated bounding box coordinates and click type parameters against corresponding user interaction coordinates and mouse event types captured in the raw instructions file to verify accuracy of the detected user interface elements.
[0014] It is another object of the present invention to provide a digital worker that is a software-based agent configured to execute automated tasks by interacting with target applications according to the processed instructions file, wherein the digital worker executes the specific task by reproducing the user interactions on the target applications using the bounding box coordinates and click type parameters from the processed instructions file.
[0015] In order to overcome the limitations stated herein, the present invention provides a system and a method for automated learning and task execution by digital workers through video-based shadowing. The system includes one or more processors and a non-transitory machine-readable medium storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations. The operations comprise receiving, via a network interface device, a video recording captured by a task recorder executing on a client device, wherein the video recording comprises screen capture data demonstrating steps for executing a specific task performed by a user. The operations further comprise generating a raw instructions file comprising recorded steps, user input events captured during the video recording, coordinate parameters specifying screen positions of user interactions, and timing information capturing intervals between consecutive actions.
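By way of non-limiting illustration only, the contents of a raw instructions file might be assembled as sketched below. The key names `step`, `event`, `x`, `y`, `t`, and `interval_s` are hypothetical and are not taken from the disclosure; the sketch merely shows recorded steps, input events, screen coordinates, and intervals between consecutive actions held in one structure.

```python
import json

def build_raw_instructions(events):
    """Turn captured input events into a raw instructions structure.

    `events` is a list of dicts with hypothetical keys: 'event'
    (e.g. 'left_click'), 'x' and 'y' (screen position in pixels),
    and 't' (seconds since the start of the recording).
    """
    ordered = sorted(events, key=lambda e: e["t"])
    steps = []
    previous_t = None
    for index, e in enumerate(ordered, start=1):
        # Interval since the previous action; the first step has none.
        interval = None if previous_t is None else round(e["t"] - previous_t, 3)
        steps.append({
            "step": index,
            "event": e["event"],
            "x": e["x"],
            "y": e["y"],
            "t": e["t"],
            "interval_s": interval,
        })
        previous_t = e["t"]
    return {"steps": steps}

captured = [
    {"event": "left_click", "x": 412, "y": 230, "t": 1.20},
    {"event": "right_click", "x": 640, "y": 480, "t": 3.95},
]
raw = build_raw_instructions(captured)
raw_json = json.dumps(raw, indent=2)  # serialized form of the raw instructions
```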
[0016] The operations further comprise processing, by a machine learning module executing on the one or more processors, the video recording and the raw instructions file to generate a processed instructions file adapted to a context of the specific task. Processing comprises analyzing individual frames of the video recording using computer vision techniques to detect and classify user interface elements, applying optical character recognition to extract text content from the detected user interface elements, generating bounding box coordinates defining rectangular regions encompassing each detected user interface element, determining click type parameters indicating types of mouse interactions performed at each interaction step, and correlating the detected user interface elements with the user input events based on temporal alignment between video frames and captured interaction timestamps.
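By way of non-limiting illustration only, the temporal alignment between interaction timestamps and video frames might be sketched as follows. The sketch assumes a constant frame rate, millisecond timestamps, and bounding boxes given as (left, top, right, bottom); the detections are supplied as plain data standing in for the output of the computer vision step, and all names are hypothetical.

```python
def frame_for_timestamp(t_ms, fps):
    """0-based index of the frame on screen at t_ms, assuming constant fps."""
    return (t_ms * fps) // 1000

def correlate(events, detections, fps):
    """Attach to each event the detected element whose box contains it.

    `detections` maps a frame index to a list of elements, each a dict
    with a 'label' and a 'bbox' of (left, top, right, bottom) pixels.
    """
    matched = []
    for e in events:
        frame = frame_for_timestamp(e["t_ms"], fps)
        target = None
        for element in detections.get(frame, []):
            left, top, right, bottom = element["bbox"]
            if left <= e["x"] <= right and top <= e["y"] <= bottom:
                target = element
                break
        matched.append({"frame": frame, "event": e["event"], "element": target})
    return matched

events = [
    {"event": "left_click", "x": 120, "y": 45, "t_ms": 1200},
    {"event": "left_click", "x": 500, "y": 500, "t_ms": 2000},
]
detections = {36: [{"label": "Submit", "bbox": (100, 30, 180, 60)}]}
matched = correlate(events, detections, fps=30)
```

An event for which no element is found keeps `element` set to `None`, which leaves room for a later review or fallback step.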
[0017] The operations further comprise validating the processed instructions file by programmatically comparing generated bounding box coordinates and click type parameters against corresponding user interaction coordinates and mouse event types captured in the raw instructions file to verify accuracy of the detected user interface elements. The operations further comprise transmitting, via the network interface device, the processed instructions file to a digital worker, wherein the digital worker is a software-based agent configured to execute automated tasks by interacting with target applications according to the processed instructions file. The operations further comprise executing, by the digital worker, the specific task by reproducing the user interactions on the target applications using the bounding box coordinates and click type parameters from the processed instructions file.
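By way of non-limiting illustration only, the programmatic comparison might be sketched as follows. The disclosure does not state the comparison rule; this sketch assumes a step is valid when the recorded interaction point lies inside the generated bounding box and the click type equals the recorded mouse event type. All names are hypothetical.

```python
def step_is_valid(processed_step, raw_event):
    """Assumed rule: raw click point inside the bbox and same click type."""
    left, top, right, bottom = processed_step["bbox"]
    inside = left <= raw_event["x"] <= right and top <= raw_event["y"] <= bottom
    same_click = processed_step["click_type"] == raw_event["event"]
    return inside and same_click

def failed_steps(processed_steps, raw_events):
    """Indices of processed steps that disagree with the raw recording."""
    if len(processed_steps) != len(raw_events):
        raise ValueError("processed and raw files describe different step counts")
    return [
        index
        for index, (p, r) in enumerate(zip(processed_steps, raw_events))
        if not step_is_valid(p, r)
    ]

processed_steps = [
    {"bbox": (100, 30, 180, 60), "click_type": "left_click"},
    {"bbox": (220, 300, 420, 330), "click_type": "right_click"},
]
raw_events = [
    {"x": 120, "y": 45, "event": "left_click"},
    {"x": 500, "y": 310, "event": "right_click"},  # falls outside the second box
]
failures = failed_steps(processed_steps, raw_events)
```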
[0018] In one aspect, the user input events comprise mouse activity on a screen including cursor position changes and click events, voice commands captured via audio input, natural language instructions, screen state changes detected through frame comparison, and keyboard inputs including key press sequences. The user input events may be captured concurrently with the video recording to enable temporal correlation between visual content and user interactions for generating the raw instructions file. The mouse activity may comprise cursor position changes, click events including left clicks and right clicks with associated screen coordinates, and drag-and-drop operations with start and end position coordinates, wherein the associated screen coordinates are used by the machine learning module to correlate mouse interactions with detected user interface elements during processing of the video recording.
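By way of non-limiting illustration only, click events and drag-and-drop operations might be derived from low-level mouse events as sketched below. The event shape (`type`, `button`, `x`, `y`) and the five-pixel movement threshold are assumptions made for the example and are not specified by the disclosure.

```python
def classify_mouse_operations(mouse_events, min_distance=5):
    """Pair each button press with the next release.

    A pair whose positions differ by at least `min_distance` pixels is
    reported as a drag-and-drop with start and end coordinates;
    otherwise it is reported as a click at the press position.
    """
    operations = []
    press = None
    for e in mouse_events:
        if e["type"] == "down":
            press = e
        elif e["type"] == "up" and press is not None:
            dx = e["x"] - press["x"]
            dy = e["y"] - press["y"]
            if dx * dx + dy * dy >= min_distance * min_distance:
                operations.append({
                    "op": "drag_and_drop",
                    "start": (press["x"], press["y"]),
                    "end": (e["x"], e["y"]),
                })
            else:
                operations.append({
                    "op": "click",
                    "button": press["button"],
                    "at": (press["x"], press["y"]),
                })
            press = None
    return operations

mouse_events = [
    {"type": "down", "button": "left", "x": 10, "y": 10},
    {"type": "up", "button": "left", "x": 11, "y": 10},
    {"type": "down", "button": "left", "x": 50, "y": 50},
    {"type": "up", "button": "left", "x": 200, "y": 80},
]
operations = classify_mouse_operations(mouse_events)
```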
[0019] In another aspect, the raw instructions file and the processed instructions file are JavaScript Object Notation (JSON) instructions files. The processed instructions file comprises a frame number parameter identifying a video frame associated with each recorded action, bounding box coordinates specifying a location of a target user interface element for each interaction step, a click type parameter indicating a type of mouse interaction performed at each step, and time information specifying temporal relationships between consecutive actions. The processed instructions file may comprise step-by-step instructional representations including frame number, bounding box coordinates, click type, and time information for each workflow action.
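By way of non-limiting illustration only, a processed instructions file in JSON form, together with a simple completeness check, might look as follows. The key names and the (left, top, right, bottom) ordering of the bounding box are hypothetical; the disclosure names the parameters but not their serialized form.

```python
import json

# Hypothetical keys for the four per-step parameters named in the text.
REQUIRED_FIELDS = ("frame_number", "bbox", "click_type", "time")

processed_json = """
{
  "steps": [
    {"frame_number": 36, "bbox": [100, 30, 180, 60],
     "click_type": "left_click", "time": 1.2},
    {"frame_number": 90, "bbox": [220, 300, 420, 330],
     "click_type": "right_click", "time": 3.0}
  ]
}
"""

def missing_fields(document):
    """List (step index, field) pairs for every required field a step lacks."""
    problems = []
    for index, step in enumerate(document["steps"]):
        for name in REQUIRED_FIELDS:
            if name not in step:
                problems.append((index, name))
    return problems

document = json.loads(processed_json)
problems = missing_fields(document)  # empty when every step is complete
```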
[0020] In yet another aspect, the machine learning module employs optical character recognition to identify text within images captured during the video recording, extracts textual content from application windows and form field labels, and associates the identified text with detected user interface elements to enable cross-environment element identification. The operations may further comprise generating a structured hierarchical representation of user interface elements present on screens captured during the video recording as a tree structure, wherein the structured hierarchical representation encapsulates information about each user interface element including a type classification of the user interface element, pixel coordinates and dimensions defining spatial arrangement of the user interface element, and a contextual relevance indicator of the user interface element within the specific task workflow.
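By way of non-limiting illustration only, the structured hierarchical representation might be held as a tree of nodes, each carrying a type classification, pixel coordinates and dimensions, and a relevance indicator. The class and field names below are hypothetical, and the relevance indicator is simplified to a boolean for the example.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class UIElement:
    kind: str                 # type classification, e.g. "button"
    x: int                    # left edge in pixels
    y: int                    # top edge in pixels
    width: int
    height: int
    relevant: bool = False    # simplified contextual relevance indicator
    children: List["UIElement"] = field(default_factory=list)

def relevant_paths(node, prefix=""):
    """Depth-first list of tree paths to elements marked as relevant."""
    path = f"{prefix}/{node.kind}"
    found = [path] if node.relevant else []
    for child in node.children:
        found.extend(relevant_paths(child, path))
    return found

window = UIElement("window", 0, 0, 1280, 720, children=[
    UIElement("toolbar", 0, 0, 1280, 40),
    UIElement("form", 100, 100, 600, 400, children=[
        UIElement("text_field", 120, 140, 300, 30, relevant=True),
        UIElement("button", 120, 200, 80, 30, relevant=True),
    ]),
])
paths = relevant_paths(window)
```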
[0021] In a further aspect, the operations further comprise receiving, via a user interface of an organizer module, conditions from the user comprising a duration of the specific task, a number of repeating cycles, a scheduled start date and time, and output requirements specifying a format and storage location for task results, and executing the specific task by the digital worker according to the conditions at the scheduled start date and time. Providing output results may comprise delivering the output results in a format and location determined by a user-defined shadowing event captured during the video recording, wherein the shadowing event specifies output delivery parameters including structured report format, database population fields, or visual format rendering in spreadsheet applications.
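By way of non-limiting illustration only, the conditions received through the organizer module might be represented and turned into planned start times as sketched below. The disclosure does not say how repeating cycles are spaced; this sketch assumes that cycles run back to back, each occupying the stated duration. All names and example values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class TaskConditions:
    start: datetime          # scheduled start date and time
    duration: timedelta      # duration of the specific task
    cycles: int              # number of repeating cycles
    output_format: str       # output requirement: format
    output_location: str     # output requirement: storage location

def planned_start_times(conditions):
    """Start time of each cycle, assuming cycles run back to back."""
    return [
        conditions.start + i * conditions.duration
        for i in range(conditions.cycles)
    ]

conditions = TaskConditions(
    start=datetime(2026, 1, 5, 9, 0),
    duration=timedelta(minutes=30),
    cycles=3,
    output_format="csv",
    output_location="reports/",
)
runs = planned_start_times(conditions)
```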
[0022] In one advantageous feature of the present invention, the processed instructions file is created using the video recording and the raw instructions file without requiring programming code from the user. This enables non-technical users to create automation workflows by recording video demonstrations of tasks, enabling the system to generate executable instructions through machine learning analysis rather than manual script development.
[0023] In another advantageous feature of the present invention, the method further comprises maintaining a timeline of activities captured during the video recording that tracks a temporal sequence of user actions and screen changes throughout the video recording, wherein the timeline associates timestamps with each activity to preserve timing relationships between workflow steps and enables the digital worker to execute actions with timing that matches the original task demonstration.
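By way of non-limiting illustration only, a timeline that interleaves user actions with screen changes, and the delays a digital worker might observe to match the original demonstration, could be sketched as follows. The entry shape (`name`, `t_ms`, `kind`) is hypothetical.

```python
def build_timeline(actions, screen_changes):
    """Merge user actions and screen changes into one time-ordered list."""
    entries = [dict(a, kind="action") for a in actions]
    entries += [dict(s, kind="screen_change") for s in screen_changes]
    return sorted(entries, key=lambda e: e["t_ms"])

def replay_delays(timeline):
    """For each action, the wait (ms) since the previous action.

    The first action's delay is measured from the start of the recording.
    """
    delays = []
    last_t = 0
    for entry in timeline:
        if entry["kind"] == "action":
            delays.append((entry["name"], entry["t_ms"] - last_t))
            last_t = entry["t_ms"]
    return delays

actions = [
    {"name": "type_username", "t_ms": 3000},
    {"name": "click_login", "t_ms": 1200},
]
screen_changes = [{"name": "login_form_shown", "t_ms": 2100}]
timeline = build_timeline(actions, screen_changes)
delays = replay_delays(timeline)
```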
[0024] In another advantageous feature of the present invention, the keyboard inputs comprise individual key presses with associated timing sequences, text entry sequences captured with character-level granularity, and keyboard shortcuts mapped to corresponding application functions, wherein the keyboard inputs are correlated with detected text field elements identified through optical character recognition to generate text entry instructions in the processed instructions file.
BRIEF DESCRIPTION OF FIGURES
[0025] FIG. 1 illustrates an environment configured for automated learning and task execution by a digital worker, in accordance with one embodiment of the present invention.
[0026] FIG. 2 illustrates a block diagram of a server suitable for operation, in accordance with one embodiment of the present invention.
[0027] FIG. 3 illustrates a block diagram of a system for automated learning for a digital worker, in accordance with one embodiment of the present invention.
[0028] FIG. 4 illustrates a method for a user registration process at a server, in accordance with one embodiment of the present invention.
[0029] FIG. 5 illustrates a method for automated learning for a digital worker, in accordance with one embodiment of the present invention.
[0030] FIG. 6 illustrates a method for capturing and processing video demonstrations of task steps, in accordance with one embodiment of the present invention.
[0031] FIG. 7 illustrates a method for scheduling and executing tasks by a digital worker, in accordance with one embodiment of the present invention.
DETAILED DESCRIPTION
[0032] The following description sets forth exemplary aspects of the present disclosure. It should be recognized, however, that such description is not intended as a limitation on the scope of the present disclosure. Rather, the description also encompasses combinations and modifications to those exemplary aspects described herein.
[0033] A detailed description of systems, devices, and methods consistent with embodiments of the present disclosure is provided below. While several embodiments are described, it should be understood that disclosure is not limited to any one embodiment, but instead encompasses numerous alternatives, modifications, and equivalents. In addition, while numerous specific details are set forth in the following description in order to provide a thorough understanding of the embodiments disclosed herein, some embodiments can be practiced without some or all of these details. Moreover, for the purpose of clarity, certain technical material that is known in the related art has not been described in detail in order to avoid unnecessarily obscuring the disclosure.
[0034] Referring to FIG. 1, an environment 100 configured for automated learning and task execution by a digital worker 110 is illustrated. The environment 100 may include a system or server 102, a network 106, client devices 104, a user 108, and a digital worker 110. The client devices 104 may include a first client device 104a, a second client device 104b, and an nth client device 104n, indicating that the environment 100 may support multiple client devices. For ease of reference, the first client device 104a, the second client device 104b, and the nth client device 104n are collectively referred to as client devices 104, or simply client device 104. The client device 104 may be an electronic device such as a mobile device, personal digital assistant, laptop computer, tablet computer, desktop computer, smart watch, and the like. The system 102 may be connected to the network 106, which facilitates communication between the system 102 and the client devices 104.
[0035] With continued reference to FIG. 1, the user 108 may interact with the digital worker 110, which is positioned in communication with the system 102. The digital worker 110 may receive instructions and task assignments through the system 102 based on user input from the user 108. The system 102 may be owned and operated by various entities including an enterprise, an organization, a software service provider, a cloud computing platform, or an individual user. The system 102 may also be operated by a managed service provider that offers automation capabilities to multiple client organizations.
[0036] The user 108 may be any individual who interacts with the system 102 to record task demonstrations, configure automation workflows, or manage digital worker assignments. The user 108 may also be a technical administrator responsible for configuring and maintaining the automation system across an organization. The user 108 may interact with the system 102 through the client devices 104 or through direct interfaces provided by the system 102.
[0037] The digital worker 110, also referred to as a "DigiMate," may be a software-based agent that executes automated tasks according to processed instructions files generated from user demonstrations. The user 108 may assign tasks to the digital worker 110 or agent. The digital worker 110 is pre-trained with specialized skill sets tailored for specific tasks. The present disclosure envisions providing organizations or users with virtual team members or assistants across various domains, such as a Customer Experience Coordinator or an HR Specialist, depending on their specific needs. The digital worker 110 may be instantiated as a single instance serving one user or as multiple instances serving different users or departments within an organization. In some cases, the digital worker 110 may be deployed to handle specific categories of tasks such as data entry, report generation, or system monitoring. The digital worker 110 may operate autonomously once assigned tasks and schedules, or may operate under supervision where user approval is required before executing certain actions. The digital worker 110 may be powered and equipped with diverse language models that enable the digital worker 110 to handle a broad range of tasks across different fields of expertise with high specificity. The diverse language models may enable the digital worker 110 to process and interpret instructions, understand contextual information, and execute tasks that span multiple domains such as customer service, human resources, finance, healthcare, logistics, and marketing. In some cases, the digital worker 110 may leverage language model capabilities to interpret natural language instructions, generate appropriate responses, and adapt task execution based on the specific requirements of each domain.
[0038] Optionally, the digital worker 110 may be a software-based automation system designed to perform tasks using technologies such as robotic process automation (RPA), artificial intelligence (AI), machine learning (ML), natural language processing (NLP), and computer vision. In some cases, the digital worker 110 may execute tasks autonomously based on processed instructions files generated from video demonstrations recorded by the user 108. The digital worker 110 may work alongside human employees, understanding human intent, responding to queries, and taking actions on behalf of the user 108 while maintaining control and authority with the user 108. The digital worker 110 may be configured to handle unexpected situations and respond with decision-making capabilities that closely emulate human judgment. The decision-making capabilities may arise from comprehensive pre-training that exposes the digital worker 110 to diverse data and scenarios. In some cases, the pre-training process may involve exposing the digital worker 110 to a range of data and scenarios that enable the digital worker 110 to develop pattern recognition and adaptive reasoning capabilities. The digital worker 110 may dynamically adjust operations and provide contextually appropriate responses when encountering unanticipated events during task execution. In some cases, the digital worker 110 may analyze, interpret, and react to unexpected situations in a manner consistent with decision-making processes that would be undertaken by humans in similar contexts. The digital worker 110 may deliver output results based on user-defined shadowing events captured during the video recording process.
[0039] The digital worker 110 may utilize a user-defined shadowing event as a reference to determine the manner, location, and format in which output results are to be presented. As used herein, a shadowing event refers to a specific user action or sequence of actions captured during the video recording that demonstrates how output results should be delivered, including the user's selection of output destinations, file formats, or data entry locations during the task demonstration. The shadowing event captures output delivery parameters when the task recorder 302 records the user 108 performing actions such as saving a file to a particular location, entering data into specific database fields, or formatting results within a spreadsheet application. In some cases, the digital worker 110 may adapt output delivery to present results in a structured report format that organizes information according to predefined templates or layouts demonstrated during the shadowing event. The digital worker 110 may populate a designated database with output results, inserting data into appropriate fields and tables according to the structure captured when the user 108 demonstrated database entry during the task recording. The digital worker 110 may render output information in a predefined visual format based on the user-defined shadowing event. Visual formats may include spreadsheet applications such as Excel sheets where data is organized into rows and columns, or collaborative document platforms such as Coda where information is presented in structured pages with interactive elements. In some cases, the digital worker 110 may generate output in other formats specified by the user 108 during the original task demonstration, ensuring that output delivery aligns with the contextual requirements and preferences established through the shadowing process. The adaptability of output delivery may enhance the utility of the digital worker 110 across diverse applications by ensuring seamless integration with existing data management and reporting systems used by the user 108 or the organization.
[0040] As further shown in FIG. 1, the environment 100 may support the processing of video recordings and user input events by the system 102. The system 102 may generate processed instructions files that are transmitted through the network 106 for execution by the digital worker 110. In some cases, the client devices 104 may access the output results and interact with the system through the network 106, enabling users associated with the client devices 104 to monitor, schedule, and manage tasks performed by the digital worker 110.
[0041] The network 106 may include a wireless network, a wired network, or a combination thereof. The network 106 may be implemented as one of the different types of networks, such as an intranet, a local area network (LAN), a wide area network (WAN), the Internet, and the like. The network 106 may be implemented as a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further, the network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like. The system 102 may be implemented as a single server or as a plurality of servers operating in a distributed configuration.
[0042] Referring to FIG. 2, a block diagram of the system 102 suitable for operation according to embodiments of the present disclosure is illustrated. The system 102 may encompass a processor 202 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both). The processor 202 may be electrically coupled by a bus 204 to a memory 206. The memory 206 may include volatile memory and/or non-volatile memory. Preferably, the memory 206 may store instructions 208 that interact with the other devices in the system 102 and/or the environment 100. In one implementation, the processor 202 may execute the instructions 208 stored in the memory 206 in any suitable manner. The system 102 may further include an I/O interface device 210 and a network interface device 224, all interconnected via the bus 204.
[0043] With continued reference to FIG. 2, the bus 204 may facilitate communication between the various components of the system 102.
[0044] The memory 206 may take the form of computer-readable media, which may include one or more volatile media, nonvolatile media, removable media, or non-removable media. The memory 206 may store data structures, program modules, and other data used by the processor 202 during processing of video recordings and generation of instructions files for the digital worker 110.
[0045] As further shown in FIG. 2, the system 102 may further include a display 212 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)). The system 102 may include an input device 214 (e.g., a keyboard) and/or touchscreen, a UI navigation device 216 (e.g., a mouse), a drive unit 218, and a signal generation device 222 (e.g., a speaker). The I/O interface device 210 may manage input and output operations between the system 102 and the connected peripheral devices.
[0046] The display 212 may provide visual output capabilities, enabling presentation of information through visual means such as screens, graphical user interfaces (GUIs), or light-emitting diodes (LEDs). The input device 214 may comprise keyboards, microphones, touchscreens, or other items useable to directly or indirectly input data into the system 102. The UI navigation device 216 may enable user interaction with the system 102 for navigating interfaces and selecting options.
[0047] With continued reference to FIG. 2, the drive unit 218 may include a machine-readable medium 220 on which one or more sets of instructions and data structures (e.g., instructions 208) may be stored. It should be understood that the term "machine-readable medium" includes a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store one or more sets of instructions. The term "machine-readable medium" also includes any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present invention, or that is capable of storing, encoding or carrying data structures utilized by or associated with such a set of instructions. The term "machine-readable medium" accordingly includes, but is not limited to, solid-state memories, optical and magnetic media, and carrier wave signals. In some cases, the machine-readable medium 220 may be non-transitory such that the machine-readable medium 220 does not comprise a signal per se. The signal generation device 222 may generate signals as required by system operations, including alerts or notifications related to task processing.
[0048] The instructions 208 may reside, completely or at least partially, within the memory 206 and/or within the processor 202 during execution thereof by the system 102. The network interface device 224 may transmit or receive the instructions 208 over the network 106 utilizing any one of a number of well-known transfer protocols. The network interface device 224 may provide connectivity to the network 106, enabling the system 102 to communicate with external systems and devices, including the client devices 104 and the digital worker 110. The network interface device 224 may support multiple communication technologies concurrently.
[0049] In operation, the processor 202 may receive video recordings and user input events through the network interface device 224 from the user 108. The processor 202 may execute the instructions 208 stored in the memory 206 to process the video recordings and generate raw instructions files. The processor 202 may further process the raw instructions files using machine learning algorithms to produce processed instructions files adapted to the context of specific tasks. The processed instructions files may be stored in the memory 206 or the machine-readable medium 220 and transmitted through the network interface device 224 to the digital worker 110 for task execution. The display 212 and the input device 214 may enable administrators to monitor and configure the processing operations performed by the system 102.
[0050] Referring to FIG. 3, a block diagram of the system for automated learning for a digital worker is illustrated, in accordance with one exemplary embodiment of the present invention. The system may include a user 108, a network 106, and a system 102 containing multiple functional modules. The user 108 may use the client device 104 to connect to the system 102 via the network 106.
[0051] With continued reference to FIG. 3, the system 102 may comprise a task recorder 302 configured to record video demonstrations of task steps performed by the user 108. The task recorder 302 may capture a video that demonstrates all the steps for executing a specific task. In some cases, the task recorder 302 may monitor and capture detailed user interactions during the recording process, enabling the system to generate a comprehensive record of the actions performed by the user 108.
[0052] The task recorder 302 may capture screen activity during the video recording process. Screen activity may include any visual changes that occur on a display during task execution, such as window transitions, application launches, menu selections, form field entries, and navigation between different screens or web pages. In some cases, the task recorder 302 may capture screenshots at regular intervals or upon detection of screen changes to provide a frame-by-frame record of the task demonstration.
[0053] The task recorder 302 may capture video at a frame rate sufficient to record application state transitions and user interactions with precision. In some cases, the frame rate may be configured to capture a minimum number of frames per second that enables detection of rapid screen changes, menu selections, and transient interface elements. The frame rate may be adjusted based on the complexity of the task being demonstrated or the performance capabilities of the client device 104. Higher frame rates may be employed for tasks involving rapid interactions or animations, while lower frame rates may be sufficient for tasks involving slower, deliberate actions.
[0054] As further shown in FIG. 3, the task recorder 302 may capture mouse movements performed by the user 108 during the recording. Mouse movements may include cursor position changes, mouse trajectory paths, hover actions over interface elements, and dwell time on specific screen locations. The task recorder 302 may record mouse click events, including left clicks, right clicks, double clicks, and click-and-drag operations. In some cases, the task recorder 302 may capture the coordinates of each mouse activity on the screen, enabling precise reproduction of the recorded actions by a digital worker.
[0055] The task recorder 302 may capture detailed mouse activity including cursor trajectory paths that track the movement of the mouse pointer across the screen over time. In some cases, the task recorder 302 may record dwell time indicating the duration that the cursor remains stationary at specific screen locations, which may indicate user attention or hesitation during task execution. The task recorder 302 may capture hover actions where the cursor pauses over interface elements without clicking, which may trigger tooltip displays or menu expansions that are relevant to the task workflow.
[0056] The task recorder 302 may capture keyboard inputs entered by the user 108 during task demonstration. Keyboard inputs may include individual key presses, key combinations such as keyboard shortcuts (e.g., Ctrl+C for copy, Ctrl+V for paste), and text entry sequences. In some cases, the task recorder 302 may record the timing and sequence of keyboard inputs to maintain the temporal relationship between different actions performed during the task. The task recorder 302 may capture special key inputs such as function keys, navigation keys, and modifier keys.
[0057] The task recorder 302 may capture keyboard inputs with character-level granularity, recording individual key codes and modifier states for each keystroke. Key codes may identify the specific key pressed, while modifier states may indicate whether modifier keys such as Shift, Control, Alt, or Command were held during the keystroke. In some cases, the task recorder 302 may capture the timing sequence of keystrokes to preserve the rhythm and pacing of text entry. The keyboard input capture may distinguish between individual key presses, key hold durations, and key release events to enable accurate reproduction of keyboard interactions by the digital worker 110.
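By way of non-limiting illustration, the keystroke capture described above might be represented as follows. This is a minimal sketch only: the `KeyEvent` record, its field names, the key code strings, and the `hold_durations` helper are hypothetical and are not defined by the present disclosure.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class KeyEvent:
    """One captured keyboard event (hypothetical record layout)."""
    key_code: str                         # identifies the specific key, e.g. "KeyC"
    action: str                           # "press" or "release"
    t_ms: int                             # milliseconds since the start of the recording
    modifiers: FrozenSet[str] = frozenset()  # modifier keys held during the event


def hold_durations(events: List[KeyEvent]) -> List[Tuple[str, int]]:
    """Pair each key press with the next release of the same key.

    Returns (key_code, hold_ms) tuples in the order the keys were released.
    """
    pending = {}
    holds = []
    for ev in sorted(events, key=lambda e: e.t_ms):
        if ev.action == "press":
            # Keep the first press time; auto-repeat presses are ignored.
            pending.setdefault(ev.key_code, ev.t_ms)
        elif ev.action == "release" and ev.key_code in pending:
            holds.append((ev.key_code, ev.t_ms - pending.pop(ev.key_code)))
    return holds


# Example: a Ctrl+C shortcut captured as four events.
events = [
    KeyEvent("ControlLeft", "press", 1000),
    KeyEvent("KeyC", "press", 1040, frozenset({"Control"})),
    KeyEvent("KeyC", "release", 1120, frozenset({"Control"})),
    KeyEvent("ControlLeft", "release", 1180),
]
print(hold_durations(events))
```

Storing press and release as separate events, rather than a single "key typed" record, is one way to preserve the hold durations and pacing that the paragraph above refers to.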
[0058] With continued reference to FIG. 3, the task recorder 302 may capture voice commands provided by the user 108 during the recording process. Voice commands may include spoken instructions that describe actions being performed, verbal annotations that provide context for specific steps, or audio cues that indicate transitions between different phases of the task. In some cases, the task recorder 302 may process voice commands using speech recognition to convert spoken words into text-based instructions that supplement the visual recording.
[0059] The task recorder 302 may process voice commands using speech recognition to convert spoken words into text-based instructions that supplement the visual recording. Speech recognition processing may employ automatic speech recognition algorithms that transcribe audio input captured during the video recording. In some cases, the transcribed text may be associated with corresponding timestamps to enable correlation between spoken instructions and visual actions occurring at the same moment in the task demonstration. The speech recognition processing may support multiple languages and may be configured to recognize domain-specific terminology relevant to the task being demonstrated.
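By way of non-limiting illustration, the timestamp-based correlation between transcribed speech and visual actions might be sketched as follows. The tuple layouts and the `annotate_actions` function are assumptions made for this example; the transcript segments are presumed to have been produced already by a speech recognition component, which is not shown.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Segment = Tuple[float, float, str]   # (start_s, end_s, transcribed text)
Action = Tuple[float, str]           # (timestamp_s, action description)


def annotate_actions(
    actions: List[Action], segments: List[Segment]
) -> List[Tuple[float, str, Optional[str]]]:
    """Attach to each action the transcript segment spoken at that moment.

    `segments` must be sorted by start time and non-overlapping.
    Actions that fall in silence receive None.
    """
    starts = [seg[0] for seg in segments]
    annotated = []
    for t, description in actions:
        i = bisect_right(starts, t) - 1  # last segment starting at or before t
        text = None
        if i >= 0 and segments[i][0] <= t <= segments[i][1]:
            text = segments[i][2]
        annotated.append((t, description, text))
    return annotated


segments = [(0.0, 2.5, "open the invoice portal"), (4.0, 6.0, "click submit")]
actions = [(1.2, "click"), (3.0, "type"), (5.1, "click")]
for row in annotate_actions(actions, segments):
    print(row)
```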
[0060] The task recorder 302 may capture natural language instructions provided by the user 108. Natural language instructions may include typed descriptions of task steps, textual annotations added during or after the recording, or explanatory notes that clarify the purpose of specific actions. In some cases, natural language instructions may be provided through a text input interface associated with the task recorder 302, enabling the user 108 to add contextual information that enhances the understanding of the recorded task.
[0061] Examples of different types of user input events that may be captured by the task recorder 302 include: clicking on a button to submit a form; typing text into a search field; selecting an item from a dropdown menu; dragging a file from one folder to another; scrolling through a document or web page; right-clicking to access a context menu; using keyboard shortcuts to copy and paste data; speaking a command to navigate to a specific application; and providing a natural language description of the purpose of a particular action sequence. In some cases, the task recorder 302 may capture combinations of these user input events occurring simultaneously or in rapid succession during complex task demonstrations.
[0062] The system 102 may comprise a machine learning module 306 configured to process video recordings along with raw instructions files to generate processed instructions files adapted to the context of specific tasks. The machine learning module 306 may utilize video-based task analysis that combines Optical Character Recognition (OCR) with contextual AI interpretation. In some cases, the machine learning module 306 may process the recorded video and the raw instructions file simultaneously to extract meaningful information about the task being demonstrated by the user 108.
[0063] The machine learning module 306 may employ advanced computer vision techniques to analyze the content of video recordings. Computer vision techniques utilized by the machine learning module 306 may include detection and recognition of user interface (UI) elements present on screens captured during the video recording. In some cases, the machine learning module 306 may identify different types of UI elements such as buttons, text fields, dropdown menus, checkboxes, radio buttons, links, icons, and other interactive components displayed on the screen during task execution.
[0064] The machine learning module 306 may apply object detection algorithms to identify the boundaries, locations, and classifications of user interface elements within each analyzed frame. Object detection algorithms may include convolutional neural network architectures trained to recognize common UI element patterns. The machine learning module 306 may classify detected user interface elements according to element type, including buttons, text fields, dropdown menus, checkboxes, radio buttons, links, icons, sliders, and other interactive components. In some cases, the classification may be based on visual characteristics such as shape, size, border styling, and text content associated with each element.
[0065] The machine learning module 306 may extract visual features from each frame of the video recording to support user interface element detection and recognition. In some cases, visual features may include characteristics that distinguish interactive elements from background content, boundaries that define the perimeters of buttons and input fields, and text regions that contain labels, placeholders, or content within interface elements. The machine learning module 306 may analyze spatial relationships between visual elements to improve detection accuracy.
[0066] The machine learning module 306 may employ multimodality training techniques that combine multiple data sources and analysis methods. Multimodality training techniques may include the integration of visual analysis from video frames, textual analysis from OCR processing, and contextual analysis from user input events captured during the recording. In some cases, the machine learning module 306 may combine image captioning capabilities with UI element detection to generate descriptive representations of screen content and user actions.
[0067] The machine learning module 306 may generate a structured hierarchical representation of UI elements present on a given screen. The structured hierarchical representation may take the form of a tree structure that organizes UI elements according to their relationships and positions within the screen layout. In some cases, the tree structure may encapsulate information about each UI element including the type of the UI element, the spatial arrangement of the UI element relative to other elements on the screen, and the contextual relevance of the UI element within the task being performed.
[0068] The type information stored in the hierarchical representation may indicate whether a UI element is a button, a text input field, a label, a container, a navigation element, or another category of interface component. The spatial arrangement information may include coordinates, dimensions, and relative positioning of each UI element within the screen. The contextual relevance information may indicate the role of each UI element in the task workflow, such as whether the element is a target for user interaction, a source of data, or a navigational component.
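By way of non-limiting illustration, the hierarchical representation described above might be modeled as a simple tree of nodes carrying type, spatial, and contextual information. The `UIElement` class, its field names, the role labels, and the coordinate values below are hypothetical and are chosen only for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class UIElement:
    """One node of the UI tree (hypothetical layout)."""
    element_type: str                      # e.g. "button", "text_field", "container"
    bbox: Tuple[int, int, int, int]        # (x, y, width, height) in screen pixels
    role: str = "none"                     # contextual relevance: "target", "data_source", "navigation"
    text: Optional[str] = None             # label or content, e.g. from OCR
    children: List["UIElement"] = field(default_factory=list)


def find_targets(node: UIElement) -> List[UIElement]:
    """Depth-first search for elements marked as interaction targets."""
    found = [node] if node.role == "target" else []
    for child in node.children:
        found.extend(find_targets(child))
    return found


# Example: a screen containing a form with a name field and a Submit button.
submit = UIElement("button", (640, 512, 120, 36), role="target", text="Submit")
name_field = UIElement("text_field", (640, 440, 240, 32), role="data_source", text="Name")
form = UIElement("container", (600, 400, 320, 200), children=[name_field, submit])
screen = UIElement("container", (0, 0, 1920, 1080), children=[form])

targets = find_targets(screen)
print([t.text for t in targets])
```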
[0069] The machine learning module 306 may be capable of analyzing each frame in the recorded video. Frame-by-frame analysis may enable the machine learning module 306 to detect changes in screen content, identify transitions between different application states, and track the progression of the task through various stages. In some cases, the machine learning module 306 may compare consecutive frames to identify user-defined events such as button clicks, form submissions, or navigation actions that occur during the task demonstration.
[0070] The machine learning module 306 may detect screen state changes through frame comparison techniques including frame differencing. Frame differencing may involve comparing pixel values between consecutive frames to identify regions of the screen where changes have occurred. In some cases, the machine learning module 306 may apply threshold-based differencing to distinguish significant screen changes from minor variations caused by cursor movement or display artifacts. The frame comparison may enable detection of application state transitions, dialog box appearances, page navigation events, and content updates that occur during the task demonstration.
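By way of non-limiting illustration, threshold-based frame differencing might be sketched as follows for grayscale frames held as nested lists of pixel intensities. The function names and both threshold values are assumptions made for this example and are not specified by the present disclosure; a practical implementation would more likely operate on image arrays.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame: rows of 0-255 intensities


def changed_fraction(prev: Frame, curr: Frame, pixel_threshold: int = 25) -> float:
    """Fraction of pixels whose intensity changed by more than pixel_threshold."""
    total = 0
    changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += 1
            if abs(a - b) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0


def is_state_change(prev: Frame, curr: Frame,
                    pixel_threshold: int = 25,
                    area_threshold: float = 0.05) -> bool:
    """Treat the frame pair as a screen state change only if enough area changed."""
    return changed_fraction(prev, curr, pixel_threshold) >= area_threshold


# Example on a 10x10 frame: a one-pixel "cursor" versus a 5x4 "dialog box".
base = [[0] * 10 for _ in range(10)]

cursor = [row[:] for row in base]
cursor[3][3] = 255                      # 1% of pixels change

dialog = [row[:] for row in base]
for y in range(2, 6):
    for x in range(2, 7):
        dialog[y][x] = 200              # 20% of pixels change

print(is_state_change(base, cursor), is_state_change(base, dialog))
```

The two thresholds play different roles: the per-pixel threshold suppresses small intensity variations such as display artifacts, while the area threshold suppresses small regions such as a moving cursor.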
[0071] The machine learning module 306 may recognize user-defined events captured during the video recording. User-defined events may include mouse clicks, keyboard inputs, drag-and-drop operations, and other interactions performed by the user 108 during the task demonstration. In some cases, the machine learning module 306 may interpret each control within its context by analyzing the surrounding UI elements and the sequence of actions leading up to and following the interaction with the control.
[0072] The machine learning module 306 may maintain a timeline of all activities captured during the video recording. The activity timeline may track the temporal sequence of user actions and screen changes throughout the task demonstration. In some cases, the machine learning module 306 may associate timestamps with each activity to preserve the timing relationships between different steps in the task workflow.
[0073] The machine learning module 306 may correlate detected user interface elements with user input events based on temporal alignment between video frames and captured interaction timestamps. Temporal alignment may involve matching the timestamp of each user input event with the corresponding video frame that was displayed at the moment of the interaction. In some cases, the machine learning module 306 may interpolate between frames when user input events occur between frame capture times. The temporal alignment may enable the system 102 to associate each mouse click, keyboard input, or other user action with the specific user interface element that was the target of the interaction.
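By way of non-limiting illustration, the temporal alignment described above might be sketched as a mapping from an event timestamp to a frame index. The sketch assumes a constant frame rate with the first frame captured at time zero, and it resolves events that fall between frame capture times by selecting the most recent frame rather than interpolating; the function name and parameters are hypothetical.

```python
def frame_for_timestamp(t_ms: int, fps: int, frame_count: int) -> int:
    """Index of the frame that was on screen at t_ms.

    Assumes a constant frame rate and that frame 0 was captured at t = 0.
    Events after the last frame are clamped to the last frame.
    """
    if t_ms < 0:
        raise ValueError("timestamp must be non-negative")
    if frame_count <= 0:
        raise ValueError("recording has no frames")
    index = (t_ms * fps) // 1000
    return min(index, frame_count - 1)


# Example: a click at 1234 ms in a 10 fps recording of 100 frames.
print(frame_for_timestamp(1234, fps=10, frame_count=100))
```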
[0074] The machine learning module 306 may track coordinates associated with user activities. Coordinates tracked by the machine learning module 306 may include mouse movement positions, click locations, and cursor trajectories across the screen. In some cases, the machine learning module 306 may record keyboard actions along with their associated context, such as mapping keyboard shortcuts like Ctrl+C to the copy function or Ctrl+V to the paste function. The tracking of coordinates and keyboard actions may enable the digital worker 110 to reproduce the recorded task with precision across different environments.
[0075] The machine learning module 306 may identify text within images and on the screen using Optical Character Recognition (OCR). OCR processing may enable the machine learning module 306 to extract textual content from screenshots, application windows, documents, and other visual elements captured during the video recording. In some cases, the machine learning module 306 may use OCR to identify labels associated with UI elements, read content from text fields, and extract data displayed in tables or lists. The text identified through OCR may be incorporated into the processed instructions file to provide context for the actions performed during the task demonstration.
[0076] The raw instructions file may be structured as a JavaScript Object Notation (JSON) instructions file that captures detailed parameters from the video recording and user input events. The raw instructions file may contain coordinates that specify the screen positions where user interactions occurred during the task demonstration. The raw instructions file may include equations to adapt and recognize components displayed on the screen during the task demonstration.
[0077] The raw instructions file may include coordinate dots per inch (DPI) parameters that account for variations in display resolution and scaling. In some cases, the coordinate DPI parameters may enable translation of user input events into instructions that can be executed accurately on displays with different pixel densities or scaling factors. The raw instructions file may contain additional parameters that support the translation of user input events into executable instructions for digital workers. These parameters may include timing information that captures the duration between consecutive actions, window state information that describes whether applications were maximized, minimized, or in a specific position, and input device state information that tracks modifier keys held during interactions.
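By way of non-limiting illustration, the coordinate DPI parameters might be used to translate a recorded screen position to a display with a different pixel density. The sketch below assumes uniform scaling by the ratio of the two DPI values, which is only one possible translation; the function name and the example values are hypothetical.

```python
from typing import Tuple


def scale_point(x: int, y: int, recorded_dpi: float, target_dpi: float) -> Tuple[int, int]:
    """Scale a recorded pixel coordinate to a display with a different DPI."""
    if recorded_dpi <= 0 or target_dpi <= 0:
        raise ValueError("DPI values must be positive")
    factor = target_dpi / recorded_dpi
    return round(x * factor), round(y * factor)


# Example: a click recorded at 96 DPI replayed on a 144 DPI display (150% scaling).
print(scale_point(400, 300, recorded_dpi=96, target_dpi=144))
```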
[0078] The processed instructions file may be structured as a JSON instructions file that contains refined and contextualized information derived from the raw instructions file and video analysis. The processed instructions file may contain a frame number parameter that identifies the specific video frame associated with each recorded action. In some cases, the frame number may enable correlation between the visual content of the video recording and the corresponding user interaction captured at that moment in the task demonstration.
[0079] The processed instructions file may include a bounding box (BBX) parameter for each extracted UI element with which the user interacted during the task demonstration. The bounding box may define the rectangular region on the screen that encompasses a specific UI element, specified by coordinates that identify the top-left corner and the dimensions of the region. In some cases, the bounding box may enable the digital worker to locate and interact with UI elements based on their visual boundaries rather than relying solely on fixed screen coordinates.
[0080] The processed instructions file may contain a click type parameter that specifies whether a mouse interaction was a right click or a left click. The click type parameter may enable the digital worker to reproduce the correct type of mouse interaction when executing the task. The processed instructions file may include time information that captures the temporal aspects of each recorded action. The time information may specify when each action occurred relative to the start of the recording or relative to the previous action in the sequence. In some cases, the time information may enable the digital worker to execute actions with appropriate timing and pacing that matches the original task demonstration.
[0081] The processed instructions file may contain step-by-step instructional representations of the flow actions of the task. Each instruction in the processed instructions file may include the frame number, the bounding box for the target UI element, the click type, and the time information associated with that step. In some cases, the processed instructions file may include additional contextual information derived from the machine learning analysis, such as the identified type of UI element, the text content associated with the element, or the inferred purpose of the interaction within the task workflow.
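By way of non-limiting illustration, one step of a processed instructions file might be serialized to JSON as follows. The key names (`frame_number`, `bbx`, `click_type`, `time_ms`, and so on), the surrounding `steps` list, and all values are hypothetical; the present disclosure identifies the parameters but does not fix a schema.

```python
import json

# One illustrative step; every key name and value here is an assumption.
step = {
    "frame_number": 412,
    "bbx": {"x": 640, "y": 512, "width": 120, "height": 36},  # top-left corner plus dimensions
    "click_type": "left",
    "time_ms": 13730,             # time relative to the start of the recording
    "element_type": "button",     # optional context from the machine learning analysis
    "text": "Submit",
}
processed = {"steps": [step]}

encoded = json.dumps(processed, indent=2)
decoded = json.loads(encoded)

REQUIRED = {"frame_number", "bbx", "click_type", "time_ms"}


def missing_fields(candidate: dict) -> list:
    """Return the per-step parameters that are absent, in sorted order."""
    return sorted(REQUIRED - set(candidate))


print(encoded)
print(missing_fields(decoded["steps"][0]))
```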
[0082] The system 102 may comprise a flow editor 304 configured to enable review and modification of processed instructions files generated by the machine learning module 306. The flow editor 304 may provide an interactive environment for configuring and refining automation processes. In some cases, the flow editor 304 may serve as a platform where the user 108 can examine, adjust, and validate the instructions derived from video demonstrations before assigning tasks to the digital worker 110.
[0083] The flow editor 304 may integrate a video playback interface as a central feature. The video playback interface may enable the user 108 to visualize workflows in action by replaying the recorded video demonstration. In some cases, the video playback interface may display the original video recording alongside the corresponding processed instructions, allowing the user 108 to verify that each instruction accurately represents the intended action captured during the task demonstration. The flow editor 304 may include a timeline feature positioned directly beneath the video playback interface. The timeline feature may enable precise navigation through various stages of the workflow.
[0084] The flow editor 304 may include functionalities for adding, deleting, or modifying actions in the processed instructions file. Adding actions may enable the user 108 to insert additional steps into the workflow that were not captured during the original video recording. Deleting actions may enable the user 108 to remove steps that are unnecessary, redundant, or incorrectly captured during the video recording. Modifying actions may enable the user 108 to adjust parameters associated with existing steps, such as changing target coordinates, updating timing values, or altering the type of interaction performed at a particular step.
[0085] The system 102 may validate the processed instructions file via the flow editor 304 by programmatically comparing generated bounding box coordinates and click type parameters against corresponding user interaction coordinates and mouse event types captured in the raw instructions file to verify that each detected user interface element accurately corresponds to the user's original interaction. The programmatic comparison may be performed by an artificial intelligence agent as part of the validation process. In some cases, the programmatic comparison may calculate alignment metrics between the generated instructions and the captured user input events, flagging discrepancies that exceed configurable thresholds for user review. The flow editor 304 may display the video recording alongside corresponding processed instructions for user verification, enabling the user 108 to review flagged discrepancies and confirm that the validated instructions accurately represent the intended workflow actions.
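By way of non-limiting illustration, the programmatic comparison described above might check whether each raw click falls within the generated bounding box and whether the click types agree. The dictionary keys, the `tolerance_px` parameter, and the issue strings are assumptions made for this sketch; the alignment metrics and thresholds actually employed are not specified here.

```python
from typing import Dict, List


def validate_step(step: Dict, raw_event: Dict, tolerance_px: int = 0) -> List[str]:
    """Compare one processed step against the raw event it was derived from.

    Returns a list of discrepancy descriptions; an empty list means the step passed.
    """
    box = step["bbx"]
    left = box["x"] - tolerance_px
    top = box["y"] - tolerance_px
    right = box["x"] + box["width"] + tolerance_px
    bottom = box["y"] + box["height"] + tolerance_px

    issues = []
    if not (left <= raw_event["x"] <= right and top <= raw_event["y"] <= bottom):
        issues.append("click outside bounding box")
    if step["click_type"] != raw_event["button"]:
        issues.append("click type mismatch")
    return issues


step = {"bbx": {"x": 640, "y": 512, "width": 120, "height": 36}, "click_type": "left"}
raw_ok = {"x": 700, "y": 530, "button": "left"}
raw_bad = {"x": 900, "y": 530, "button": "right"}

print(validate_step(step, raw_ok))
print(validate_step(step, raw_bad))
```

Steps that return a non-empty list could then be flagged in the flow editor 304 for review by the user 108.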
[0086] The flow editor 304 may include functionalities for looping actions within the processed instructions file. Looping actions may enable the user 108 to configure specific steps or sequences of steps to repeat a specified number of times or until a particular condition is met. The flow editor 304 may include functionalities for managing files associated with the workflow. File management functionalities may enable the user 108 to specify input files that the digital worker 110 should process, define output file locations where results should be saved, or configure file paths that the workflow references during execution.
[0087] The flow editor 304 may include functionalities for managing variables or scripts within the workflow. Variable management functionalities may enable the user 108 to define named values that can be referenced throughout the processed instructions file. In some cases, variables may store data extracted during task execution, hold configuration parameters that control workflow behavior, or maintain state information that persists across multiple steps in the workflow. Script management functionalities may enable the user 108 to incorporate custom code segments that perform specialized processing not captured through the video demonstration.
[0088] The flow editor 304 may feature an integration module that facilitates connectivity with external applications and systems. The integration module may enable the workflow to exchange data with third-party software, web services, databases, or enterprise systems. In some cases, the integration module may provide connectors or adapters that enable the digital worker 110 to interact with applications beyond those demonstrated in the original video recording, thereby expanding the scope and scalability of the automation capabilities.
[0089] The system 102 may comprise an organizer module 308 configured to control and schedule the performance of specific tasks by the digital worker 110. The organizer module 308 may provide a task scheduling feature that enables the user 108 to create, configure, and schedule flow tasks within an automated workflow. In some cases, the organizer module 308 may present a streamlined interface that allows the user 108 to define parameters for task execution with flexibility and precision.
[0090] The organizer module 308 may provide flow selection functionality through a dropdown menu. The dropdown menu may display available pre-defined flow tasks that have been processed and validated through the flow editor 304. The organizer module 308 may enable the user 108 to specify a start date and time for task execution. The start date and time specification may allow the user 108 to define the exact moment at which the digital worker 110 should commence execution of the assigned task. The organizer module 308 may include repeat settings that enable configuration of recurring task execution. The repeat settings may allow the user 108 to set the frequency of task execution, such as daily, weekly, or at other user-defined intervals. The organizer module 308 may assign specific tasks to digital workers based on digital worker availability status and pre-trained capabilities, enabling the system 102 to match task requirements with appropriate digital workers that are available and possess the requisite skill sets for the assigned workflow.
[0091] The system 102 may comprise a user module (not shown) that displays information about available digital workers capable of performing assigned tasks. The user module may present each digital worker's name as a designated identifier that allows for recognition and differentiation among multiple digital workers within the system.
[0092] The user module may display each digital worker's role, which describes the functional assignment of the digital worker within the organization. Roles may include designations such as Customer Experience Coordinator, HR Specialist, or other domain-specific assignments that align with organizational needs. In some cases, the role information may indicate the category of tasks that the digital worker is configured to perform or the department that the digital worker serves. The user module may also display information about each digital worker's pre-trained capabilities, indicating the specialized skill sets and domain expertise that the digital worker possesses for executing specific categories of tasks.
[0093] The user module may display each digital worker's current flow task status. The current flow task status may indicate the operational state of the digital worker at any given time. In some cases, the status may indicate Idle, reflecting that the digital worker is available for new task assignments. The status may also indicate New Task Assigned, reflecting that the digital worker is actively engaged in a specific workflow or has been assigned a task pending execution. The display of current flow task status may enable the user 108 to monitor workload distribution and identify available digital workers for new assignments. The organizer module 308 may utilize the availability status displayed by the user module to determine which digital workers are idle and available for task assignment, and may further evaluate the pre-trained capabilities of available digital workers to select a digital worker whose skill sets align with the requirements of the specific task being assigned.
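By way of non-limiting illustration, the selection of a digital worker based on availability status and pre-trained capabilities might be sketched as follows. The `DigitalWorker` record, the capability labels, the worker names, and the first-match selection policy are all assumptions made for this example; other selection policies are equally possible.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class DigitalWorker:
    """Hypothetical record mirroring the information shown by the user module."""
    name: str
    role: str
    status: str                       # "Idle" or "New Task Assigned"
    capabilities: FrozenSet[str]      # pre-trained skill sets


def select_worker(workers: Iterable[DigitalWorker],
                  required: FrozenSet[str]) -> Optional[DigitalWorker]:
    """Return the first idle worker whose capabilities cover every requirement."""
    for worker in workers:
        if worker.status == "Idle" and required <= worker.capabilities:
            return worker
    return None


workers = [
    DigitalWorker("Worker A", "HR Specialist", "New Task Assigned",
                  frozenset({"payroll", "onboarding"})),
    DigitalWorker("Worker B", "Customer Experience Coordinator", "Idle",
                  frozenset({"ticket_triage"})),
    DigitalWorker("Worker C", "HR Specialist", "Idle",
                  frozenset({"payroll", "onboarding"})),
]

chosen = select_worker(workers, frozenset({"payroll"}))
print(chosen.name if chosen else "no worker available")
```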
[0094] The memory 206 may store data structures associated with a recorded flows module (not shown) that serves as a centralized repository for all saved automated workflows. The memory 206 may maintain catalog entries for each workflow with a customizable identifier that enables the user 108 to assign meaningful and descriptive names to saved workflows.
[0095] Each workflow stored in the memory 206 through the recorded flows module may be associated with a status indicator that provides information about the processing state of the workflow. The status indicator stored in the memory 206 may indicate Processed, denoting that the workflow has been processed by the machine learning module 306 and is ready for execution by the digital worker 110. The status indicator may indicate Processing, denoting that the workflow is currently being processed by the machine learning module 306 and is not yet available for execution. The status indicator may indicate Failed, denoting that the workflow encountered errors or interruptions during processing by the machine learning module 306. In some cases, the status indicators maintained in the memory 206 may enable the user 108 to monitor the progress of workflow processing and identify workflows that require attention or troubleshooting.
[0096] The system 102 may comprise a reporting module 310 configured to provide a detailed and comprehensive log of all executed tasks. The reporting module 310 may enable the user 108 to monitor, analyze, and evaluate workflow performance by presenting execution data in an accessible format. In some cases, the reporting module 310 may serve as a centralized repository of execution data that offers insights into task outcomes and operational trends.
[0097] Each entry in the reporting module 310 may contain a task name that serves as a descriptive identifier for the task. The task name may facilitate recognition and reference when the user 108 reviews execution records. Each entry in the reporting module 310 may contain an outcome status that indicates the result of the task execution. The outcome status may indicate Succeeded, denoting that the task completed without errors and produced the expected results. The outcome status may indicate Failed, denoting that the task encountered errors or was unable to complete the intended operations. The outcome status may indicate Cancelled, denoting that the task was terminated before completion, either by user intervention or by system conditions. In some cases, the outcome status may enable the user 108 to identify tasks that require attention or troubleshooting.
[0098] Each entry in the reporting module 310 may contain an execution history that provides a chronological log of activities associated with the task. The execution history may capture events that occurred throughout the lifecycle of the task, including initiation, intermediate steps, and completion or termination. In some cases, the execution history may include timestamps for each recorded event, enabling the user 108 to trace the progression of task execution and identify points where issues may have occurred. Each entry in the reporting module 310 may contain a date of latest activity that indicates the timestamp of the most recent action related to the task. The date of latest activity may provide a clear timeline for tracking progress and performance of executed tasks.
[0099] The reporting module 310 may allow the user 108 to click on any task entry to access a screen-recorded video of the task execution. The screen-recorded video may capture the visual display during execution of the task by the digital worker 110, showing the actions performed and the screen content at each step. In some cases, the screen-recorded video may be available regardless of the task's outcome, enabling the user 108 to review successful executions, failed executions, or cancelled executions. Access to the screen-recorded video may enable the user 108 to identify errors or bottlenecks in failed or cancelled tasks, analyze successful task executions to understand practices and strategies, and gather information for refining workflows and enhancing efficiency.
[0100] The system 102 may comprise a statistics module 312 configured to provide performance metrics for digital workers. The statistics module 312 may offer the user 108 a quantifiable view of contributions and operational efficiency of digital workers over a designated time period. In some cases, the statistics module 312 may aggregate and display data points related to activities performed by digital workers, enabling the user 108 to assess overall impact and productivity.
[0101] The statistics module 312 may track a total number of tasks completed by digital workers during a specified period. The total number of tasks completed may provide an indication of workload and operational output of the digital workers. The statistics module 312 may track cumulative hours worked by digital workers. The cumulative hours worked may represent the total amount of time that digital workers have spent performing tasks during the specified period. The statistics module 312 may track total time saved through automated task execution. The total time saved may be calculated by comparing the efficiency of automated tasks executed by digital workers versus manual execution of the same tasks by human workers.
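The three metrics described above can be illustrated with a short sketch. The disclosure does not specify how the manual baseline is obtained; the sketch assumes, purely for illustration, that each completed task carries an estimated manual duration. The names `CompletedTask` and `summarize` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class CompletedTask:
    automated_seconds: float        # time the digital worker spent on the task
    manual_estimate_seconds: float  # assumption: estimated manual duration


def summarize(tasks: Iterable[CompletedTask]) -> Dict[str, float]:
    """Aggregate task count, hours worked, and hours saved for a period."""
    tasks = list(tasks)
    worked = sum(t.automated_seconds for t in tasks)
    saved = sum(t.manual_estimate_seconds - t.automated_seconds for t in tasks)
    return {
        "tasks_completed": len(tasks),
        "hours_worked": worked / 3600,
        "hours_saved": saved / 3600,
    }
```

Under these assumptions, time saved is the per-task difference between the manual estimate and the automated duration, summed over the specified period.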
[0102] The system 102 may comprise a player module 314 configured to execute assigned tasks at scheduled times and under specified conditions. The player module 314 may serve as an execution engine within the system 102 that manages the runtime performance of automated workflows by digital workers. In some cases, the player module 314 may function as an intermediary component that receives validated processed instructions files and orchestrates their execution according to scheduling parameters and user-defined conditions. The player module 314 may interpret the step-by-step instructions contained within processed instructions files and direct the digital worker 110 to perform corresponding actions on target applications and interfaces.
[0103] The player module 314 may manage the lifecycle of task execution from initiation through completion. In some aspects, the player module 314 may monitor the start date and time configured through the organizer module 308 and trigger task execution when the scheduled moment arrives. The player module 314 may coordinate with the digital worker 110 to ensure that each instruction step is performed in the correct sequence and with appropriate timing as specified in the processed instructions file.
[0104] The player module 314 may handle execution state management during task performance. In some cases, the player module 314 may track the current position within a workflow, maintain awareness of which steps have been completed, and manage transitions between consecutive actions. The player module 314 may also manage pause and resume functionality when human-in-the-loop validation is required during task execution.
[0105] The player module 314 may process conditions specified by the user 108 that govern task execution behavior. In some aspects, the player module 314 may enforce duration limits that constrain how long a task may run, manage repeating cycles that cause a workflow to execute multiple times, and handle output requirements that specify how results should be delivered upon task completion. The player module 314 may communicate execution status and results to the reporting module 310 for logging and subsequent review by the user 108.
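As a hedged illustration of how a scheduled start, repeating cycles, and a duration limit might interact, consider the following sketch. The functions `is_due` and `run_cycles` and the injectable `clock` parameter are hypothetical, and the sketch checks the duration limit only between cycles, which is one simplifying assumption among several possible policies.

```python
from datetime import datetime, timedelta
from typing import Callable


def is_due(start_at: datetime, now: datetime) -> bool:
    """True once the scheduled start date and time has arrived."""
    return now >= start_at


def run_cycles(
    run_once: Callable[[], None],
    cycles: int,
    max_duration: timedelta,
    clock: Callable[[], datetime] = datetime.now,
) -> int:
    """Run the workflow up to `cycles` times, stopping early when the
    duration limit has elapsed. Returns the number of completed cycles."""
    started = clock()
    completed = 0
    for _ in range(cycles):
        # The limit is checked before each cycle, not during one.
        if clock() - started >= max_duration:
            break
        run_once()
        completed += 1
    return completed
```

Passing the clock as a parameter keeps the sketch testable without waiting for real time to pass.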
[0106] Once a task has been validated through the flow editor 304, the user 108 may assign the task to a specific digital worker for execution. The player module 314 may enable the user 108 to set the precise time for the task to begin, allowing for scheduling and synchronization with broader workflow operations. In some cases, the player module 314 may receive task assignments from the organizer module 308 and initiate execution of the assigned workflow at the designated time according to the conditions specified by the user 108.
[0107] The player module 314 may support human-in-the-loop computation during task execution. Human-in-the-loop computation may introduce a feedback mechanism where automated systems and human oversight operate collaboratively. In some cases, when a decision requires further validation or an uncertain action arises during task execution, the digital worker may communicate its adaptation decision and request validation from a human operator. The player module 314 may pause task execution pending human approval and resume execution upon receiving validation from the user 108 or another designated operator. Human-in-the-loop computation may foster a collaborative environment that ensures the automation process benefits from both computational efficiency and human expertise when circumstances warrant human involvement.
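The pause-and-resume behavior described above can be sketched as a small state machine. This is an illustrative assumption about one possible structure, not the disclosed implementation; `Player`, `PlayerState`, `needs_validation`, and `approve` are hypothetical names, and steps are represented as plain strings for brevity.

```python
from enum import Enum, auto
from typing import Callable, List, Sequence


class PlayerState(Enum):
    IDLE = auto()
    RUNNING = auto()
    PAUSED = auto()
    COMPLETED = auto()


class Player:
    """Tracks workflow position and pauses when a step needs human approval."""

    def __init__(self, steps: Sequence[str], needs_validation: Callable[[str], bool]):
        self.steps = list(steps)
        self.needs_validation = needs_validation
        self.position = 0                  # index of the next step to run
        self.executed: List[str] = []      # steps already completed
        self.state = PlayerState.IDLE
        self._approved = False             # approval applies to the current step only

    def run(self) -> PlayerState:
        self.state = PlayerState.RUNNING
        while self.position < len(self.steps):
            step = self.steps[self.position]
            if self.needs_validation(step) and not self._approved:
                self.state = PlayerState.PAUSED  # wait for a human operator
                return self.state
            self.executed.append(step)           # stand-in for performing the action
            self._approved = False
            self.position += 1
        self.state = PlayerState.COMPLETED
        return self.state

    def approve(self) -> PlayerState:
        """Record human validation and resume from the paused step."""
        if self.state is not PlayerState.PAUSED:
            raise RuntimeError("no step is awaiting validation")
        self._approved = True
        return self.run()
```

A caller would invoke `run()`, present the pending step to the operator whenever the state is `PAUSED`, and call `approve()` once validation is received.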
[0108] FIG. 4 shows a method 400 for a user registration process at the system 102, in accordance with one embodiment of the present invention. The order in which the method 400 is described should not be construed as a limitation, and any number of the described process blocks can be combined in any order to implement the process or alternate processes. Additionally, individual blocks may be deleted from the method 400 without departing from the spirit and scope of the invention described herein. Furthermore, the method 400 can be implemented in any suitable hardware, software, firmware, or combination thereof.
[0109] The method 400 may start at step 402. At step 402, the system 102 may receive a user registration request. The user registration request may be transmitted from a client device 104 operated by a prospective user seeking to access the automated learning system. The system 102 may receive information provided by the prospective user through a registration interface, such as a web form or application screen. The registration request may contain user-provided data including a username, an email address, a password, and organizational affiliation information. The registration request may include additional information such as contact details, department designation, or role within an organization.
[0110] At step 404, the system 102 may validate user credentials. Validation of user credentials may involve verification of the information provided in the registration request to confirm that the prospective user meets requirements for account creation. The system 102 may verify that the email address provided is in a valid format and corresponds to an active email account. The system 102 may check that the username is not already associated with an existing account in the system. The system 102 may verify that the password meets security requirements such as minimum length, character complexity, or other password policy criteria established by the system administrator.
[0111] The system 102 may validate organizational affiliation during step 404. The system 102 may confirm that the email domain corresponds to an organization that has a subscription or license agreement for the automated learning system. The system 102 may verify that the prospective user is authorized to create an account under the organization's subscription. The validation may involve sending a verification code to the email address provided and requiring the prospective user to enter the verification code to confirm ownership of the email account.
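The checks performed at step 404 may be illustrated by the following minimal sketch. It covers only the format, uniqueness, password policy, and licensed-domain checks; confirming that the address is active through a verification code is outside its scope. The function `validate_registration`, the simplified email pattern, and the policy values (a 12-character minimum, letters plus digits) are assumptions chosen for illustration.

```python
import re
from typing import List, Set

# Deliberately simple pattern: one "@", no whitespace, a dot in the domain.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_registration(
    username: str,
    email: str,
    password: str,
    existing_usernames: Set[str],
    licensed_domains: Set[str],
    min_length: int = 12,  # assumed policy value
) -> List[str]:
    """Return a list of validation errors; an empty list means the request passes."""
    errors = []
    if not EMAIL_PATTERN.match(email):
        errors.append("invalid email format")
    elif email.rsplit("@", 1)[1].lower() not in licensed_domains:
        errors.append("organization not licensed")
    if username in existing_usernames:
        errors.append("username taken")
    if len(password) < min_length:
        errors.append("password too short")
    if not (re.search(r"[A-Za-z]", password) and re.search(r"\d", password)):
        errors.append("password needs letters and digits")
    return errors
```

Returning every failed check at once, rather than stopping at the first, lets a registration interface show the prospective user all corrections in a single pass.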
[0112] At step 406, the system 102 may create a user account. Creation of the user account may involve storing the validated user credentials and associated information in a database maintained by the system 102. The system 102 may generate a unique user identifier that distinguishes the new account from other accounts in the system. The system 102 may establish default settings and preferences for the new user account, including notification preferences, interface language settings, and access permissions.
[0113] The system 102 may associate the new account with an organizational tenant or workspace during step 406. The system 102 may assign the new user account to a specific organizational group based on the email domain or organizational affiliation information provided during registration. The system 102 may configure access permissions for the new user account based on the user's role within the organization or based on default permission settings established by an administrator.
[0114] At step 408, the system 102 may send a confirmation to the user. The confirmation may be transmitted to the email address provided during registration to notify the user that the account has been created. The confirmation may include a welcome message that provides information about the automated learning system and instructions for getting started. The confirmation may include a link that enables the user to log in to the system for the first time.
[0115] The system 102 may provide information about available resources during step 408, such as documentation, tutorials, or support channels that the user may access to learn about the system capabilities. The confirmation may include information about the user's subscription tier or license type, indicating the features and digital workers available under the user's account. The confirmation may include contact information for technical support or customer service representatives who may assist the user with questions or issues.
[0116] At step 410, the system 102 may display available digital workers to the user. The display of available digital workers may occur when the user logs in to the system 102 following account creation. The system 102 may present a dashboard or interface that shows the digital workers 110 that the user 108 may access based on the user's subscription, organizational affiliation, or assigned permissions.
[0117] The system 102 may display information about each digital worker's name, role, and capabilities during step 410. The display may indicate which digital workers are pre-trained with skill sets relevant to specific task categories such as customer service, human resources, finance, or data processing. The user may select a digital worker from the displayed options to begin recording task demonstrations and assigning automated workflows.
[0118] The method 400 may support various user onboarding scenarios. A business analyst at a financial services company may register for an account to automate report generation tasks. Following registration and credential validation, the business analyst may receive a confirmation email and log in to view digital workers configured for financial data processing and reporting. An operations manager at a healthcare organization may register for an account to automate patient scheduling workflows. Following account creation, the operations manager may access digital workers pre-trained for healthcare administration tasks.
[0119] The method 400 may support additional user onboarding scenarios. An administrative professional at a logistics company may register for an account to automate shipment tracking and inventory management tasks. Following the user registration process, the administrative professional may view digital workers capable of interacting with logistics software and supply chain management systems. A human resources specialist may register for an account to automate employee onboarding documentation and payroll processing. The human resources specialist may gain access to digital workers configured for HR administration tasks following completion of the registration process.
[0120] FIG. 5 shows a method 500 for automated learning for a digital worker, in accordance with one embodiment of the present invention. The order in which the method 500 is described should not be construed as a limitation, and any number of the described process blocks can be combined in any order to implement the process or alternate processes. Additionally, individual blocks may be deleted from the method 500 without departing from the spirit and scope of the invention described herein. Furthermore, the method 500 can be implemented in any suitable hardware, software, firmware, or combination thereof.
[0121] The method 500 may start at step 502. At step 502, a user 108 may record a video illustrating the steps involved in executing a specific task. The video recording may capture all actions performed by the user 108 during demonstration of the task, including screen activity, mouse movements, keyboard inputs, and other interactions with applications and interfaces. The user 108 may initiate the video recording through the task recorder 302 and proceed to perform the task from start to finish while the system 102 captures the demonstration.
[0122] Consider a scenario in which the user 108 wishes to record a video demonstration of a data entry task. The user 108 may open a web browser, navigate to a customer relationship management (CRM) application, log in with credentials, locate a specific customer record, update contact information fields, and save the changes. The video recording may capture each of these actions as the user 108 performs the task, providing a visual record of the complete workflow. Similarly, the user 108 may record a video demonstration of an invoice processing task, where the user 108 opens an email application, downloads an attached invoice document, extracts relevant data from the invoice, and enters the data into an accounting system.
[0123] At step 504, the system 102 may generate a raw instructions file containing the recorded steps and user input events. The raw instructions file may capture detailed parameters from the video recording, including coordinates of mouse clicks, keyboard input sequences, timing information between actions, and screen state changes. The raw instructions file may be structured as a JSON file that organizes the captured data in a format suitable for subsequent processing by the machine learning module 306.
[0124] The step 504 may encompass generation of the raw instructions file that captures a sequence of mouse clicks performed during the data entry task demonstration. The raw instructions file may record that the user 108 clicked at specific screen coordinates to select a text field, typed a customer name using the keyboard, pressed the Tab key to move to the next field, and clicked a Save button at another set of coordinates. The raw instructions file may capture voice commands provided by the user 108 during the recording, such as spoken annotations that describe the purpose of each action in the workflow.
[0125] At step 506, the machine learning module 306 may process the video recording and the raw instructions file to generate a processed meaningful instructions file adapted to the context of the specific task. The machine learning module 306 may analyze each frame of the video recording, detect and recognize user interface elements, apply optical character recognition to extract text content, and correlate the visual analysis with the user input events captured in the raw instructions file. The processing may generate a hierarchical representation of UI elements and produce step-by-step instructions that enable the digital worker 110 to reproduce the demonstrated task.
[0126] The step 506 may encompass the machine learning module 306 processing the data entry task demonstration and identifying that the user 108 interacted with a login form containing username and password fields, navigated to a customer search interface, selected a specific customer record from search results, and modified fields within a customer details form. The processed instructions file may include bounding box coordinates for each UI element, the type of interaction performed (such as left click or text entry), and contextual information about the purpose of each step. The machine learning module 306 may recognize that a button labeled "Save" was clicked at the end of the workflow and include this contextual information in the processed instructions file.
[0127] At step 508, the system 102 may have the processed instructions file validated by the user 108 or by an artificial intelligence agent. Validation may involve review of the generated instructions to confirm that each step accurately represents the intended action from the original task demonstration. The user 108 may use the flow editor 304 to examine the processed instructions file, view the corresponding video segments, and verify that the machine learning interpretation correctly captured the workflow. The user 108 may verify that the customer search step includes the correct search criteria and that the data entry steps target the appropriate form fields. In one example, an artificial intelligence agent (e.g., digital worker 110) may perform automated validation by comparing the processed instructions against patterns learned from similar task demonstrations and flagging any steps that appear inconsistent or incomplete.
[0128] At step 510, the system 102 may assign the specific task to a digital worker. Assignment of the task may involve selecting a digital worker 110 from available digital workers displayed through the user module and associating the validated processed instructions file with the selected digital worker. The user 108 may assign the task through the organizer module 308, specifying which digital worker should execute the workflow. The step 510 may encompass the user 108 assigning the data entry task to a digital worker 110 designated as a Customer Data Coordinator. The user 108 may select the digital worker from a list of available digital workers and confirm the assignment through the dashboard interface. For example, the user 108 may assign an invoice processing task to a digital worker 110 designated as a Finance Assistant, matching the task type with a digital worker pre-trained for financial data processing operations.
[0129] At step 512, the digital worker 110 may execute the specific task according to the processed instructions file and conditions provided by the user. Execution of the task may involve the digital worker 110 following the step-by-step instructions in the processed instructions file, interacting with applications and interfaces as demonstrated in the original video recording. The user 108 may specify conditions for task execution through the organizer module 308, including the duration of the task, the number of repeating cycles, and any outputs required.
[0130] The step 512 may encompass the digital worker 110 executing the data entry task by launching a web browser, navigating to the CRM application, entering login credentials, searching for the specified customer record, updating the contact information fields according to the processed instructions, and saving the changes. The user 108 may have specified that the task should be executed daily at a scheduled time and should process a batch of customer records from a designated input file. Similarly, the digital worker 110 may execute an invoice processing task by downloading invoice attachments from emails, extracting data using the steps captured in the processed instructions file, and entering the extracted data into the accounting system according to the workflow demonstrated by the user 108.
[0131] At step 514, the digital worker 110 may provide output results of the specific task as specified by the user. The output results may be delivered in a format and location determined by the user-defined shadowing event captured during the original video recording. The digital worker 110 may generate a report summarizing the actions performed, populate a database with processed data, or save output files to a designated location.
[0132] The step 514 may encompass the digital worker 110 generating a summary report that lists all customer records updated during the data entry task execution, including timestamps for each update and confirmation of successful completion. The report may be saved to a shared folder or sent to the user 108 via email as specified during the task demonstration. For example, the digital worker 110 may provide output results for an invoice processing task by updating a spreadsheet with extracted invoice data, including vendor names, invoice amounts, and due dates, and sending a notification to the user 108 indicating that the batch processing has been completed.
[0133] FIG. 6 shows a method 600 for capturing and processing video demonstrations of task steps, in accordance with one embodiment of the present invention. The method 600 may start at step 602. At step 602, the task recorder 302 may capture a video demonstration of task steps. The task recorder 302 may initiate recording when the user 108 begins performing a task and may continue capturing visual content until the user 108 completes the demonstration. The video demonstration may capture screen content at a frame rate sufficient to record transitions between application states, menu selections, form field interactions, and other visual changes that occur during task execution. The video demonstration may serve as a visual reference that documents the complete workflow from initiation to completion.
[0134] The step 602 may encompass the task recorder 302 capturing a video demonstration of a user 108 performing an expense report submission task. The video demonstration may capture the user 108 opening an expense management application, selecting a new expense report form, entering expense details into various fields, attaching receipt images, and submitting the completed report for approval.
[0135] At step 604, the system 102 may monitor user input events, including mouse activity, voice commands, and keyboard inputs. Monitoring of user input events may occur concurrently with video capture, enabling the system to correlate visual content with the specific interactions performed by the user 108 at each moment during the task demonstration. The monitoring may capture multiple modalities of user input simultaneously, providing a comprehensive record of how the user 108 interacted with applications and interfaces during the workflow.
[0136] The system 102 may monitor mouse activity during step 604, including cursor position changes that track the movement of the mouse pointer across the screen, click events that record when and where the user 108 pressed mouse buttons, and scroll events that capture vertical or horizontal scrolling within application windows or web pages. The mouse activity monitoring may capture drag-and-drop operations where the user 108 clicks on an element, moves the cursor while holding the mouse button, and releases the button at a destination location.
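One way to distinguish clicks from drag-and-drop operations in a stream of low-level mouse events is sketched below. The event dictionaries with `type`, `x`, and `y` keys, the function name `classify_mouse_gestures`, and the 5-pixel movement threshold are all assumptions for illustration; the disclosure does not prescribe an event format.

```python
from typing import Dict, List


def classify_mouse_gestures(events: List[Dict], drag_threshold: int = 5) -> List[Dict]:
    """Pair each button press with its release and label the result.

    A press and release at nearly the same position is a click; a release
    farther away than `drag_threshold` pixels on either axis is a drag.
    """
    gestures = []
    press = None
    for event in events:
        if event["type"] == "down":
            press = event
        elif event["type"] == "up" and press is not None:
            dx = abs(event["x"] - press["x"])
            dy = abs(event["y"] - press["y"])
            if dx > drag_threshold or dy > drag_threshold:
                gestures.append({
                    "gesture": "drag",
                    "start": (press["x"], press["y"]),
                    "end": (event["x"], event["y"]),
                })
            else:
                gestures.append({"gesture": "click", "at": (press["x"], press["y"])})
            press = None
        # "move" events between press and release are ignored in this sketch.
    return gestures
```

The threshold tolerates small pointer jitter between press and release so that an ordinary click is not misread as a drag.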
[0137] The system 102 may monitor voice commands during step 604, including spoken instructions that the user 108 provides during the task demonstration. Voice commands may serve as annotations that describe the purpose of actions being performed, provide context for specific steps in the workflow, or indicate transitions between different phases of the task. The monitoring of voice commands may enable the system 102 to capture natural language descriptions that supplement the visual and interaction data recorded during the demonstration.
[0138] The system 102 may monitor keyboard inputs during step 604, including individual key presses, text entry sequences, and keyboard shortcuts. The keyboard input monitoring may capture the specific characters typed by the user 108 when entering data into form fields, search boxes, or text editors. The monitoring may also capture keyboard shortcuts such as Ctrl+S for save operations, Ctrl+Z for undo operations, or Alt+Tab for switching between application windows.
[0139] The step 604 may encompass multimodal data integration where the system simultaneously captures video of a user 108 filling out an expense report form while monitoring mouse clicks on dropdown menus for selecting expense categories, keyboard inputs for entering expense amounts and descriptions, and voice commands where the user 108 states "attaching the receipt image for this expense item." The system may capture the user 108 navigating through different sections of the expense management application while monitoring mouse movements across navigation tabs, keyboard inputs for entering expense dates and vendor information, and voice annotations where the user 108 explains "this step adds the receipt documentation required for approval."
[0140] At step 606, the system 102 may generate a raw JSON instructions file from the captured data. The raw JSON instructions file may organize the captured video data and user input events into a structured format suitable for subsequent machine learning analysis. The raw JSON instructions file may include timestamps that correlate each captured event with the corresponding moment in the video recording, enabling precise alignment between visual content and user interactions.
[0141] The system 102 may include coordinate parameters in the raw JSON instructions file during step 606 that specify screen positions where mouse interactions occurred. The coordinate parameters may include x-axis and y-axis values that identify click locations, cursor positions during movement, and start and end points for drag operations. The raw JSON instructions file may include display resolution information and DPI parameters that enable translation of coordinates across different screen configurations.
[0142] The system 102 may include keyboard input sequences in the raw JSON instructions file during step 606 that record the characters and key combinations entered by the user 108 during the task demonstration. The raw JSON instructions file may include timing information that captures the intervals between consecutive keystrokes or between different types of user input events. The raw JSON instructions file may also include transcribed voice command data where spoken instructions have been converted to text through speech recognition processing.
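A hedged sketch of what a raw JSON instructions file could look like, and how stored display information might be used to translate coordinates to a different resolution, is given below. The field names (`display`, `events`, `t_ms`, `delta_ms`) and the functions `build_raw_instructions` and `scale_point` are hypothetical; the disclosure does not define a schema, and the sketch uses simple proportional scaling as an assumption.

```python
import json
from typing import Dict, List, Tuple


def build_raw_instructions(events: List[Dict], width: int, height: int, dpi: int) -> Dict:
    """Order events by timestamp and record the interval since the previous event."""
    ordered = sorted(events, key=lambda e: e["t_ms"])
    records = []
    previous_t = None
    for event in ordered:
        record = dict(event)
        record["delta_ms"] = 0 if previous_t is None else event["t_ms"] - previous_t
        previous_t = event["t_ms"]
        records.append(record)
    return {
        "display": {"width": width, "height": height, "dpi": dpi},
        "events": records,
    }


def scale_point(x: int, y: int, source: Dict, target: Dict) -> Tuple[int, int]:
    """Map a recorded coordinate onto a display with a different resolution."""
    return (
        round(x * target["width"] / source["width"]),
        round(y * target["height"] / source["height"]),
    )


raw = build_raw_instructions(
    [
        {"type": "key", "t_ms": 1500, "text": "42.50"},
        {"type": "click", "t_ms": 1000, "x": 960, "y": 540, "button": "left"},
        {"type": "voice", "t_ms": 2200, "transcript": "attaching the receipt image"},
    ],
    width=1920, height=1080, dpi=96,
)
serialized = json.dumps(raw, indent=2)  # the text that would be written to disk
```

Proportional scaling is only a first approximation; it does not account for layouts that reflow at different window sizes, which is one reason the processed file described later relies on detected UI elements rather than raw coordinates alone.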
[0143] At step 608, the machine learning module 306 may analyze each frame in the recorded video. The machine learning module 306 may process individual frames from the video recording to extract visual information about screen content, application states, and user interface layouts. Frame-by-frame analysis may enable the machine learning module 306 to detect changes in screen content that correspond to user actions captured in the raw JSON instructions file.
[0144] The machine learning module 306 may extract visual features from each frame during step 608, including color patterns, edge boundaries, text regions, and graphical elements. The machine learning module 306 may compare consecutive frames to identify transitions that indicate navigation between screens, opening or closing of dialog boxes, or updates to displayed content resulting from user interactions. The frame analysis may generate a temporal map of screen states throughout the task demonstration.
[0145] The step 608 may encompass the machine learning module 306 analyzing frames from an expense report submission demonstration and identifying that frame 120 shows an empty expense form, frame 145 shows the form with a date field populated, frame 180 shows the form with an expense category selected from a dropdown menu, and frame 210 shows the form with an attached receipt image displayed. The frame analysis may identify that frames 250 through 260 capture a transition from the expense form screen to a confirmation dialog indicating successful submission.
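The comparison of consecutive frames described above can be illustrated with a simple pixel-differencing sketch. Plain frame differencing is used here as a stand-in; the disclosure does not state which technique the machine learning module 306 applies. Frames are modeled as small grayscale grids (lists of rows), and the names `changed_fraction` and `find_transitions` and the 0.5 threshold are assumptions.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # rows of grayscale pixel values, 0 to 255


def changed_fraction(frame_a: Frame, frame_b: Frame, tolerance: int = 0) -> float:
    """Fraction of pixels whose value differs by more than `tolerance`."""
    total = 0
    changed = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += 1
            if abs(pixel_a - pixel_b) > tolerance:
                changed += 1
    return changed / total if total else 0.0


def find_transitions(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Indices of frames that differ substantially from the preceding frame."""
    return [
        index
        for index in range(1, len(frames))
        if changed_fraction(frames[index - 1], frames[index]) >= threshold
    ]
```

A high threshold tends to isolate screen-level transitions such as a dialog opening, while a lower threshold also surfaces small updates such as a single field being populated.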
[0146] At step 610, the machine learning module 306 may detect and recognize user interface elements using computer vision techniques. Computer vision techniques applied during step 610 may include object detection algorithms that identify the boundaries and locations of UI elements within each analyzed frame. The computer vision techniques may include classification algorithms that categorize detected elements according to their type, such as buttons, text fields, checkboxes, dropdown menus, links, or labels.
[0147] The machine learning module 306 may apply optical character recognition during step 610 to extract text content from labels, button captions, menu items, and other textual elements displayed on the screen. The machine learning module 306 may use OCR to identify the names of form fields, the text displayed on buttons that the user 108 clicked, and the content of dropdown menu options that the user 108 selected during the task demonstration.
[0148] The machine learning module 306 may generate bounding box coordinates for each detected UI element during step 610. The bounding box coordinates may define rectangular regions that encompass individual UI elements, enabling the digital worker 110 to locate and interact with corresponding elements when executing the task in different environments. The detection and recognition process may generate confidence scores that indicate the reliability of each element identification, enabling the system to flag elements that may require user validation.
[0149] The step 610 may encompass the machine learning module 306 detecting a button element in frame 210 of the expense report demonstration, recognizing the button text as "Submit Report" through OCR, and generating bounding box coordinates that define the button's location on the screen. The machine learning module 306 may detect a text input field, recognize an adjacent label as "Expense Amount" through OCR, and associate the field with the keyboard input captured at the corresponding timestamp in the raw JSON instructions file.
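Assuming detection and OCR have already produced a list of elements, the association of a recorded click with a detected element and the flagging of low-confidence detections could be sketched as follows. The `Detection` record, the helper names, the innermost-box rule, and the 0.8 confidence cutoff are illustrative assumptions, not details taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass(frozen=True)
class Detection:
    kind: str          # e.g. "button", "text_field"
    text: str          # label or caption recovered through OCR
    box: Box
    confidence: float  # detector confidence between 0 and 1


def contains(box: Box, x: int, y: int) -> bool:
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def area(box: Box) -> int:
    return (box[2] - box[0]) * (box[3] - box[1])


def element_at(detections: List[Detection], x: int, y: int) -> Optional[Detection]:
    """Return the smallest detected element containing the point, if any."""
    hits = [d for d in detections if contains(d.box, x, y)]
    return min(hits, key=lambda d: area(d.box)) if hits else None


def needs_review(detections: List[Detection], min_confidence: float = 0.8) -> List[Detection]:
    """Elements whose identification may warrant user validation."""
    return [d for d in detections if d.confidence < min_confidence]
```

Choosing the smallest containing box resolves the common case where a button sits inside a larger detected panel or form region.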
[0150] At step 612, the system 102 may generate a processed JSON instructions file containing step-by-step instructions adapted to the context of the specific task. The processed JSON instructions file may integrate the results of frame analysis, UI element detection, and correlation with user input events to produce executable instructions for the digital worker 110. The processed JSON instructions file may include contextual information derived from multimodal data integration, combining visual analysis results with keyboard input data, mouse interaction data, and transcribed voice command annotations.
[0151] The system 102 may include a frame number parameter in the processed JSON instructions file during step 612 for each instruction that identifies the video frame associated with the corresponding action. The processed JSON instructions file may include bounding box parameters that specify the location of target UI elements for each interaction step. The processed JSON instructions file may include click type parameters that indicate whether each mouse interaction was a left click, right click, double click, or other click variation.
[0152] The system 102 may include time parameters in the processed JSON instructions file during step 612 that capture the temporal aspects of each action, enabling the digital worker 110 to execute steps with appropriate pacing. The processed JSON instructions file may include text entry parameters that specify the characters to be typed at each keyboard input step, along with any associated keyboard shortcuts or special key combinations.
[0153] The step 612 may encompass multimodal data integration reflected in the processed JSON instructions file for the expense report submission task, including an instruction step that combines a frame number identifying the visual context, a bounding box locating an "Expense Amount" text field detected through computer vision, a click type indicating a left click to select the field, and text entry data specifying the expense amount captured from keyboard input monitoring. The instruction step may also include a voice annotation transcription where the user 108 stated "entering the expense amount from the receipt," providing contextual information that enhances understanding of the step's purpose within the workflow. Additional instruction steps may include bounding boxes for the "Expense Category" dropdown menu, the "Attach Receipt" button, and the "Submit Report" button, each correlated with the corresponding user interactions captured during the demonstration.
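For illustration only, a single instruction step of the kind described above might be assembled and serialized as in the following sketch. The key names (`frame`, `target`, `bounding_box`, `click_type`, `time_ms`, `text_entry`, `voice_annotation`), the helper functions, and the sample frame number, coordinates, and amount are hypothetical; the disclosure lists the parameters but not a concrete schema.

```python
import json
from typing import Dict, Optional, Tuple


def build_step(
    frame_number: int,
    label: str,
    box: Tuple[int, int, int, int],  # (left, top, right, bottom)
    click_type: str,
    time_ms: int,
    text: Optional[str] = None,
    annotation: Optional[str] = None,
) -> Dict:
    """Combine visual, interaction, and voice data into one instruction step."""
    step = {
        "frame": frame_number,
        "target": {"label": label, "bounding_box": list(box)},
        "click_type": click_type,
        "time_ms": time_ms,
    }
    if text is not None:
        step["text_entry"] = text
    if annotation:
        step["voice_annotation"] = annotation
    return step


def click_point(step: Dict) -> Tuple[int, int]:
    """Center of the target bounding box, a natural place to click."""
    left, top, right, bottom = step["target"]["bounding_box"]
    return ((left + right) // 2, (top + bottom) // 2)


amount_step = build_step(
    145, "Expense Amount", (100, 200, 400, 230), "left", 4800,
    text="42.50",
    annotation="entering the expense amount from the receipt",
)
processed = json.dumps({"steps": [amount_step]}, indent=2)
```

Optional keys are omitted rather than written as null, so steps without text entry or voice annotation stay compact.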
[0154] FIG. 7 shows a method 700 for scheduling and executing tasks by a digital worker, in accordance with one embodiment of the present invention. The method 700 may start at step 702. At step 702, the user 108 may select a flow task from a dashboard. The dashboard may present available flow tasks that have been processed and validated through the flow editor 304. The user 108 may access the dashboard through the client device 104 and view a list of saved workflows that are ready for execution by the digital worker 110. The selection of a flow task may involve the user 108 identifying a specific workflow from the available options and designating that workflow for scheduling.
[0155] The method 700 may be illustrated through two distinct examples: Example 1 involving Weekly Sales Report Generation and Example 2 involving Daily Customer Data Synchronization. These examples demonstrate how the method 700 enables users to schedule and execute automated tasks through digital workers.
[0156] In Example 1, the user 108 may access the dashboard at step 702 and select a flow task named "Weekly Sales Report Generation" from a list of available workflows. The user 108 may review the flow task details displayed on the dashboard, including the number of steps in the workflow and the applications involved, before proceeding to configure scheduling parameters.
[0157] In Example 2, the user 108 may access the dashboard at step 702 and select a flow task named "Daily Customer Data Synchronization" that automates the transfer of customer information between a CRM application and a marketing platform.
[0158] At step 704, the user 108 may specify a start date and time for task execution. The specification of start date and time may enable the user 108 to define the exact moment at which the digital worker 110 should commence execution of the selected flow task. The user 108 may enter a calendar date and a clock time through input fields provided by the organizer module 308, establishing when the automated workflow should begin.
[0159] In Example 1, at step 704, the user 108 may specify a start date of the first Monday of the following month and a start time of 6:00 AM for the Weekly Sales Report Generation flow task. The early morning start time may enable the digital worker 110 to complete the task before business hours begin, ensuring that the sales report is available when employees arrive.
[0160] In Example 2, at step 704, the user 108 may specify a start date of the current day and a start time of 5:00 PM for the Daily Customer Data Synchronization flow task, scheduling the task to execute after the close of business operations.
[0161] At step 706, the user 108 may configure repeat settings, including frequency and days. Configuration of repeat settings may enable the user 108 to establish recurring execution patterns for the selected flow task. The user 108 may specify the frequency of task execution, such as daily, weekly, bi-weekly, or monthly intervals, through options provided by the organizer module 308.
[0162] The user 108 may configure repeat settings during step 706 that include specification of the days of the week on which the task should be executed. The user 108 may select specific weekdays such as Monday, Wednesday, and Friday for a flow task that should run multiple times per week but not on weekends. The repeat settings may also include configuration of a date range that defines when the recurring schedule should begin and when the recurring schedule should end, ensuring that tasks are executed within a designated timeframe.
[0163] In Example 1, at step 706, the user 108 may configure repeat settings for the Weekly Sales Report Generation flow task. The user 108 may set the frequency to weekly, select Monday as the execution day, and leave the end date open to allow indefinite recurring execution until manually cancelled. The configuration may ensure that the sales report generation task executes every Monday throughout the operational period.
[0164] In Example 2, at step 706, the user 108 may configure repeat settings for the Daily Customer Data Synchronization flow task. The user 108 may set the frequency to daily, select all weekdays (Monday through Friday) as execution days, and define a date range spanning the entire fiscal year. The configuration may ensure that the customer data synchronization task executes every business day at 5:00 PM.
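As a non-limiting illustration, the start time of step 704 and the repeat settings of step 706 may be sketched in Python as follows. The `Schedule` class, the `next_runs` function, and the calendar dates are assumptions for illustration. The sketch covers only weekday-based recurrence with an optional end date and does not model bi-weekly or monthly frequencies.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import FrozenSet, List, Optional


@dataclass
class Schedule:
    start: datetime                   # step 704: start date and time
    weekdays: FrozenSet[int]          # step 706: execution days, Monday=0 .. Sunday=6
    end_date: Optional[date] = None   # None leaves the recurrence open-ended


def next_runs(schedule: Schedule, after: datetime, count: int) -> List[datetime]:
    """Return up to `count` scheduled run times strictly later than `after`."""
    if not schedule.weekdays:
        return []
    runs: List[datetime] = []
    day = max(schedule.start.date(), after.date())
    while len(runs) < count:
        if schedule.end_date is not None and day > schedule.end_date:
            break
        candidate = datetime.combine(day, schedule.start.time())
        if day.weekday() in schedule.weekdays and candidate > after:
            runs.append(candidate)
        day += timedelta(days=1)
    return runs


# Example 1: weekly on Mondays at 6:00 AM, open-ended (dates are illustrative).
weekly = Schedule(start=datetime(2026, 1, 5, 6, 0), weekdays=frozenset({0}))
weekly_runs = next_runs(weekly, after=datetime(2026, 1, 1), count=3)

# Example 2: Monday through Friday at 5:00 PM within a bounded date range.
daily = Schedule(
    start=datetime(2026, 1, 2, 17, 0),
    weekdays=frozenset({0, 1, 2, 3, 4}),
    end_date=date(2026, 1, 7),
)
daily_runs = next_runs(daily, after=datetime(2026, 1, 2, 18, 0), count=10)
```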
[0165] At step 708, the system 102 may assign a specific task to a digital worker. Assignment of the task may involve the user 108 selecting a digital worker 110 from available digital workers displayed through the user module and associating the scheduled flow task with the selected digital worker. The user 108 may choose a digital worker based on the digital worker's role, current workload status, or pre-trained capabilities that align with the requirements of the flow task.
[0166] In Example 1, at step 708, the user 108 may assign the Weekly Sales Report Generation flow task to a digital worker 110 designated as a Business Intelligence Analyst. The user 108 may select the digital worker from a list that displays the name, role, and current status of each available digital worker, confirming that the Business Intelligence Analyst is idle and available for the new assignment.
[0167] In Example 2, at step 708, the user 108 may assign the Daily Customer Data Synchronization flow task to a digital worker 110 designated as a Data Integration Specialist, matching the task type with a digital worker pre-trained for data synchronization workflows between CRM and marketing platforms.
[0168] At step 710, the player module 314 may execute the task of the digital worker at the scheduled time. The player module 314 may initiate execution of the assigned flow task when the scheduled start date and time arrive. The player module 314 may retrieve the processed instructions file associated with the flow task and direct the digital worker 110 to perform each step according to the instructions and conditions specified by the user 108.
[0169] The digital worker 110 may interact with applications and interfaces during step 710 as defined in the processed instructions file. The digital worker 110 may launch applications, navigate through menus, enter data into form fields, click buttons, and perform other interactions captured during the original video demonstration. The player module 314 may monitor the execution progress and handle any conditions specified by the user 108, such as the duration of the task or the number of repeating cycles.
[0170] In Example 1, at step 710, the player module 314 may initiate execution of the Weekly Sales Report Generation task at 6:00 AM on Monday as scheduled. The digital worker 110 may open a business intelligence application, query sales data from the previous week, generate a formatted report, and save the report to a designated shared folder.
[0171] In Example 2, at step 710, the player module 314 may initiate execution of the Daily Customer Data Synchronization task at 5:00 PM as scheduled. The digital worker 110 may connect to the CRM application, export updated customer records, transform the data according to the processed instructions, and import the data into the marketing platform.
[0172] At step 712, the reporting module 310 may log output results for evaluation. The reporting module 310 may receive information about the completed task execution and create an entry that documents the outcome. The reporting module 310 may record the task name, the outcome status indicating whether the task succeeded, failed, or was cancelled, the execution history capturing events throughout the task lifecycle, and the date of latest activity.
[0173] The reporting module 310 may enable the user 108 to review the performance of scheduled tasks during step 712 and identify any issues that occurred during execution. The reporting module 310 may store a screen-recorded video of the task execution, enabling the user 108 to visually review the actions performed by the digital worker 110 and verify that the workflow completed as intended.
[0174] In Example 1, at step 712, the reporting module 310 may log an entry for the Weekly Sales Report Generation task with an outcome status of Succeeded, an execution history showing that the task initiated at 6:00 AM, completed data queries at 6:05 AM, generated the report at 6:08 AM, and saved the report at 6:09 AM. The user 108 may access the reporting module 310 later in the day to confirm that the report was generated and review the execution details.
[0175] In Example 2, at step 712, the reporting module 310 may log an entry for the Daily Customer Data Synchronization task with an outcome status of Failed, an execution history indicating that the task encountered an error when attempting to connect to the marketing platform, and a screen-recorded video that the user 108 may review to diagnose the connection issue.
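As a non-limiting illustration, a log entry of the kind recorded at step 712 may be sketched in Python as follows. The class and field names, the `video_path` value, and the calendar date are assumptions for illustration. The event times follow Example 1.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional, Tuple


class Outcome(Enum):
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


@dataclass
class ExecutionLog:
    """Hypothetical record holding the fields named for the reporting module."""
    task_name: str
    outcome: Optional[Outcome] = None
    history: List[Tuple[datetime, str]] = field(default_factory=list)
    video_path: Optional[str] = None   # location of the screen recording, if stored

    def record(self, when: datetime, event: str) -> None:
        self.history.append((when, event))

    @property
    def latest_activity(self) -> Optional[datetime]:
        """Date of latest activity, derived from the execution history."""
        return max(ts for ts, _ in self.history) if self.history else None


log = ExecutionLog(task_name="Weekly Sales Report Generation")
log.record(datetime(2026, 1, 5, 6, 0), "task initiated")
log.record(datetime(2026, 1, 5, 6, 5), "data queries completed")
log.record(datetime(2026, 1, 5, 6, 8), "report generated")
log.record(datetime(2026, 1, 5, 6, 9), "report saved")
log.outcome = Outcome.SUCCEEDED
```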
[0176] The method 700 may support various scheduling configurations beyond the two examples described above. A configuration may include the user 108 scheduling a flow task to execute once at a specific date and time without recurrence, suitable for one-time data migration tasks or initial system setup workflows. Another scheduling configuration may include the user 108 scheduling a flow task to execute on specific days of the week, such as every Tuesday and Thursday at 9:00 AM, suitable for tasks that align with twice-weekly meeting schedules or periodic reporting requirements. A further scheduling configuration may include the user 108 scheduling a flow task to execute monthly on a specific day, such as the first business day of each month, suitable for tasks that generate monthly financial statements or process monthly subscription renewals.
[0177] The method 700 may support various execution scenarios. A scenario may include a digital worker 110 executing a scheduled flow task that processes incoming purchase orders from an email inbox. The digital worker 110 may download attached order documents, extract order details using the steps captured in the processed instructions file, enter the order information into an enterprise resource planning (ERP) system, and generate confirmation emails to customers. Another execution scenario may include a digital worker 110 executing a scheduled flow task that monitors a support ticket queue, identifies tickets that have been open beyond a threshold duration, escalates those tickets to a supervisor, and updates the ticket status in the support system.
[0178] The system 102 may employ RESTful APIs to communicate between the client devices 104 and a cloud-based system 102 for efficient, scalable task processing. RESTful APIs may provide a standardized communication protocol that enables the client devices 104 to transmit video recordings, user input events, and task configuration data to the system 102 for processing. The RESTful API communication may enable stateless interactions between the client devices 104 and the system 102, where each request from a client device 104 contains all information necessary for the system 102 to process the request without relying on stored session state.
[0179] The RESTful API architecture may support scalable task processing by enabling the system 102 to handle multiple concurrent requests from different client devices 104. The RESTful API endpoints may include endpoints for uploading video recordings captured by the task recorder, endpoints for retrieving processing status of raw instructions files, endpoints for submitting validated processed instructions files for task assignment, and endpoints for querying execution results from the reporting module.
[0180] The system 102 may process and deliver instructions files to the digital worker 110 through a GraphQL interface. The GraphQL interface may enable precise data retrieval by allowing the digital worker 110 to request specific fields and data structures from processed instructions files without receiving extraneous information. The GraphQL interface may provide a flexible query language that enables the digital worker 110 to retrieve instructions adapted to their execution context. In some cases, the digital worker 110 may query the GraphQL interface to retrieve instructions for a specific task assignment, including the sequence of steps, the UI element identifiers, and the conditions specified by the user 108. The GraphQL interface may support nested queries that enable retrieval of related data structures, such as retrieving a processed instructions file along with associated metadata about the original video recording and the user 108 who created the task demonstration.
[0181] The GraphQL interface may enable efficient task execution by providing the digital worker 110 with structured responses that match the schema of processed instructions files. In some cases, the GraphQL interface may support subscriptions that enable the digital worker 110 to receive real-time updates when new tasks are assigned or when existing task configurations are modified.
[0182] The system 102 may employ an Enterprise Service Bus (ESB) to facilitate integration across multiple enterprise applications. The ESB may provide a middleware layer that enables consistent task execution across different platforms and systems within an organization. The ESB integration may enable the digital worker 110 to interact with diverse enterprise applications through standardized connectors and adapters. The ESB may support message transformation, protocol conversion, and routing logic that directs task-related data to appropriate destination systems based on workflow requirements. The ESB may ensure consistent task execution across platforms by providing transaction management and error handling capabilities. The ESB may maintain message queues that buffer task-related communications during periods of high load or when destination systems are temporarily unavailable, ensuring reliable delivery of instructions and output data.
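As a non-limiting illustration, the routing and queue-buffering behavior attributed to the ESB may be sketched in Python as follows. This is an in-memory toy and not an ESB product; the `BufferedRouter` class, the message type name, and the simulated destination are assumptions for illustration.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple


class BufferedRouter:
    """Routes messages by type and keeps undelivered messages queued for retry."""

    def __init__(self, routes: Dict[str, Callable[[dict], None]]):
        self.routes = routes                       # message type -> delivery function
        self.pending: Deque[Tuple[str, dict]] = deque()

    def send(self, msg_type: str, payload: dict) -> int:
        self.pending.append((msg_type, payload))
        return self.flush()

    def flush(self) -> int:
        """Attempt each queued message once; return how many were delivered."""
        delivered = 0
        for _ in range(len(self.pending)):
            msg_type, payload = self.pending.popleft()
            try:
                self.routes[msg_type](payload)
                delivered += 1
            except ConnectionError:
                # Destination unavailable: keep the message for a later flush.
                self.pending.append((msg_type, payload))
        return delivered


# Simulated destination that is unavailable at first (hypothetical).
crm_inbox = []
crm_state = {"up": False}


def deliver_to_crm(payload: dict) -> None:
    if not crm_state["up"]:
        raise ConnectionError("CRM unavailable")
    crm_inbox.append(payload)


router = BufferedRouter({"customer_update": deliver_to_crm})
first_attempt = router.send("customer_update", {"id": 1})   # buffered, not delivered
crm_state["up"] = True
second_attempt = router.flush()                              # delivered on retry
```

The sketch retries only on `ConnectionError` and does not preserve ordering between messages that fail and messages that succeed in the same flush; a production bus would typically add persistence and transaction management.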
[0183] The system 102 may configure the digital worker 110 for autonomous lead and candidate generation through intent-based matching. In some cases, the digital worker 110 may define and continuously refine an ideal lead or candidate profile based on parameters that characterize desirable prospects for an organization. The parameters used to define the ideal profile may include skills possessed by the prospect, experience level and history, behavioral characteristics, engagement patterns observed across digital channels, industry relevance, and expressed user intent derived from interactions with organizational content or communications.
[0184] The digital worker 110 may define skill parameters within the ideal lead or candidate profile by specifying technical competencies, certifications, educational qualifications, or domain expertise that align with organizational requirements. In some cases, the skill parameters may include programming languages, software proficiencies, industry-specific knowledge areas, or professional credentials that indicate a prospect's suitability for a particular role or offering. The digital worker 110 may refine skill parameters over time based on feedback from successful conversions or placements, adjusting the weighting or specificity of skill requirements to improve matching accuracy.
[0185] The digital worker 110 may define experience parameters within the ideal profile by specifying years of relevant work history, types of organizations where the prospect has worked, job titles held, or projects completed. In some cases, the experience parameters may include industry tenure, leadership experience, or exposure to specific business environments that correlate with successful outcomes. The digital worker 110 may analyze patterns in historical data to identify experience characteristics that distinguish high-quality prospects from lower-quality prospects.
[0186] The digital worker 110 may define behavioral parameters within the ideal profile by specifying actions or interaction patterns that indicate prospect interest or readiness. Behavioral parameters may include frequency of website visits, content consumption patterns, response rates to communications, participation in webinars or events, or engagement with product demonstrations. In some cases, the digital worker 110 may track behavioral signals across multiple touchpoints to construct a comprehensive view of prospect engagement levels.
[0187] The digital worker 110 may define engagement pattern parameters by analyzing the timing, frequency, and depth of prospect interactions with organizational content and communications. Engagement patterns may include email open rates, click-through rates on links, time spent on specific web pages, social media interactions, or responses to outreach messages. In some cases, the digital worker 110 may identify engagement patterns that correlate with conversion likelihood, enabling prioritization of prospects exhibiting similar patterns.
[0188] The digital worker 110 may define industry relevance parameters by specifying sectors, market segments, or business categories that align with organizational target markets. In some cases, the industry relevance parameters may include company size, geographic location, revenue range, or growth stage that indicate fit with organizational offerings or hiring needs. The digital worker 110 may refine industry relevance parameters based on analysis of successful conversions or placements within specific market segments.
[0189] The digital worker 110 may define expressed user intent parameters by analyzing explicit signals that indicate prospect interest or readiness to engage. Expressed user intent may be derived from form submissions where prospects request information, demo requests, content downloads, inquiry messages, or direct expressions of interest through communication channels. In some cases, the digital worker 110 may analyze the language and context of prospect communications to infer intent levels and categorize prospects according to their position in the engagement funnel.
[0190] The digital worker 110 may autonomously collect data from multiple digital sources to identify prospects that match the defined ideal profile. Digital sources from which the digital worker 110 may collect data include websites, social platforms, databases, communication channels, and form submissions. In some cases, the digital worker 110 may access publicly available information from professional networking sites, company websites, industry directories, or social media profiles to gather data about potential leads or candidates.
[0191] The digital worker 110 may collect data from websites by navigating to specified URLs, extracting relevant information from web pages, and storing the extracted data for subsequent analysis. In some cases, the digital worker 110 may monitor websites for updates or new content that indicates prospect activity or availability. The digital worker 110 may extract contact information, professional backgrounds, company affiliations, or other relevant data points from website content.
[0192] The digital worker 110 may collect data from social platforms by accessing profiles, posts, connections, or activity feeds that provide information about potential prospects. In some cases, the digital worker 110 may analyze social platform data to identify prospects who have expressed interest in relevant topics, engaged with industry content, or demonstrated expertise in areas aligned with organizational requirements. The digital worker 110 may track social platform engagement metrics to assess prospect activity levels and interest indicators.
[0193] The digital worker 110 may collect data from databases by querying internal or external data repositories that contain prospect information. In some cases, the digital worker 110 may access customer relationship management databases, applicant tracking system databases, marketing automation databases, or third-party data providers to retrieve prospect records. The digital worker 110 may integrate data from multiple database sources to construct comprehensive prospect profiles.
[0194] The digital worker 110 may collect data from communication channels by monitoring email inboxes, chat platforms, messaging applications, or other communication systems for prospect interactions. In some cases, the digital worker 110 may extract information from incoming messages, including contact details, inquiry content, and expressed interests. The digital worker 110 may analyze communication patterns to identify prospects who have initiated contact or responded to outreach efforts.
[0195] The digital worker 110 may collect data from form submissions by processing entries submitted through web forms, registration pages, contact forms, or application portals. In some cases, the digital worker 110 may extract structured data from form fields, including names, email addresses, phone numbers, company names, job titles, and responses to qualification questions. The digital worker 110 may correlate form submission data with other collected data to enrich prospect profiles.
[0196] The digital worker 110 may process collected data by cleaning, normalizing, and structuring the information for analysis. Processing may include removing duplicate records, standardizing data formats, validating contact information, and resolving inconsistencies across data sources. In some cases, the digital worker 110 may apply data transformation rules to convert raw collected data into a consistent format suitable for matching against the ideal profile parameters.
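As a non-limiting illustration, the cleaning, normalizing, and de-duplication described above may be sketched in Python as follows. The record fields, the use of the email address as the duplicate key, and the simple email pattern are assumptions for illustration; the pattern is a coarse format check and not a full address validator.

```python
import re
from typing import Dict, List

# Coarse format check only (assumption): something@something.something
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize(record: Dict[str, str]) -> Dict[str, str]:
    """Standardize whitespace and letter case across data sources."""
    return {
        "name": " ".join(record.get("name", "").split()).title(),
        "email": record.get("email", "").strip().lower(),
        "company": " ".join(record.get("company", "").split()),
    }


def clean(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop records with invalid emails and merge duplicates by email."""
    merged_by_email: Dict[str, Dict[str, str]] = {}
    for raw in records:
        rec = normalize(raw)
        if not EMAIL_RE.match(rec["email"]):
            continue
        merged = merged_by_email.setdefault(rec["email"], rec)
        for key, value in rec.items():
            if not merged[key]:          # fill gaps from later duplicates
                merged[key] = value
    return list(merged_by_email.values())


raw_records = [
    {"name": "  ada  lovelace", "email": "Ada@Example.com ", "company": ""},
    {"name": "Ada Lovelace", "email": "ada@example.com", "company": "Analytical Engines"},
    {"name": "No Email", "email": "not-an-email"},
]
cleaned = clean(raw_records)
```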
[0197] The machine learning module 306 may analyze collected data to identify prospects that exhibit strong alignment with the defined ideal profile. Analysis may include comparison of prospect attributes against profile parameters, calculation of similarity scores, and identification of prospects that meet threshold criteria. In some cases, the machine learning module 306 may apply machine learning algorithms to analyze complex patterns in prospect data and identify non-obvious correlations that indicate prospect quality.
[0198] The digital worker 110 may rank prospects based on the degree of alignment between prospect attributes and the ideal profile parameters. Ranking may assign numerical scores to each prospect that reflect the strength of the match across multiple dimensions. In some cases, the digital worker 110 may weight different profile parameters according to their relative importance, enabling customization of the ranking algorithm to reflect organizational priorities.
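As a non-limiting illustration, weighted ranking across profile dimensions may be sketched in Python as follows. The sketch assumes that each prospect already carries a per-dimension match score between 0 and 1; the dimension names, weights, threshold, and prospect values are assumptions for illustration, and a weighted average is only one of many possible scoring rules.

```python
from typing import Dict, List, Tuple


def match_score(dimension_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-dimension scores; missing dimensions count as 0."""
    total_weight = sum(weights.values())
    weighted = sum(w * dimension_scores.get(dim, 0.0) for dim, w in weights.items())
    return weighted / total_weight


def rank(prospects: Dict[str, Dict[str, float]],
         weights: Dict[str, float],
         threshold: float = 0.0) -> List[Tuple[float, str]]:
    """Return (score, prospect id) pairs at or above the threshold, best first."""
    scored = [(match_score(dims, weights), pid) for pid, dims in prospects.items()]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)


# Relative importance of each dimension (hypothetical organizational priorities).
weights = {"skills": 5, "experience": 3, "intent": 2}

prospects = {
    "A": {"skills": 0.9, "experience": 0.5, "intent": 0.2},
    "B": {"skills": 0.4, "experience": 0.9, "intent": 1.0},
    "C": {"skills": 0.2, "experience": 0.2, "intent": 0.1},
}

ranked = rank(prospects, weights, threshold=0.5)
ranked_ids = [pid for _, pid in ranked]
```

Changing the weights changes the ordering without touching the per-dimension scores, which is one way to keep organizational priorities separate from the underlying data.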
[0199] The digital worker 110 may score prospects based on real-time behavioral signals observed during the prospect identification and engagement process. Real-time behavioral signals may include recent website visits, content downloads, email opens, link clicks, or social media interactions that indicate current prospect interest levels. In some cases, the digital worker 110 may adjust prospect scores dynamically as new behavioral signals are observed, enabling prioritization of prospects exhibiting active engagement.
[0200] The digital worker 110 may score prospects based on historical interaction data that captures past engagement with organizational content and communications. Historical interaction data may include records of previous communications, past purchases or applications, event attendance, or prior expressions of interest. In some cases, the digital worker 110 may analyze historical interaction patterns to identify prospects with demonstrated long-term interest or recurring engagement that indicates sustained relevance.
[0201] The digital worker 110 may filter prospects based on qualification criteria that determine eligibility for further engagement. Filtering may exclude prospects that do not meet minimum threshold requirements for specific parameters, such as geographic location, company size, or experience level. In some cases, the digital worker 110 may apply exclusion rules that remove prospects matching disqualification criteria, such as competitors, existing customers, or prospects who have opted out of communications.
[0202] The digital worker 110 may qualify prospects by evaluating whether prospects meet predefined criteria that indicate readiness for engagement with sales representatives, recruiters, or other human operators. Qualification may involve assessment of budget authority, decision-making role, timeline for action, or other factors that indicate prospect readiness. In some cases, the digital worker 110 may apply qualification frameworks such as BANT (Budget, Authority, Need, Timeline) or similar methodologies to categorize prospects according to their qualification status.
[0203] The digital worker 110 may automate outreach to prospects by generating and sending personalized communications through email, messaging platforms, or other communication channels. Automated outreach may include initial contact messages that introduce organizational offerings or opportunities, follow-up messages that provide additional information or address prospect questions, and nurturing messages that maintain engagement over time. In some cases, the digital worker 110 may personalize outreach messages based on prospect profile data, incorporating references to prospect skills, experience, industry, or expressed interests.
[0204] The digital worker 110 may automate follow-up communications by tracking prospect responses and triggering subsequent messages based on prospect actions or inaction. Follow-up automation may include reminder messages sent to prospects who have not responded within a specified timeframe, thank-you messages sent after prospect engagement, or escalation messages that offer additional resources or incentives. In some cases, the digital worker 110 may adjust follow-up timing and content based on prospect engagement patterns observed during the outreach sequence.
[0205] The digital worker 110 may automate status tracking by maintaining records of prospect engagement status throughout the lead generation or candidate identification process. Status tracking may include categorization of prospects according to their position in the engagement funnel, such as new prospect, contacted, engaged, qualified, or converted. In some cases, the digital worker 110 may update prospect status automatically based on observed actions, such as advancing a prospect to engaged status upon receipt of a response or advancing a prospect to qualified status upon completion of a qualification assessment.
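As a non-limiting illustration, automatic status updates through the engagement funnel may be sketched in Python as follows. The stage names follow the paragraph above; the event names and the rule that a prospect never moves backward are assumptions for illustration.

```python
# Funnel stages in order, as listed in the description.
STAGES = ["new", "contacted", "engaged", "qualified", "converted"]

# Observed action -> stage it advances the prospect to (hypothetical event names).
EVENT_TARGET = {
    "outreach_sent": "contacted",
    "response_received": "engaged",
    "qualification_passed": "qualified",
    "conversion_recorded": "converted",
}


def advance(status: str, event: str) -> str:
    """Return the prospect status after an observed event.

    Unknown events leave the status unchanged, and an event never moves a
    prospect to an earlier stage than the one already reached.
    """
    target = EVENT_TARGET.get(event)
    if target is None:
        return status
    return target if STAGES.index(target) > STAGES.index(status) else status


status = "new"
for event in ["outreach_sent", "response_received", "outreach_sent", "qualification_passed"]:
    status = advance(status, event)
```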
[0206] The digital worker 110 may automate handoff of qualified prospects to human operators for further engagement. Handoff may involve notification of sales representatives, recruiters, or account managers that a qualified prospect is ready for personal contact. In some cases, the digital worker 110 may provide human operators with prospect profile summaries, engagement history, and recommended talking points to facilitate effective follow-up conversations.
[0207] The digital worker 110 may automate handoff to downstream systems such as customer relationship management (CRM) platforms. Handoff to CRM platforms may involve creation of new contact records, lead records, or opportunity records that capture prospect information and engagement history. In some cases, the digital worker 110 may update existing CRM records with new data collected during the lead generation process, ensuring that CRM systems maintain current and comprehensive prospect information.
[0208] The digital worker 110 may automate handoff to applicant tracking systems (ATS) for candidate generation applications. Handoff to ATS platforms may involve creation of candidate profiles, application records, or pipeline entries that capture candidate qualifications and engagement status. In some cases, the digital worker 110 may attach supporting documents such as resumes, portfolios, or assessment results to candidate records in the ATS.
[0209] The digital worker 110 may automate handoff to sales automation platforms that manage outreach sequences, pipeline tracking, or revenue forecasting. Handoff to sales automation platforms may involve enrollment of qualified leads in automated nurturing sequences, assignment of leads to sales territories or representatives, or creation of pipeline entries that track lead progression toward conversion. In some cases, the digital worker 110 may synchronize data between multiple downstream systems to ensure consistency across CRM, ATS, and sales automation platforms. The reporting module 310 may log prospect engagement activities and handoff events, enabling the user 108 to monitor the lead generation pipeline and evaluate conversion metrics through the statistics module 312.
[0210] The system 102 may include predictive task failure detection mechanisms that utilize machine learning algorithms to analyze historical data, system performance metrics, and task execution patterns. The predictive task failure detection mechanisms may process past task execution data to identify trends and conditions that have historically led to task failure. In some cases, the machine learning algorithms may examine records of previous task executions stored in the reporting module 310, extracting features such as execution duration, error occurrences, resource utilization levels, and environmental conditions present during each execution. The analysis of historical data may enable the system 102 to recognize correlations between specific conditions and subsequent task failures, allowing the system 102 to proactively identify early signs of potential failure in current or scheduled task executions.
[0211] The predictive task failure detection mechanisms may analyze system performance metrics to assess the operational state of components involved in task execution. System performance metrics may include processor utilization, memory availability, network latency, application response times, and storage capacity. In some cases, the machine learning algorithms may monitor these metrics in real time during task execution and compare observed values against baseline measurements derived from historical data. Deviations from baseline performance metrics may indicate conditions that could lead to task failure, such as resource exhaustion, network connectivity issues, or application instability.
[0212] The predictive task failure detection mechanisms may analyze task execution patterns to identify sequences of events or conditions that precede task failures. Task execution patterns may include the order of steps performed during workflow execution, the timing relationships between consecutive actions, and the interactions between the digital worker 110 and target applications. In some cases, the machine learning algorithms may construct models that represent normal task execution patterns and detect deviations from these patterns that may indicate impending failure. The analysis of task execution patterns may enable the system 102 to predict potential failures before the failures manifest as errors or workflow interruptions.
[0213] The system 102 may employ pattern recognition systems to detect anomalies or unusual behavior in task execution processes. The pattern recognition systems may identify specific patterns or sequences of events that are indicative of an impending failure. In some cases, the pattern recognition systems may utilize statistical methods, clustering algorithms, or neural network architectures to distinguish between normal task execution behavior and anomalous behavior that warrants attention.
[0214] The pattern recognition systems may detect deviations from normal processing times during task execution. Normal processing times may be established based on historical execution data, representing the typical duration required for each step or for the complete workflow. In some cases, the pattern recognition systems may flag task executions where individual steps or overall workflow duration exceed expected thresholds by a configurable margin. Deviations from normal processing times may indicate issues such as application slowdowns, resource contention, or unexpected changes in the target environment that could lead to task failure.
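As a non-limiting illustration, flagging steps whose duration exceeds a historical baseline by a configurable margin may be sketched in Python as follows. The sketch uses a simple mean of past durations as the baseline, which is an assumption; the step names, durations, and the 50% margin are illustrative only.

```python
from statistics import mean
from typing import Dict, List


def flag_slow_steps(history: Dict[str, List[float]],
                    current: Dict[str, float],
                    margin: float = 0.5) -> List[str]:
    """Return steps whose current duration exceeds baseline * (1 + margin)."""
    flagged = []
    for step, duration in current.items():
        past = history.get(step)
        if not past:
            continue                       # no baseline available for this step
        baseline = mean(past)
        if duration > baseline * (1 + margin):
            flagged.append(step)
    return flagged


# Durations in seconds from earlier executions (illustrative values).
history = {
    "open_application": [4.0, 5.0, 6.0],
    "query_sales_data": [280.0, 300.0, 320.0],
}
current = {"open_application": 5.5, "query_sales_data": 520.0, "save_report": 3.0}

slow_steps = flag_slow_steps(history, current, margin=0.5)
```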
[0215] The pattern recognition systems may detect increases in error rates during task execution. Error rates may be calculated based on the frequency of errors encountered during recent task executions compared to historical error frequencies. In some cases, the pattern recognition systems may track error occurrences across multiple task executions and identify trends indicating that error rates are increasing over time. An increase in error rates may signal degradation in application stability, changes in target system configurations, or other conditions that increase the likelihood of task failure.
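By way of non-limiting illustration, comparing the error frequency of recent task executions against the historical error frequency may be sketched as follows. The window size, the ratio, and the function names are assumptions chosen for illustration; the disclosure does not prescribe a particular trend test.

```python
from typing import List


def failure_rate(outcomes: List[bool]) -> float:
    """Fraction of executions that failed (False marks a failed execution)."""
    return outcomes.count(False) / len(outcomes) if outcomes else 0.0


def error_rate_rising(
    outcomes: List[bool],
    recent_window: int = 5,  # hypothetical number of recent executions
    ratio: float = 2.0,      # hypothetical multiple of the historical rate
) -> bool:
    """True when the recent failure rate exceeds ratio times the historical rate."""
    if len(outcomes) <= recent_window:
        return False  # no older executions to compare against
    historical = outcomes[:-recent_window]
    recent = outcomes[-recent_window:]
    recent_rate = failure_rate(recent)
    return recent_rate > 0 and recent_rate > ratio * failure_rate(historical)


# Twenty older executions with one failure, then five recent ones with two.
degrading = [True] * 19 + [False] + [True, False, True, False, True]
stable = [True] * 25
```

In this sketch the `degrading` sequence is reported as rising (a recent rate of 0.4 against a historical rate of 0.05), whereas the `stable` sequence is not.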
[0216] The pattern recognition systems may detect inconsistency in task performance across multiple executions. Task performance inconsistency may manifest as variability in execution outcomes, fluctuations in processing times, or intermittent errors that occur unpredictably across different executions of the same workflow. In some cases, the pattern recognition systems may analyze variance in performance metrics across recent task executions and identify workflows exhibiting higher than expected variability. Inconsistency in task performance may indicate environmental instability, race conditions, or other factors that could result in task failure under certain circumstances.
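By way of non-limiting illustration, the analysis of variance in processing times across recent executions may be sketched using a coefficient of variation. The function name `is_inconsistent` and the limit of 0.25 are hypothetical; other dispersion measures could equally be used.

```python
from statistics import mean, pstdev
from typing import List


def is_inconsistent(durations: List[float], max_cv: float = 0.25) -> bool:
    """True when the coefficient of variation of durations exceeds max_cv."""
    if len(durations) < 2:
        return False  # variability is undefined for a single execution
    average = mean(durations)
    if average == 0:
        return False  # avoid division by zero
    return pstdev(durations) / average > max_cv


# Durations in seconds for repeated executions of the same workflow.
steady_runs = [10.0, 10.5, 9.8, 10.2]
erratic_runs = [5.0, 20.0, 8.0, 30.0]
```

Normalizing the standard deviation by the mean allows one limit to be applied to workflows of very different typical durations.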
[0217] The system 102 may provide early warning indicators that inform the user 108 of potential issues detected by the predictive task failure detection mechanisms and pattern recognition systems. The early warning indicators may take the form of notifications delivered through communication channels configured by the user 108, such as email messages, mobile push notifications, or in-application alerts displayed on the dashboard. In some cases, the early warning indicators may be presented as system prompts that appear when the user 108 accesses the system 102, drawing attention to potential issues that require review or action.
[0218] The early warning indicators may include information about the nature of the detected anomaly or predicted failure condition. In some cases, the early warning indicators may specify which task or workflow is affected, the type of anomaly detected, the severity level of the potential issue, and recommended actions that the user 108 may take to address the situation. The early warning indicators may enable the user 108 to review potential issues and make informed decisions about whether to intervene manually or allow automated corrective actions to proceed.
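By way of non-limiting illustration, the information carried by an early warning indicator may be represented as a simple record. The class names, field names, and severity levels below are assumptions made for illustration and do not describe a data format defined by the present disclosure.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Severity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class EarlyWarning:
    workflow: str        # which task or workflow is affected
    anomaly_type: str    # type of anomaly detected
    severity: Severity   # severity level of the potential issue
    recommended_actions: List[str] = field(default_factory=list)

    def summary(self) -> str:
        """Render a one-line message suitable for an email or in-app alert."""
        actions = "; ".join(self.recommended_actions) or "none"
        return (
            f"[{self.severity.name}] {self.workflow}: "
            f"{self.anomaly_type} (recommended: {actions})"
        )


warning = EarlyWarning(
    workflow="invoice_posting",
    anomaly_type="processing_time_deviation",
    severity=Severity.HIGH,
    recommended_actions=["reduce batch size"],
)
```

Calling `warning.summary()` yields a single line that names the workflow, the anomaly, the severity, and the recommended action for review by the user.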
[0219] The system 102 may trigger automated corrective actions in response to early warning indicators. Automated corrective actions may include adjustments to task execution parameters, such as modifying timing intervals, reducing batch sizes, or adjusting retry settings to accommodate detected conditions. In some cases, automated corrective actions may include rerouting of tasks to alternative digital workers or execution environments that are not affected by the detected anomaly. The rerouting of tasks may enable continued workflow execution when the originally assigned digital worker 110 or target system exhibits conditions that could lead to failure.
[0220] Automated corrective actions may include allocation of additional resources to support task execution. In some cases, the system 102 may provision additional processing capacity, allocate additional memory, or establish additional network connections to address resource constraints detected by the predictive failure detection mechanisms. The allocation of additional resources may enable task execution to proceed without interruption when resource availability issues are identified as potential causes of failure.
[0221] The automated corrective actions may be configured by the user 108 or administrators to define the types of interventions that the system 102 may perform automatically and the conditions under which manual approval is required before corrective actions are executed. In some cases, the system 102 may log all automated corrective actions in the reporting module 310, enabling the user 108 to review the interventions performed and assess the effectiveness of the predictive failure detection mechanisms in preventing task failures.
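By way of non-limiting illustration, a configuration that distinguishes interventions performed automatically from those requiring manual approval, together with logging of each decision, may be sketched as follows. The class name `CorrectivePolicy`, the action names, and the in-memory list standing in for the reporting module 310 are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CorrectivePolicy:
    # Action types the system may perform without manual approval.
    auto_approved: Set[str]
    # Stand-in for the reporting module's log of corrective actions.
    audit_log: List[Dict[str, str]] = field(default_factory=list)

    def handle(self, workflow: str, action: str) -> str:
        """Decide whether an action runs automatically, and log the decision."""
        if action in self.auto_approved:
            status = "executed"
        else:
            status = "pending_approval"
        self.audit_log.append(
            {"workflow": workflow, "action": action, "status": status}
        )
        return status


policy = CorrectivePolicy(auto_approved={"adjust_retry_settings"})
first = policy.handle("payroll_run", "adjust_retry_settings")
second = policy.handle("payroll_run", "reroute_to_alternate_worker")
```

In this sketch the retry adjustment is executed immediately, the rerouting is held for approval, and both decisions remain available in `policy.audit_log` for later review.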
[0222] The system 102 may be applied in healthcare to automate telemedicine appointment scheduling and patient records management. The digital worker 110 may execute workflows that access telemedicine platforms, match patient scheduling requests with available appointment slots, and confirm appointments through patient communication channels. The task recorder 302 may capture video demonstrations of healthcare administrators performing patient records management tasks, and the machine learning module 306 may process these recordings to generate processed instructions files that enable the digital worker 110 to navigate electronic health record systems and update patient information in compliance with healthcare data standards. The reporting module 310 may log execution results for healthcare compliance auditing purposes.
[0223] The system 102 may be applied in logistics to enhance fleet tracking, route optimization, and supply chain automation. The digital worker 110 may execute workflows that access fleet management platforms, update fleet status dashboards, and generate reports on vehicle utilization. The organizer module 308 may schedule route optimization tasks to execute at specified intervals, and the player module 314 may initiate execution of these tasks according to the configured schedule. The machine learning module 306 may process video demonstrations of logistics coordinators performing supply chain tasks, enabling the digital worker 110 to monitor inventory levels and generate purchase orders when stock falls below threshold levels. The statistics module 312 may track cumulative hours worked and total time saved through automated logistics operations.
[0224] The system 102 may be applied in finance to improve loan processing, fraud detection, and compliance reporting. The task recorder 302 may capture video demonstrations of loan officers performing application processing workflows, and the machine learning module 306 may generate processed instructions files that enable the digital worker 110 to retrieve applicant information, access credit reporting systems, and route completed applications to underwriting queues. The digital worker 110 may perform fraud detection tasks by monitoring transaction data streams and generating alerts for suspicious transactions. The flow editor 304 may enable compliance officers to review and modify fraud detection workflows, and the reporting module 310 may log compliance reporting activities for regulatory audit purposes.
[0225] The system 102 may be applied in accounting to improve bookkeeping and management of invoices, bills, and receipts. The task recorder 302 may capture video demonstrations of accountants performing bookkeeping tasks, and the machine learning module 306 may process these recordings to generate instructions that enable the digital worker 110 to access accounting software, categorize transactions, and reconcile account balances. The organizer module 308 may schedule invoice management tasks, and the player module 314 may execute these tasks according to vendor payment terms. The digital worker 110 may perform receipt management tasks by processing expense receipts and posting expense entries to appropriate general ledger accounts, with the reporting module 310 logging all accounting operations for audit trails.
[0226] The system 102 may be applied for tax filing and business filings by filling data from various sources and documents. The task recorder 302 may capture video demonstrations of tax professionals preparing tax returns, and the machine learning module 306 may generate processed instructions files that enable the digital worker 110 to retrieve financial data from accounting systems and populate tax form fields with calculated values. The flow editor 304 may enable tax professionals to review and validate generated instructions before the organizer module 308 schedules filing tasks. The digital worker 110 may prepare business filing submission packages, and the reporting module 310 may log filing activities for compliance documentation.
[0227] The system 102 may be applied in HR for managing employee records and payroll calculations. The task recorder 302 may capture video demonstrations of HR specialists performing employee records management workflows, and the machine learning module 306 may process these recordings to generate instructions for the digital worker 110. The digital worker 110 may update employee demographic data and generate employee documentation according to the processed instructions files. The organizer module 308 may schedule payroll calculation tasks, and the player module 314 may execute these tasks by directing the digital worker 110 to retrieve time and attendance data, apply pay rates and deduction schedules, and generate payroll registers. The statistics module 312 may track total time saved through automated HR operations.
[0228] The system 102 may be applied for recruitment by automating candidate profile collection, organization, initial communication, and sending personalized messages based on profile and job description match. The task recorder 302 may capture video demonstrations of recruiters performing candidate sourcing tasks, and the machine learning module 306 may generate processed instructions files that enable the digital worker 110 to access job boards and applicant tracking systems to collect candidate profiles. The flow editor 304 may enable recruiters to customize outreach message templates, and the organizer module 308 may schedule candidate communication tasks. The digital worker 110 may organize collected candidate information and generate customized outreach messages that reference specific candidate qualifications relevant to the opportunity, with the reporting module 310 logging all recruitment activities.
[0229] The system 102 may be applied in Aerospace and Defense for mission control automation with real-time telemetry processing, aerospace inventory management, threat monitoring systems, and drone fleet management. The task recorder 302 may capture video demonstrations of mission control operators performing telemetry monitoring tasks, and the machine learning module 306 may process these recordings to generate instructions for the digital worker 110. The digital worker 110 may execute mission control workflows that process telemetry data and generate alerts when readings fall outside acceptable ranges. The organizer module 308 may schedule aerospace inventory management tasks, and the player module 314 may coordinate mission planning by directing the digital worker 110 to match available aircraft with mission requirements. The reporting module 310 may log all mission-critical operations for security and compliance purposes.
[0230] The system 102 may be applied in marketing to enhance marketing analysis by automating data collection and inserting data into relevant analysis tools. The task recorder 302 may capture video demonstrations of marketing analysts performing data collection workflows, and the machine learning module 306 may generate processed instructions files that enable the digital worker 110 to access web analytics platforms, social media management tools, and advertising platforms to retrieve performance metrics. The organizer module 308 may schedule recurring data collection tasks, and the player module 314 may execute these tasks at configured intervals. The digital worker 110 may format collected marketing data and import the data into business intelligence platforms, with the statistics module 312 tracking the total number of tasks completed and time saved through automated marketing operations.
[0231] The system 102 may be applied for training guide generation where the digital worker 110 autonomously generates and updates training materials. The task recorder 302 may capture video demonstrations of training coordinators performing documentation update workflows, and the machine learning module 306 may process these recordings to generate instructions for the digital worker 110. The digital worker 110 may execute workflows that access existing training documentation, identify sections requiring updates, and generate revised content reflecting current practices. The organizer module 308 may schedule monitoring tasks, and the player module 314 may direct the digital worker 110 to monitor source systems for changes affecting documented procedures and automatically update corresponding sections of training materials. The reporting module 310 may log all documentation updates for version control purposes.
[0232] The system 102 may be applied for microscopy analysis to measure fluorescence intensity and levels of biological enzymes in cell samples. The task recorder 302 may capture video demonstrations of laboratory technicians performing image analysis workflows, and the machine learning module 306 may generate processed instructions files that enable the digital worker 110 to access microscopy imaging systems, retrieve captured images, and apply image analysis algorithms to quantify fluorescence intensity values. The organizer module 308 may schedule batch analysis tasks, and the player module 314 may execute these tasks by directing the digital worker 110 to process large volumes of sample data and aggregate results across multiple samples. The reporting module 310 may log analysis results, and the statistics module 312 may track cumulative hours worked on microscopy analysis tasks.
[0233] The system 102 may be applied for bus service coordination automation by interfacing with communication tools and spreadsheet applications for driver management. The task recorder 302 may capture video demonstrations of transportation coordinators performing driver scheduling tasks, and the machine learning module 306 may process these recordings to generate instructions for the digital worker 110. The digital worker 110 may execute workflows that access communication platforms to exchange messages with bus drivers regarding schedule assignments. The flow editor 304 may enable coordinators to modify driver communication workflows, and the organizer module 308 may schedule driver management tasks. The player module 314 may direct the digital worker 110 to update driver schedules maintained in spreadsheet applications and initiate contact with replacement drivers when assigned drivers are unavailable, with the reporting module 310 logging all coordination activities.
[0234] The system 102 may be applied for workflow efficiency evaluation to compare and evaluate different workflows performing the same task. The task recorder 302 may capture multiple video demonstrations of users performing the same task using different approaches, and the machine learning module 306 may generate separate processed instructions files for each workflow variant. The digital worker 110 may execute analysis workflows that retrieve execution data for multiple workflow variants from the reporting module 310 and generate comparative assessments highlighting differences in efficiency. The statistics module 312 may provide performance metrics including total time saved and cumulative hours worked for each workflow variant, enabling the digital worker 110 to analyze cost factors, time consumption, and effort requirements to provide data-driven insights for workflow optimization.
[0235] The system 102 may support deployment on on-premises infrastructure or private cloud environments to ensure data privacy control. On-premises deployment may enable organizations to maintain full control over data by keeping data securely on machines owned and operated by the organization, with the task recorder 302, machine learning module 306, flow editor 304, organizer module 308, player module 314, reporting module 310, and statistics module 312 all operating within organizational network boundaries. In some cases, on-premises deployment may address regulatory requirements that mandate data residency within specific geographic boundaries or within organizational network perimeters. Private cloud deployment may provide similar data control benefits while offering cloud infrastructure advantages such as virtualization, resource pooling, and centralized management within a dedicated environment isolated from public cloud tenants.
[0236] The system 102 may ensure compliance with data privacy laws including the Health Insurance Portability and Accountability Act (HIPAA), the Payment Card Industry Data Security Standard (PCI DSS), and the General Data Protection Regulation (GDPR). HIPAA compliance may be achieved through deployment configurations that protect electronic protected health information (ePHI) by maintaining data within secure organizational boundaries, implementing access controls, and supporting audit logging requirements through the reporting module 310. In some cases, healthcare organizations may deploy the system 102 on-premises to process patient scheduling workflows, medical records management tasks, and other healthcare administration functions while maintaining HIPAA compliance, with the digital worker 110 executing these tasks according to processed instructions files generated by the machine learning module 306.
[0237] PCI DSS compliance may be achieved through deployment configurations that protect cardholder data by isolating payment processing workflows within secure network segments, encrypting sensitive data at rest and in transit, and restricting access to payment card information to authorized personnel and systems. In some cases, financial services organizations and retail enterprises may deploy the system 102 on-premises or in private cloud environments to automate payment processing tasks, transaction reconciliation workflows, and financial reporting functions while maintaining PCI DSS compliance, with the task recorder 302 capturing demonstrations within secure environments and the digital worker 110 executing tasks according to instructions validated through the flow editor 304.
[0238] GDPR compliance may be achieved through deployment configurations that protect personal data of European Union residents by enabling data subject rights management, supporting data minimization practices, and maintaining records of processing activities through the reporting module 310.
[0239] The system 102 may support dual deployment offering both cloud and on-premises capabilities for scalability and security. Dual deployment may enable organizations to select deployment models that align with specific workload requirements, regulatory constraints, and operational preferences. In some cases, organizations may deploy certain workflows in cloud environments to leverage elastic scalability and reduced infrastructure management overhead while deploying other workflows on-premises to satisfy data sensitivity or compliance requirements, with the task recorder 302, machine learning module 306, flow editor 304, organizer module 308, player module 314, reporting module 310, statistics module 312, and digital worker 110 operating across both deployment environments.
[0240] Cloud deployment may provide scalability advantages by enabling organizations to scale processing capacity dynamically in response to varying workload demands. In some cases, cloud deployment may enable organizations to accommodate seasonal peaks in task volume, support rapid expansion of automation initiatives, or provision additional digital worker 110 instances without procuring and configuring physical infrastructure. Cloud deployment may also provide geographic distribution capabilities that enable task execution by the digital worker 110 across multiple regions to support global operations, with the organizer module 308 coordinating schedules across time zones and the reporting module 310 aggregating execution results from distributed deployments.
[0241] On-premises deployment may provide security advantages by enabling organizations to maintain data within controlled network perimeters, apply organizational security policies to automation infrastructure, and integrate with existing security monitoring and incident response systems. In some cases, on-premises deployment may enable organizations to satisfy security requirements imposed by customers, partners, or regulatory bodies that mandate data processing within specific controlled environments, with the task recorder 302 capturing video demonstrations locally, the machine learning module 306 processing recordings within organizational boundaries, and the digital worker 110 executing tasks without transmitting sensitive data to external systems.
[0242] The system 102 may support OEM integration with enterprise platforms including Microsoft Azure, Amazon Web Services (AWS), and Salesforce, enabling automation capabilities as embedded components within existing technology ecosystems. Microsoft Azure integration may enable deployment of the system 102 within Azure cloud infrastructure, leveraging Azure compute, storage, and networking capabilities, and interacting with Azure-native services including Azure Active Directory, Azure Key Vault, and Azure Monitor, with the digital worker 110 executing tasks within Azure environments. AWS integration may enable deployment of the system 102 within AWS cloud infrastructure, leveraging EC2, S3, VPC, and interacting with AWS IAM, CloudWatch, and Lambda services. Salesforce integration may enable the digital worker 110 to automate workflows interacting with Salesforce CRM, Sales Cloud, and Service Cloud, including lead data entry, opportunity updates, case management, and report generation, with the task recorder 302 capturing demonstrations of Salesforce interactions and the machine learning module 306 generating corresponding processed instructions files.
[0243] The system 102 may support a Software-as-a-Service (SaaS) subscription model with flexible monthly or annual plans tailored to small and medium enterprises, enabling access to automation capabilities including the task recorder 302, machine learning module 306, flow editor 304, organizer module 308, player module 314, reporting module 310, statistics module 312, and digital worker 110 without capital infrastructure expenditure. Monthly subscription plans may provide flexibility for organizations evaluating automation or addressing temporary requirements, while annual plans may offer cost advantages through discounted rates. The SaaS subscription model may include tiered pricing structures aligning costs with organizational size, usage volume, or feature requirements.
[0244] The system 102 may support exclusive and non-exclusive licensing agreements for large-scale enterprise customers with customized terms regarding pricing, deployment configurations, and support arrangements. Exclusive licensing agreements may grant exclusive deployment rights within specified markets, industries, or geographic regions, potentially including co-development or customization provisions for the task recorder 302, machine learning module 306, flow editor 304, organizer module 308, player module 314, reporting module 310, statistics module 312, and digital worker 110. Non-exclusive licensing agreements may grant deployment rights under customized terms while permitting the system provider to serve multiple enterprise customers, potentially including volume-based pricing or multi-year commitments.
[0245] The system may provide advantages including elimination of programming expertise requirements for automation implementation. The elimination of programming expertise requirements may enable non-technical users such as business analysts, operations managers, and administrative professionals to create and deploy automated workflows without writing code or learning specialized programming languages. In some cases, users may create automation workflows by recording video demonstrations of tasks, enabling the system to generate executable instructions through machine learning analysis rather than manual script development.
[0246] The system may provide no-code automation through video demonstrations that capture task execution steps and user interactions. No-code automation may enable users to demonstrate tasks through natural performance rather than abstracting task steps into programming constructs or flowchart representations. In some cases, the video demonstration approach may reduce the time required to create automation workflows compared to traditional programming or RPA flow editor approaches, enabling rapid deployment of automation for new tasks.
[0247] The system may provide reduced integration complexity by utilizing machine learning to interpret video demonstrations and generate context-aware instructions. Reduced integration complexity may result from the system's ability to recognize user interface elements, interpret screen content, and adapt instructions to different environments without requiring custom integration code for each target application. In some cases, the reduced integration complexity may enable automation of tasks spanning multiple applications without developing application-specific connectors or adapters.
[0248] The system may provide real-time AI-powered automation that enables tasks to be executed with high performance and minimal human intervention. Real-time automation may enable digital workers to respond to task assignments, process instructions, and execute workflows without delays associated with batch processing or manual initiation. In some cases, AI-powered automation may enable digital workers to adapt to variations in target application interfaces, handle unexpected conditions, and make contextually appropriate decisions during task execution.
[0249] The system may provide cross-environment adaptability that enables digital workers to execute tasks across different environments with varying user interface configurations. Cross-environment adaptability may result from the machine learning module's generation of instructions that reference user interface elements by their visual characteristics and contextual relationships rather than fixed screen coordinates or application-specific identifiers. In some cases, cross-environment adaptability may enable a workflow created through video demonstration on one machine to execute on different machines with different screen resolutions, display scaling settings, or application versions.
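By way of non-limiting illustration, resolving an instruction by the classification and text of a user interface element, rather than by fixed screen coordinates, may be sketched as follows. The `UIElement` record, the function name `resolve_click_point`, and the sample bounding boxes are assumptions; in practice the detected elements would be produced by the computer vision and optical character recognition processing described herein.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class UIElement:
    kind: str                        # e.g. "button" or "text_field"
    text: str                        # text extracted from the element
    box: Tuple[int, int, int, int]   # left, top, right, bottom in pixels


def resolve_click_point(
    elements: List[UIElement], kind: str, text: str
) -> Optional[Tuple[int, int]]:
    """Return the center of the first element matching kind and text."""
    wanted = text.strip().lower()
    for element in elements:
        if element.kind == kind and element.text.strip().lower() == wanted:
            left, top, right, bottom = element.box
            return ((left + right) // 2, (top + bottom) // 2)
    return None  # element not present on this screen


# The same instruction ("click the Submit button") on two different machines.
recording_screen = [UIElement("button", "Submit", (100, 200, 180, 230))]
target_screen = [UIElement("button", "submit", (250, 400, 410, 460))]

point_on_recording = resolve_click_point(recording_screen, "button", "Submit")
point_on_target = resolve_click_point(target_screen, "button", "Submit")
```

Because the click position is computed from the element detected on each screen, the same instruction resolves to (140, 215) on the recording machine and (330, 430) on the target machine despite differing layout and scaling.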
[0250] The system may provide data security through on-premises processing that enables organizations to maintain control over sensitive data throughout the automation lifecycle. Data security through on-premises processing may address concerns about data exposure that arise when automation systems transmit data to external cloud services for processing. In some cases, on-premises processing may enable organizations to automate tasks involving confidential business information, personal data, financial records, or other sensitive content while maintaining data within organizational security boundaries. The on-premises processing capability may enable organizations to satisfy data handling requirements imposed by customers, partners, or regulatory frameworks that restrict data transmission to external systems.
[0251] Machine-readable storage may include machine-readable instructions that, when executed, implement a method or realize an apparatus in any of the examples of the present application.
[0252] Various techniques, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, a non-transitory computer readable storage medium, or any other machine-readable storage medium wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the various techniques. In the case of program code execution on programmable computers, the computing device may include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The volatile and non-volatile memory and/or storage elements may be a RAM, an EPROM, a flash drive, an optical drive, a magnetic hard drive, or another medium for storing electronic data. One or more programs that may implement or utilize the various techniques described herein may use an application programming interface (API), reusable controls, and the like. Such programs may be implemented in a high-level procedural or an object-oriented programming language to communicate with a computer system. However, the program(s) may be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or an interpreted language, and combined with hardware implementations.
[0253] It should be understood that many of the functional units described in this specification may be implemented as one or more components, which is a term used to more particularly emphasize their implementation independence. For example, a component may be implemented as a hardware circuit comprising custom very large scale integration (VLSI) circuits or gate arrays, off-the-shelf semiconductors such as logic chips, transistors, or other discrete components. A component may also be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices, or the like.
[0254] Components may also be implemented in software for execution by various types of processors. An identified component of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions, which may, for instance, be organized as an object, a procedure, or a function. Nevertheless, the executables of an identified component need not be physically located together, but may comprise disparate instructions stored in different locations that, when joined logically together, comprise the component and achieve the stated purpose for the component.
[0255] Indeed, a component of executable code may be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices. Similarly, operational data may be identified and illustrated herein within components, and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices, and may exist, at least partially, merely as electronic signals on a system or network. The components may be passive or active, including agents operable to perform desired functions.
[0256] Reference throughout this specification to “an example” means that a particular feature, structure, or characteristic described in connection with the example is included in at least one embodiment of the present invention. Thus, appearances of the phrase “in an example” in various places throughout this specification are not necessarily all referring to the same embodiment.
[0257] As used herein, a plurality of items, structural elements, compositional elements, and/or materials may be presented in a common list for convenience. However, these lists should be construed as though each member of the list is individually identified as a separate and unique member. Thus, no individual member of such list should be construed as a de facto equivalent of any other member of the same list solely based on its presentation in a common group without indications to the contrary. In addition, various embodiments and examples of the present invention may be referred to herein along with alternatives for the various components thereof. It is understood that such embodiments, examples, and alternatives are not to be construed as de facto equivalents of one another, but are to be considered as separate and autonomous representations of the present invention.
[0258] Although the foregoing has been described in some detail for purposes of clarity, it will be apparent that certain changes and modifications may be made without departing from the principles thereof. It should be noted that there are many alternative ways of implementing both the processes and apparatuses described herein. Accordingly, the present embodiments are to be considered illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims. Those who have skill in the art will appreciate that many changes may be made to the details of the above-described embodiments without departing from the underlying principles of the invention.
Claims
1. A system for automated learning for a digital worker, comprising: one or more processors; and a non-transitory machine-readable medium storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: receiving, via a network interface device, a video recording captured by a task recorder executing on a client device, wherein the video recording comprises screen capture data demonstrating steps for executing a specific task performed by a user; generating, by the one or more processors, a raw instructions file comprising recorded steps, user input events captured during the video recording, coordinate parameters specifying screen positions of user interactions, and timing information capturing intervals between consecutive actions; processing, by a machine learning module executing on the one or more processors, the video recording and the raw instructions file to generate a processed instructions file adapted to a context of the specific task, wherein processing comprises: analyzing individual frames of the video recording using computer vision techniques to detect and classify user interface elements; applying optical character recognition to extract text content from the detected user interface elements; generating bounding box coordinates defining rectangular regions encompassing each detected user interface element; determining click type parameters indicating types of mouse interactions performed at each interaction step; and correlating the detected user interface elements with the user input events based on temporal alignment between video frames and captured interaction timestamps; validating the processed instructions file by programmatically comparing generated bounding box coordinates and click type parameters against corresponding user interaction coordinates and mouse event types captured in the raw instructions file to verify accuracy of the detected user interface elements; transmitting, via the network interface device, the processed instructions file to a digital worker, wherein the digital worker is a software-based agent configured to execute automated tasks by interacting with target applications according to the processed instructions file; and executing, by the digital worker, the specific task by reproducing the user interactions on the target applications using the bounding box coordinates and click type parameters from the processed instructions file.
2. The system of claim 1, wherein the user input events comprise mouse activity on a screen including cursor position changes and click events, voice commands captured via audio input, natural language instructions, screen state changes detected through frame comparison, and keyboard inputs including key press sequences, and wherein the user input events are captured concurrently with the video recording to enable temporal correlation between visual content and user interactions for generating the raw instructions file.
3. The system of claim 2, wherein the mouse activity comprises cursor position changes, click events including left clicks and right clicks with associated screen coordinates, and drag-and-drop operations with start and end position coordinates, and wherein the associated screen coordinates are used by the machine learning module to correlate mouse interactions with detected user interface elements during processing of the video recording.
4. The system of claim 1, wherein the raw instructions file and the processed instructions file are JavaScript Object Notation (JSON) instructions files.
5. The system of claim 4, wherein the processed instructions file further comprises a frame number parameter identifying a video frame associated with each recorded action and time information specifying temporal relationships between consecutive actions.
6. The system of claim 1, wherein the machine learning module associates the text content extracted via optical character recognition with labels of detected user interface elements to enable element identification across different execution environments.
7. The system of claim 1, wherein the operations further comprise generating a structured hierarchical representation of user interface elements present on screens captured during the video recording as a tree structure, and wherein the structured hierarchical representation encapsulates information about each user interface element including a type classification of the user interface element, pixel coordinates and dimensions defining spatial arrangement of the user interface element, and a contextual relevance indicator of the user interface element within the specific task.
8. The system of claim 1, wherein the operations further comprise receiving, via a user interface of an organizer module, conditions from the user comprising a duration of the specific task, a number of repeating cycles, a scheduled start date and time, and output requirements specifying a format and storage location for task results, and executing the specific task by the digital worker according to the conditions at the scheduled start date and time.
9. The system of claim 1, wherein the processed instructions file is created using the video recording and the raw instructions file without requiring programming code from the user.
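For illustration only, the validating operation recited in claim 1 can be sketched as a check that each recorded click falls inside the bounding box generated for the same step and that the click types agree. This is a minimal sketch under stated assumptions, not the claimed implementation: the names RawEvent, ProcessedStep, bbox_contains, and validate are hypothetical, as is the (left, top, width, height) bounding box layout and the use of a shared step index to pair raw and processed entries.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RawEvent:
    """One interaction from the raw instructions file (hypothetical layout)."""
    step: int
    x: int
    y: int
    click_type: str  # e.g. "left" or "right"


@dataclass
class ProcessedStep:
    """One step from the processed instructions file (hypothetical layout)."""
    step: int
    frame_number: int
    bbox: Tuple[int, int, int, int]  # assumed order: left, top, width, height
    click_type: str


def bbox_contains(bbox: Tuple[int, int, int, int], x: int, y: int) -> bool:
    """Return True when point (x, y) lies inside or on the edge of bbox."""
    left, top, width, height = bbox
    return left <= x <= left + width and top <= y <= top + height


def validate(raw_events: List[RawEvent],
             processed_steps: List[ProcessedStep]) -> List[Tuple[int, str]]:
    """Compare processed steps against raw events.

    Returns a list of (step, reason) mismatches; an empty list means the
    processed file is consistent with the raw file under this check.
    """
    raw_by_step = {event.step: event for event in raw_events}
    problems = []
    for processed in processed_steps:
        raw = raw_by_step.get(processed.step)
        if raw is None:
            problems.append((processed.step, "no raw event"))
            continue
        if not bbox_contains(processed.bbox, raw.x, raw.y):
            problems.append((processed.step, "click outside bounding box"))
        if processed.click_type != raw.click_type:
            problems.append((processed.step, "click type mismatch"))
    return problems
```

In such a sketch, a flow editor could surface the returned mismatches next to the corresponding video frame so that a user can correct or confirm each flagged step.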
10. A method for automated learning for a digital worker, the method comprising the steps of: capturing, by a task recorder executing on a client device, a video recording of a user demonstrating steps involved in executing a specific task, wherein the task recorder monitors and records screen activity, mouse movements, and keyboard inputs; generating, by a processor, a raw instructions file comprising recorded steps, user input events captured during the recording, coordinate parameters identifying interaction locations, and timestamp data correlating each captured event with corresponding moments in the video recording; processing, by a machine learning module, the video recording and the raw instructions file to generate a processed instructions file adapted to a context of the specific task, wherein processing comprises: performing frame-by-frame analysis of the video recording to detect transitions between application states; applying object detection algorithms to identify boundaries and locations of user interface elements within each analyzed frame; classifying detected user interface elements according to element type including buttons, text fields, dropdown menus, and checkboxes; determining click type parameters indicating types of mouse interactions performed at each interaction step; and correlating the detected user interface elements with the user input events based on temporal alignment; validating, by the processor via a flow editor, the processed instructions file by programmatically comparing generated bounding box coordinates and click type parameters against corresponding user interaction coordinates and mouse event types captured in the raw instructions file, and displaying the video recording alongside corresponding processed instructions for user verification; assigning, by the processor via an organizer module, the specific task to a digital worker based on digital worker availability status and pre-trained capabilities; executing, by a player module, the specific task by directing the digital worker to perform interactions on target applications according to the processed instructions file and conditions provided by the user; and providing, by the digital worker, output results of the specific task in a format and storage location specified by user-defined parameters.
11. The method of claim 10, wherein the user input events comprise mouse activity on a screen including cursor trajectory paths and dwell time on specific screen locations, voice commands processed using speech recognition to convert spoken words into text-based instructions, natural language instructions provided through a text input interface, screen state changes detected through consecutive frame comparison, and keyboard inputs including modifier key states, and wherein the user input events are synchronized with corresponding video frames to enable the machine learning module to correlate each user interaction with a visual state of the screen at a time of interaction.
12. The method of claim 11, wherein the keyboard inputs comprise individual key presses with associated timing sequences, text entry sequences captured with character-level granularity, and keyboard shortcuts mapped to corresponding application functions, and wherein the keyboard inputs are correlated with detected text field elements identified through optical character recognition to generate text entry instructions in the processed instructions file.
13. The method of claim 10, wherein the raw instructions file and the processed instructions file are JavaScript Object Notation (JSON) instructions files, and wherein the processed instructions file comprises step-by-step instructional representations including frame number, bounding box coordinates, click type, and time information for each workflow action.
14. The method of claim 10, wherein the machine learning module employs optical character recognition to identify text within images captured during the video recording, extracts textual content from application windows and form field labels, and associates the identified text with detected user interface elements to enable cross-environment element identification.
15. The method of claim 10, wherein the conditions provided by the user comprise a duration of the specific task defining maximum execution time, a number of repeating cycles for recurring execution, output requirements specifying file format and destination path, and a scheduled start date and time for automated initiation.
16. The method of claim 15, wherein providing output results comprises delivering the output results in a format and location determined by a user-defined shadowing event captured during the video recording, and wherein the shadowing event specifies output delivery parameters including structured report format, database population fields, or visual format rendering in spreadsheet applications.
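For illustration only, the correlating step of claims 10 and 11 (matching user input events to detected user interface elements by temporal alignment) might be sketched as follows. The sketch assumes a constant frame rate, event timestamps in milliseconds measured from the start of the recording, and detections keyed by analyzed frame number; the function names frame_for_timestamp and correlate, the dictionary keys, and the (left, top, width, height) bounding box layout are hypothetical and are not taken from the disclosure.

```python
from typing import Dict, List, Optional, Tuple

BBox = Tuple[int, int, int, int]  # assumed order: left, top, width, height


def frame_for_timestamp(timestamp_ms: int, fps: int) -> int:
    """Map an event timestamp to a frame index, assuming a constant frame rate."""
    return timestamp_ms * fps // 1000


def correlate(events: List[dict],
              detections: Dict[int, List[dict]],
              fps: int) -> List[Optional[str]]:
    """Return, for each event, the label of the element it landed on.

    detections maps an analyzed frame number to the elements found in that
    frame. Each event is matched against the latest analyzed frame at or
    before the event's own frame; None means no element contained the point.
    """
    labels = []
    for event in events:
        frame = frame_for_timestamp(event["t_ms"], fps)
        earlier = [f for f in detections if f <= frame]
        match = None
        if earlier:
            for element in detections[max(earlier)]:
                left, top, width, height = element["bbox"]
                inside_x = left <= event["x"] <= left + width
                inside_y = top <= event["y"] <= top + height
                if inside_x and inside_y:
                    match = element["label"]
                    break
        labels.append(match)
    return labels
```

Falling back to the latest analyzed frame is one design choice among several; a real system might instead interpolate between frames or re-run detection on the exact frame of the interaction.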
17. A non-transitory computer-readable medium storing program instructions that, when executed by one or more processors, cause the one or more processors to perform a method of automated learning for a digital worker, the method comprising: receiving, via a network interface, a video recording captured by a task recorder demonstrating steps for executing a specific task, wherein the video recording comprises screen capture frames at a frame rate sufficient to record application state transitions; generating a raw instructions file comprising recorded steps, user input events captured during the video recording with associated timestamps, and coordinate parameters enabling translation across different display configurations; processing the video recording and the raw instructions file using a machine learning module to generate a processed instructions file adapted to a context of the specific task, wherein processing comprises: extracting visual features from each frame including color patterns, edge boundaries, and text regions; detecting user interface elements using object detection algorithms and classifying elements by type; generating bounding box coordinates for each detected user interface element; determining click type parameters indicating types of mouse interactions performed at each interaction step; and correlating the detected user interface elements with the user input events based on temporal alignment between frame timestamps and interaction timestamps; validating the processed instructions file by programmatically comparing generated bounding box coordinates and click type parameters against corresponding user interaction coordinates and mouse event types captured in the raw instructions file; transmitting the validated processed instructions file to a digital worker via a network connection; and executing the specific task by the digital worker through reproduction of user interactions on target applications using the bounding box coordinates and click type parameters from the processed instructions file.
18. The non-transitory computer-readable medium of claim 17, wherein the user input events comprise mouse activity on a screen including cursor coordinates and click event types, voice commands converted to text via speech recognition processing, natural language instructions captured through text input, screen state changes identified through frame differencing, and keyboard inputs including key codes and modifier states, and wherein the user input events are temporally aligned with extracted visual features from corresponding video frames to enable correlation between user interactions and detected user interface elements during generation of the processed instructions file.
19. The non-transitory computer-readable medium of claim 18, wherein the method further comprises maintaining a timeline of activities captured during the video recording that tracks a temporal sequence of user actions and screen changes throughout the video recording, wherein the timeline associates timestamps with each activity to preserve timing relationships between workflow steps and enables the digital worker to execute actions with timing that matches an original task demonstration.
20. The non-transitory computer-readable medium of claim 17, wherein the method further comprises providing output results of the specific task by the digital worker in a format determined by a user-defined shadowing event captured during the video recording, wherein the digital worker populates designated database fields, generates structured reports, or renders output in spreadsheet format according to the shadowing event.
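For illustration only, the timeline of claim 19 (timestamps associated with each activity so that the digital worker can reproduce the original timing) can be sketched as a conversion from absolute timestamps to per-action delays. This is a minimal sketch under stated assumptions: the function name replay_delays and the (timestamp_ms, action) tuple layout are hypothetical, and timestamps are assumed to be milliseconds from the start of the recording.

```python
from typing import List, Tuple


def replay_delays(timeline: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Convert (timestamp_ms, action) entries into (delay_ms, action) entries.

    Entries are ordered by timestamp first. Each delay is the wait before the
    action, measured from the previous action; the first action has delay 0.
    """
    ordered = sorted(timeline, key=lambda entry: entry[0])
    delays = []
    previous_ts = None
    for timestamp_ms, action in ordered:
        delay = 0 if previous_ts is None else timestamp_ms - previous_ts
        delays.append((delay, action))
        previous_ts = timestamp_ms
    return delays
```

A player could then wait for each delay before issuing the corresponding action, which preserves the intervals between workflow steps observed in the demonstration.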