Processing methods, apparatuses, systems, and media

By obtaining the complete DOM tree of the target webpage and combining the hierarchical relationship of the DOM tree with real-time status monitoring, the problem of positioning failure caused by dynamic structural changes in webpage interaction in RPA technology is solved, realizing the stability of the automated process and the ability to adapt to complex webpage scenarios.

CN122262425A · Pending · Publication Date: 2026-06-23 · CHINA MOBILE GROUP JIANGSU +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: CHINA MOBILE GROUP JIANGSU
Filing Date: 2026-03-04
Publication Date: 2026-06-23

Application Information

Patent Timeline

04 Mar 2026: Application

23 Jun 2026: Publication (CN122262425A)

IPC: G06F16/957; G06F40/14

AI Tagging

Application Domain

Text processing; Web data browsing optimisation

Technology Topics

Engineering; Data mining

AI Technical Summary

Technical Problem

Existing RPA technology suffers from positioning failures due to dynamic structural changes in web page interactions, affecting the stability of automated processes.

Method used

By obtaining the complete DOM tree of the target webpage and combining the hierarchical relationships of the DOM tree, the target element can be accurately located. The webpage state changes are monitored in real time, and an event-operation linkage mechanism is used to ensure that interactive operations are synchronized with the webpage state, adapting to dynamic changes in the webpage structure.

Benefits of technology

It effectively avoids the problem of positioning failure caused by dynamic changes in webpage structure, ensures the continuous execution of automated processes, and improves the stability of processes and the ability to adapt to complex webpage scenarios.

✦ Generated by Eureka AI based on patent content.

Abstract

Embodiments of the present application provide a processing method, device, system and medium, belonging to the field of information technology, which comprises: acquiring a complete DOM tree of a target webpage; acquiring a target element and an identifier of the target element according to the complete DOM tree; simulating a user interaction behavior, executing a corresponding operation event, and listening to a webpage state change in real time according to the target element and the identifier of the target element. By implementing this method, all element information of the webpage can be acquired through the complete DOM tree, the target element can be accurately positioned based on the hierarchical association of the DOM tree, and the interaction operation and the webpage state can be synchronized by combining real-time state listening, thereby effectively avoiding the positioning failure problem caused by the dynamic change of the webpage structure in the traditional RPA, and ensuring the coherent execution of the automatic process.

Description

Technical Field

[0001] This application relates to the field of information technology, specifically to a processing method, apparatus, system, and medium.

Background Technology

[0002] With the development of internet technology, HyperText Markup Language (HTML) web pages have become the core carrier of business systems. Robotic Process Automation (RPA) technology is widely used in web page automation scenarios to replace manual repetitive tasks. However, existing RPA technologies suffer from positioning failures due to dynamic structural changes in web page interactions. This problem seriously affects the stability of automated processes and restricts the application of RPA technology in complex web page scenarios.

Summary of the Invention

[0003] This application provides a processing method, apparatus, system, and medium to solve the positioning failure problem caused by dynamic structural changes in web page interaction in traditional RPA.

[0004] In a first aspect, a processing method is provided, including:

[0005] Obtain the complete Document Object Model (DOM) tree of the target webpage;

[0006] Based on the complete DOM tree, obtain the target element and its identifier;

[0007] Based on the target element and its identifier, simulate user interaction behavior, execute corresponding operation events, and monitor changes in the webpage state in real time.

[0008] Optionally, obtaining the complete DOM tree of the target webpage includes:

[0009] Send a page loading command to the target webpage to drive the target webpage to complete page rendering, and obtain the complete DOM tree after rendering;

[0010] The complete DOM tree contains the tag information, attribute parameters, and hierarchical relationships of all elements on the target webpage, and / or the complete DOM tree supports parsing inline frames, shadow DOM, and / or dynamically generated element content.

[0011] Optionally, based on the complete DOM tree, obtain the target element and its identifier, including:

[0012] Parse the element location expression according to the preset task requirements, filter and match target elements in the complete DOM tree, and obtain the identifier of the target element;

[0013] Among them, the preset task requirements are used to represent the automated operation goals pre-configured by the user.

[0014] Optionally, based on the target element and its identifier, simulate user interaction behavior, execute corresponding operation events, and monitor changes in the webpage state in real time, including:

[0015] Based on the target element and its identifier, simulate user interaction behavior and execute corresponding operation events;

[0016] Real-time monitoring of webpage status changes is achieved through an event-action linkage mechanism.

[0017] Optionally, the method further includes:

[0018] If the target element is detected to be invalid or the webpage structure has changed during the process of obtaining the target element and / or simulating user interaction behavior, the complete DOM tree is re-parsed and / or the element positioning expression is adjusted.

[0019] Optionally, the method further includes:

[0020] After simulating user interaction, an execution log is generated, which records the operation path, execution details and / or exception handling of each step.

[0021] In a second aspect, a processing apparatus is provided, comprising:

[0022] An RPA controller, used to obtain the complete DOM tree of the target webpage;

[0023] A dynamic positioning module, used to obtain the target element and its identifier based on the complete DOM tree;

[0024] An operation simulation module, used to simulate user interaction behavior based on the target element and its identifier, execute corresponding operation events, and monitor changes in the webpage state in real time.

[0025] In a third aspect, a robotic process automation system is provided, including the processing apparatus described in the second aspect.

[0026] In a fourth aspect, a readable storage medium is provided, on which a program or instructions are stored, which, when executed by a processor, implement the steps of the method described in the first aspect.

[0027] In a fifth aspect, a computer program product is provided, which includes computer instructions that, when executed by a processor, implement the steps of the method described in the first aspect.

[0028] In this embodiment, all element information of a webpage can be obtained through a complete DOM tree. The target element can be accurately located based on the hierarchical association of the DOM tree. Real-time status monitoring ensures that interactive operations are synchronized with the webpage status, effectively avoiding the location failure problem caused by dynamic changes in the webpage structure in traditional RPA, and ensuring the continuous execution of automated processes.

Description of the Drawings

[0029] Various other advantages and benefits will become apparent to those skilled in the art upon reading the following detailed description of preferred embodiments. The accompanying drawings are for illustrative purposes only and are not intended to limit the scope of this application. Furthermore, the same reference numerals denote the same parts throughout the drawings. In the drawings:

[0030] Figure 1 is a flowchart of a processing method provided by an embodiment of this application;

[0031] Figure 2 is a flowchart of another processing method provided by an embodiment of this application;

[0032] Figure 3 is a structural diagram of a processing apparatus provided in an embodiment of this application;

[0033] Figure 4 is a schematic diagram of a robotic process automation system provided in an embodiment of this application;

[0034] Figure 5 is a schematic diagram of an electronic device provided in an embodiment of this application.

Detailed Description

[0035] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0036] The term "comprising," and any variations thereof, used in the specification and claims of this application, is intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or apparatus that includes a series of steps or units is not necessarily limited to those explicitly listed, but may include other steps or units not explicitly listed or inherent to such process, method, product, or apparatus. Furthermore, the use of "and / or" in the specification and claims indicates at least one of the connected objects, such as A and / or B, indicating the inclusion of A alone, B alone, or both A and B.

[0037] In the embodiments of this application, the terms "exemplary" or "for example" are used to indicate that something is an example, illustration, or description. Any embodiment or design described as "exemplary" or "for example" in the embodiments of this application should not be construed as more preferred or advantageous than other embodiments or designs. Rather, the terms "exemplary" or "for example" are intended to present the relevant concepts in a concrete manner.

[0038] Current RPA technology for automating HyperText Markup Language (HTML) web page operations mainly relies on a variety of related technologies. While these technologies can meet basic automation needs, they all have significant limitations when adapting to modern web page scenarios, as follows:

[0039] 1) The coordinate mapping method simulates user operations at fixed screen coordinates; PyAutoGUI is a typical tool. Its core limitation is that it cannot cope with web page scaling, resolution changes, or responsive layout adjustments. In addition, element coordinates are completely decoupled from business semantics, which makes automated scripts very hard to read and expensive to maintain.

[0040] 2) Image template matching relies on an open-source computer vision library (OpenCV) to locate elements through pixel-level comparison. Although this method can handle unstructured pages, it is significantly affected by fonts, color schemes, and dynamic content (such as real-time data updates). It also requires a large number of template images to be collected in advance, resulting in high deployment costs. Combined with large-model techniques, this method can be optimized in a targeted way by introducing dedicated algorithms and small models. To handle font and color interference, an image feature extraction algorithm based on Convolutional Neural Networks (CNNs) can be introduced. CNNs can automatically learn deep image features such as edges, textures, and shapes. These features are more robust than pixel-level information and are less affected by font and color changes. A small CNN model is trained specifically for extracting image features of webpage elements; the template image and the image to be matched are fed into this model, and their high-level features are extracted and compared, which effectively improves matching accuracy. For example, a lightweight MobileNet model can be used as the base architecture and fine-tuned for the webpage element feature extraction task. To handle dynamic content interference, temporal image analysis algorithms combined with Recurrent Neural Networks (RNNs) or Long Short-Term Memory (LSTM) networks can be used, since both RNNs and LSTMs can process time-series data. By analyzing image sequences of web page elements at different time points, the changing patterns of dynamic content can be captured and stable element features can be distinguished from dynamically changing content. A small LSTM model is built to learn the image sequences of dynamic elements and predict the features of elements in a stable state, which are then matched against the features of the template image, thereby reducing the impact of dynamic content on the matching results.

[0041] 3) The static attribute retrieval method locates elements by fixed attributes of HTML elements, such as the identifier (id) and class. A typical application is the find_element_by_id interface of the Selenium web automation testing framework. Its core drawback is that it struggles with dynamically generated attribute values (such as class names containing timestamps), nested inline frame (iframe) structures, and content loaded asynchronously via Asynchronous JavaScript and XML (AJAX). It is poorly compatible with modern web technologies and cannot adapt to complex web development scenarios.

[0042] The aforementioned technologies suffer from three major flaws. First, a lack of semantic relevance: traditional positioning methods rely solely on visual features or static tags and fail to use the hierarchical semantic relationships of the DOM tree (such as parent-child nodes and ancestor paths). This makes element positioning logic fragile on complex pages, so positioning failures are common. Second, insufficient adaptation to dynamic scenarios: for single-page applications (SPAs) whose DOM structure is modified dynamically by JavaScript (such as virtual DOM updates in Vue / React), dynamically loaded list items (such as infinitely scrolling pages), or elements nested in the shadow DOM, there is no effective parsing method, making stable positioning difficult. Third, uncontrolled operation timing: when asynchronous requests (such as AJAX data loading) are out of sync with page rendering, traditional RPA cannot detect whether an operation has completed, which easily triggers "element not ready" exceptions and interrupts automated processes.

[0043] Referring to Figure 1, an embodiment of this application provides a processing method, the specific steps of which include:

[0044] Step 11: Obtain the complete DOM tree of the target webpage;

[0045] The complete DOM tree refers to the document object model tree that contains the tag information, attribute parameters, and hierarchical relationships of all elements on the target webpage after the page rendering is completed. It also supports parsing inline frames (iframes), shadow DOM, and dynamically generated element content.

[0046] Step 12: Based on the complete DOM tree, obtain the target element and its identifier;

[0047] In this context, the target element refers to the web page element that the RPA system selects and matches from the complete DOM tree of the target web page during the automated interaction process, based on preset task requirements (such as form filling, button clicking, data extraction, etc.), and that needs to perform simulated user interaction operations (clicking, form filling, data submission, etc.).

[0048] Step 13: Based on the target element and its identifier, simulate user interaction behavior, execute corresponding operation events, and monitor changes in the webpage state in real time;

[0049] By implementing this approach, all element information of a webpage can be obtained through the complete DOM tree. The target element can be accurately located based on the hierarchical relationship of the DOM tree. Combined with real-time status monitoring, the interactive operation is synchronized with the webpage status, effectively avoiding the location failure problem caused by the dynamic changes in the webpage structure in traditional RPA, and ensuring the continuous execution of the automated process.

[0050] In some possible implementations, obtaining the complete DOM tree of the target webpage includes:

[0051] Send a page loading command to the target webpage to drive the target webpage to complete page rendering, and obtain the complete DOM tree after rendering;

[0052] The complete DOM tree contains the tag information, attribute parameters, and hierarchical relationships of all elements on the target webpage, and / or the complete DOM tree supports parsing inline frames, shadow DOM, and / or dynamically generated element content.

[0053] This approach ensures the completeness and comprehensiveness of the obtained DOM tree, covering all elements and structures of the webpage. Among them, an inline frame (iframe) is an independent webpage container embedded in the current webpage, and the shadow DOM is a technology that encapsulates webpage elements to achieve element isolation. Dynamically generated element content refers to webpage elements generated in real time during the page rendering process using scripts such as JavaScript. By parsing the above content, comprehensive data support is provided for subsequent element positioning.
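
As an illustration of what "tag information, attribute parameters, and hierarchical relationships" can look like as data, the following is a minimal sketch that builds a node tree from static HTML with Python's standard html.parser module. It is not the patent's DOM parsing engine: the names DomNode, TreeBuilder, and build_tree are hypothetical, and a static parser does not render the page, so it cannot see iframes' documents, shadow DOM, or script-generated elements.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Optional

# Elements that never have children and need no closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


@dataclass
class DomNode:
    """Hypothetical node record: tag, attributes, and hierarchy links."""
    tag: str
    attrs: dict
    parent: Optional["DomNode"] = None
    children: list = field(default_factory=list)

    def path(self) -> str:
        """Ancestor path from the document root down to this node."""
        parts, node = [], self
        while node is not None:
            parts.append(node.tag)
            node = node.parent
        return "/".join(reversed(parts))


class TreeBuilder(HTMLParser):
    """Builds a DomNode tree while the parser streams through the markup."""

    def __init__(self):
        super().__init__()
        self.root = DomNode("#document", {})
        self._current = self.root

    def handle_starttag(self, tag, attrs):
        node = DomNode(tag, dict(attrs), parent=self._current)
        self._current.children.append(node)
        if tag not in VOID_TAGS:
            self._current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, then step out of it.
        node = self._current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self._current = node.parent


def build_tree(html: str) -> DomNode:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root
```

Calling build_tree on a small form and then path() on its button yields an ancestor path that starts at the synthetic "#document" root, which is the kind of hierarchical information the later positioning steps rely on.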

[0054] In some possible implementations, the target element and its identifier are obtained from the complete DOM tree, including:

[0055] The element location expression is parsed according to the preset task requirements. The element location expression includes XML Path Language (XPath) path expression and Cascading Style Sheets (CSS) selector. The target element is filtered and matched in the complete DOM tree using the cascading matching rule of the two combined, and the identifier of the target element is obtained.

[0056] Among them, the preset task requirements are used to represent the automated operation goals pre-configured by the user, which are the web page interaction tasks that the RPA system needs to complete, including but not limited to web page data extraction, form filling, button clicking, page navigation, etc. The element location expression is a syntax rule used to find the target element in the DOM tree, which is used to specify the search path of the target element in the DOM tree.

[0057] Implementing this method allows for precise location of target elements based on actual user needs. By parsing the location expression, it enables rapid filtering and matching of target elements in the DOM tree, while simultaneously obtaining the unique identifier of the target element. This provides accurate element pointing for subsequent interactive operations. The identifier of the target element is a unique combination of attributes of the DOM node, which can uniquely identify the target element and avoid element confusion.
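
The idea of an identifier as "a unique combination of attributes" can be sketched as a search for the smallest attribute subset that matches only the target node. This is one possible reading, not the method claimed in the application; the function name unique_identifier and the representation of nodes as plain attribute dictionaries (with the tag stored under a "tag" key) are assumptions made for the example.

```python
from itertools import combinations
from typing import Optional


def unique_identifier(target: dict, all_nodes: list) -> Optional[dict]:
    """Return the smallest attribute subset of `target` that no other node shares.

    `all_nodes` is assumed to contain `target` itself, so a subset is unique
    when exactly one node matches it. Returns None if even the full attribute
    set is shared with another node.
    """
    keys = sorted(target)
    for size in range(1, len(keys) + 1):
        for combo in combinations(keys, size):
            matches = [
                node for node in all_nodes
                if all(node.get(key) == target[key] for key in combo)
            ]
            if len(matches) == 1:
                return {key: target[key] for key in combo}
    return None


if __name__ == "__main__":
    nodes = [
        {"tag": "button", "class": "btn", "name": "ok"},
        {"tag": "button", "class": "btn", "name": "cancel"},
        {"tag": "input", "class": "btn", "name": "ok"},
    ]
    print(unique_identifier(nodes[0], nodes))
```

In this sample no single attribute separates the first button from the others, so the search falls back to a two-attribute combination. The exhaustive search is exponential in the number of attributes, which is acceptable for a sketch but would need pruning on real pages.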

[0058] In some possible implementations, based on the target element and its identifier, user interaction behavior is simulated, corresponding operation events are executed, and changes in the webpage state are monitored in real time, including:

[0059] Based on the target element and its identifier, simulate user interaction behavior and execute corresponding operation events;

[0060] The event-operation linkage mechanism monitors webpage state changes in real time. This mechanism includes listening to DOM change events through a MutationObserver. The webpage state changes include page navigation, pop-up windows, and / or asynchronous loading completion.

[0061] Implementing this method can accurately simulate the webpage interaction behavior of human users, ensuring the accuracy and rationality of the operation. Among them, the event-operation linkage mechanism refers to the mechanism that associates user interaction operations with changes in the webpage state. Through this mechanism, changes in the webpage state can be captured in real time, ensuring that the interaction operation remains synchronized with the webpage state and avoiding operation failure due to changes in the webpage state. The interaction behavior includes common webpage interaction actions such as click operations, form filling operations, and data submission operations.
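
The linkage between an operation and the state change it causes can be sketched as an observer pattern. The example below is a synchronous Python analogue, not browser code: MutationObserver itself is a browser API, and the ObservablePage class, PageEvent record, and perform_and_observe function are hypothetical stand-ins. A real implementation would also wait with a timeout, because page events arrive asynchronously.

```python
from dataclasses import dataclass


@dataclass
class PageEvent:
    kind: str          # e.g. "navigation", "popup", "async_load_complete"
    detail: str = ""


class ObservablePage:
    """Hypothetical page stand-in that reports state changes to listeners."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, listener):
        """Register a listener and return a function that removes it again."""
        self._listeners.append(listener)

        def unsubscribe():
            self._listeners.remove(listener)

        return unsubscribe

    def emit(self, event: PageEvent):
        for listener in list(self._listeners):
            listener(event)


def perform_and_observe(page: ObservablePage, operation, expected_kind: str):
    """Run `operation` while listening, and report whether the expected
    state change was observed, together with every event that was seen."""
    seen = []
    unsubscribe = page.subscribe(seen.append)
    try:
        operation()
    finally:
        unsubscribe()  # always detach, even if the operation raises
    return any(event.kind == expected_kind for event in seen), seen
```

The listener is attached before the operation runs, so a state change triggered by the operation itself cannot be missed; that ordering is the point of coupling the two.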

[0062] In some possible implementations, the method further includes:

[0063] During the process of acquiring target elements and / or simulating user interaction behavior, if the target element is detected to be invalid or the webpage structure has changed, the complete DOM tree is re-parsed and / or the element positioning expression is adjusted. Re-parsing the complete DOM tree includes hierarchical traversal of the main document, inline frames (iframes), and shadow DOM. Adjusting the element positioning expression includes optimizing the positioning expression using the Levenshtein distance fuzzy matching algorithm.

[0064] Implementing this method can address anomalies caused by dynamic changes in webpage structure. Target element failure refers to situations where the target element's DOM node is removed or its attributes are changed, so that the element can no longer be located. Changes in webpage structure include adjustments to the DOM node hierarchy, dynamic loading or deletion of elements, and so on. By re-parsing the DOM tree or adjusting the positioning expression, the target element can be repositioned, avoiding interruptions to the automated process.
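
A minimal sketch of how Levenshtein distance could drive the adjustment: when a stored attribute value no longer matches, pick the closest value currently present in the re-parsed tree, provided it is within a tolerance. The distance function is the standard dynamic-programming algorithm; the helper closest_attribute and its max_distance threshold are assumptions for illustration, since the application does not specify how the distance is applied.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def closest_attribute(stale: str, candidates: list,
                      max_distance: int = 3) -> Optional[str]:
    """Return the candidate nearest to the stale value, or None if every
    candidate is farther away than `max_distance` edits."""
    best = min(candidates, key=lambda c: levenshtein(stale, c), default=None)
    if best is None or levenshtein(stale, best) > max_distance:
        return None
    return best


if __name__ == "__main__":
    print(closest_attribute("submit-btn", ["submit-button", "cancel-btn", "nav"]))
```

The threshold matters: without it, the nearest candidate is always returned, even when the page no longer contains anything resembling the original element.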

[0065] In some possible implementations, the method further includes:

[0066] After simulating user interaction, an execution log is generated, which records the operation path, execution details and / or exception handling of each step.

[0067] Implementing this method enables the traceability of automated processes. The execution log is a textual record of the complete execution process of the automated process. The operation path refers to the execution order and operation objects of each step. The execution details refer to the specific operation content and execution parameters of each step. The exception handling situation refers to the exception type, exception handling method and handling result that occurred during the process execution, providing reliable data support for subsequent process optimization and problem investigation.
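
The three kinds of information recorded in the execution log (operation path, execution details, exception handling) map naturally onto a small record type. The following is a sketch under assumed names; LogEntry, ExecutionLog, and every field name are hypothetical, and the application does not prescribe a log format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class LogEntry:
    step: int                             # position in the operation path
    target: str                           # operation object, e.g. an element identifier
    action: str                           # e.g. "click", "fill", "submit"
    params: dict = field(default_factory=dict)   # execution details
    exception_type: Optional[str] = None  # exception handling: what went wrong
    handling: Optional[str] = None        # exception handling: what was done
    result: str = "ok"                    # outcome after any handling


class ExecutionLog:
    def __init__(self):
        self.entries = []

    def record(self, target: str, action: str, **details) -> LogEntry:
        """Append one step; step numbers follow execution order."""
        entry = LogEntry(step=len(self.entries) + 1, target=target,
                         action=action, **details)
        self.entries.append(entry)
        return entry

    def operation_path(self) -> list:
        """Execution order and operation object of each step."""
        return [(e.step, e.action, e.target) for e in self.entries]

    def to_json(self) -> str:
        return json.dumps([asdict(e) for e in self.entries], indent=2)
```

A step that hit an exception would be recorded with, for example, exception_type="ElementNotFound", handling="re-parsed DOM tree", and result="recovered", so that the same entry carries both the failure and how it was resolved.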

[0068] Referring to Figure 2, an embodiment of this application provides a processing method, the specific steps of which include:

[0069] Step 21: Obtain the complete DOM tree of the target webpage;

[0070] Specifically, the RPA controller sends a page loading command to the target webpage, driving it to complete page rendering, and calls the RPA system's built-in DOM parsing engine to obtain the complete DOM tree after rendering. This complete DOM tree is the core data foundation on which the RPA system recognizes webpage elements and performs interactive operations. The DOM parsing engine uses a headless browser (driven, for example, through Puppeteer or Playwright) to obtain the complete DOM tree in real time, and supports parsing Shadow DOM, nested iframe structures, and dynamically generated script tag content, ensuring that the acquired DOM tree is complete. The DOM parsing engine also extends the DOM metadata storage, attaching rendering attributes (such as visibility and coordinate boundaries) and business tags to each DOM node. Business tags inject business semantics through custom data-* attributes, providing richer reference data for subsequent element positioning and operation simulation. The DOM parsing engine returns the complete DOM tree and its metadata to the RPA controller, which passes them to subsequent modules. The complete DOM tree organizes all elements of a webpage, including but not limited to text, images, buttons, and / or input boxes, in a tree-like hierarchy, and includes the tag information, attribute parameters, and hierarchical relationships of all elements. The RPA controller controls the DOM parsing engine, which works with the headless browser kernel through the browser developer tools interface, to further ensure the completeness and accuracy of the acquired DOM tree.
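
The per-node metadata described above (rendering attributes plus business tags carried in data-* attributes) could be modelled as follows. This is a sketch with hypothetical names (NodeMetadata, attach_metadata); in a real system the visibility and bounding box would come from the headless browser, whereas here they are simply passed in as arguments.

```python
from dataclasses import dataclass, field


@dataclass
class NodeMetadata:
    tag: str
    attributes: dict                                    # ordinary HTML attributes
    business_tags: dict = field(default_factory=dict)   # from data-* attributes
    visible: bool = True                                # rendering attribute
    bounding_box: tuple = (0, 0, 0, 0)                  # x, y, width, height


def attach_metadata(tag: str, attrs: dict, visible: bool = True,
                    bounding_box: tuple = (0, 0, 0, 0)) -> NodeMetadata:
    """Split data-* attributes out as business tags and keep the rest as-is."""
    prefix = "data-"
    business = {name[len(prefix):]: value
                for name, value in attrs.items() if name.startswith(prefix)}
    plain = {name: value
             for name, value in attrs.items() if not name.startswith(prefix)}
    return NodeMetadata(tag, plain, business, visible, bounding_box)


if __name__ == "__main__":
    meta = attach_metadata(
        "button",
        {"class": "btn", "data-biz-role": "submit-order"},
        visible=True,
        bounding_box=(10, 20, 80, 24),
    )
    print(meta.business_tags)
```

Keeping business tags separate from presentational attributes lets a positioning rule target "the submit-order button" even if its class names change.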

[0071] The complete DOM tree serves as the core data support for all subsequent execution steps, constituting the preparatory stage of the entire automation process. All subsequent execution steps, such as element location and operation simulation, are scheduled by the RPA controller within the RPA system framework and depend on the DOM tree data acquired in this step. Through the RPA controller's precise scheduling of the DOM parsing engine and the aforementioned design of the DOM parsing engine, the accuracy of element location in subsequent steps can be ensured, avoiding location deviations caused by missing or incomplete DOM tree data. Simultaneously, it provides metadata support for subsequent dynamic location and operation simulation.

[0072] By implementing this approach, under the RPA system framework and coordinated by the RPA controller, the RPA system can be ensured to fully acquire all element information of the webpage. This effectively avoids problems such as failure to locate subsequent elements and abnormal operation execution caused by incomplete DOM trees or incomplete parsing, laying a solid data foundation for the stable advancement of the entire automation process and significantly improving the reliability and stability of the process initialization phase.

[0073] Step 22: Based on the complete DOM tree, obtain the target element and its identifier;

[0074] Specifically, the RPA controller calls the dynamic positioning module. Based on the complete DOM tree obtained in step 21, the module parses the element positioning expression according to the preset task requirements, filters and matches target elements in the DOM tree, and returns the identifier of the target element, providing a positioning basis for subsequent operation simulation. The preset task requirements are the automated execution goals pre-configured by the user and passed to the RPA controller. These may include, but are not limited to, specific operation instructions such as filling and submitting login forms, extracting data from web page tables, and / or clicking specific buttons to navigate. The RPA controller controls the dynamic positioning module to adopt a composite positioning strategy, prioritizing XPath expressions with context constraints (such as //div[@class='container']/descendant::button[contains(@class, 'submit')]), combined with CSS selector pseudo-classes (:nth-child, :last-of-type), to achieve precise matching of target elements and improve positioning accuracy.
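
The composite strategy can be sketched in two stages: a context-constrained descendant search, followed by a structural filter in the spirit of :last-of-type. Python's xml.etree.ElementTree implements only a subset of XPath (no descendant:: axis and no contains()), so the example below emulates the quoted expression in plain Python instead of evaluating it; the function names locate and last_of_type are hypothetical, and the input must be well-formed markup.

```python
import xml.etree.ElementTree as ET


def locate(root, container_class: str, tag: str, class_fragment: str) -> list:
    """Emulates //div[@class=C]/descendant::TAG[contains(@class, FRAGMENT)]."""
    hits = []
    for div in root.iter("div"):
        if div.get("class") != container_class:
            continue
        for element in div.iter(tag):
            # Substring match mirrors XPath contains(); skip duplicates that
            # nested containers would otherwise produce.
            if class_fragment in element.get("class", "") and element not in hits:
                hits.append(element)
    return hits


def last_of_type(root, elements: list) -> list:
    """Keep only elements that are the last sibling with their tag name."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    kept = []
    for element in elements:
        parent = parent_of.get(element)
        siblings = [element] if parent is None else [
            child for child in parent if child.tag == element.tag
        ]
        if siblings[-1] is element:
            kept.append(element)
    return kept


if __name__ == "__main__":
    page = ET.fromstring(
        '<html><body>'
        '<div class="container"><form>'
        '<button class="btn cancel">Cancel</button>'
        '<button class="btn submit primary">Save draft</button>'
        '<button class="btn submit">Send</button>'
        '</form></div>'
        '<div class="footer"><button class="btn submit">Other</button></div>'
        '</body></html>'
    )
    candidates = locate(page, "container", "button", "submit")
    print([b.text for b in last_of_type(page, candidates)])
```

The context constraint removes the footer button, and the structural filter then narrows two remaining candidates to one, which is the cascading effect the composite strategy aims for.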

[0075] In this step, the RPA controller schedules the dynamic positioning module using a semantic-driven dynamic positioning model. This model, combined with the business tags (custom data-* attributes) injected by the DOM parsing engine in step 21, deeply integrates the hierarchical relationship of the DOM tree, element attribute features, and business semantics to construct a context-aware positioning expression generation algorithm. This overcomes the limitations of traditional static attribute matching methods and improves the flexibility and scenario adaptability of element positioning. The RPA controller also controls the dynamic positioning module to support dynamic attribute parsing. For attribute values containing variables (such as class="button-{{timestamp}}"), it extracts pattern features (class=~'button-') through regular expressions for fuzzy matching, adapting to dynamically changing element attributes and further improving positioning adaptability. Based on the semantic-driven dynamic positioning model and composite positioning strategy described above, the RPA controller controls the dynamic positioning module to read and parse preset element positioning expressions. These element positioning expressions include, but are not limited to, XPath positioning expressions and CSS selectors, both commonly used syntaxes for webpage element positioning. They are used to accurately position single or multiple target elements on a webpage, providing the RPA system with precise search paths for target elements.
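
The dynamic attribute parsing step can be sketched as a conversion from an attribute template into a regular expression. This assumes that variables are written as {{name}} placeholders, as in the example above; the function names template_to_pattern and matches_dynamic are hypothetical, and the choice to require at least one character per placeholder is an assumption.

```python
import re

# Matches placeholders such as {{timestamp}} or {{ id }}.
PLACEHOLDER = re.compile(r"\{\{\s*\w+\s*\}\}")


def template_to_pattern(template: str) -> "re.Pattern":
    """Turn 'button-{{timestamp}}' into an anchored regex: literal parts are
    escaped, and each placeholder matches one or more characters."""
    literal_parts = PLACEHOLDER.split(template)
    body = ".+?".join(re.escape(part) for part in literal_parts)
    return re.compile(f"^{body}$")


def matches_dynamic(template: str, value: str) -> bool:
    """True if `value` fits the template with any non-empty variable text."""
    return template_to_pattern(template).match(value) is not None


if __name__ == "__main__":
    for candidate in ("button-1718000000", "button-", "icon-button-17"):
        print(candidate, matches_dynamic("button-{{timestamp}}", candidate))
```

Anchoring the pattern at both ends keeps "icon-button-17" from matching a template that only describes "button-" followed by a variable part.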

[0076] After the location expression is parsed, the RPA controller passes the parsed location expression to the dynamic location module. This dynamic location module is the core functional module of the RPA system responsible for parsing location expressions and searching DOM tree elements, and can adapt to scenarios in which web page elements change dynamically. The RPA controller also controls the dynamic location module to incorporate cross-level structure traversal technology. Building on the DOM parsing engine's ability to parse Shadow DOM and nested iframe structures in step 21, this technology provides technical support for element location in complex nested scenarios. The cross-level structure traversal technology uses a recursive parsing algorithm across iframes, Shadow DOM, and the main document, which can locate web page elements without blind spots in deeply nested scenarios, filling the gap in existing RPA technology for locating elements in complex web page structures.
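
The recursive traversal across iframes, Shadow DOM, and the main document can be sketched over a simple in-memory tree. The Node type below is hypothetical: it assumes that the parsing engine has already attached each iframe's document and each shadow root to its host node, which is what lets a single recursion cross those boundaries. The "#frame" and "#shadow" path markers are likewise an invention of this example.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    shadow_root: Optional["Node"] = None      # set on shadow hosts
    frame_document: Optional["Node"] = None   # set on iframe elements


def walk(node: Node, path: tuple = ()) -> Iterator[tuple]:
    """Yield (path, node) for every node, crossing frame and shadow boundaries."""
    here = path + (node.tag,)
    yield here, node
    if node.frame_document is not None:
        yield from walk(node.frame_document, here + ("#frame",))
    if node.shadow_root is not None:
        yield from walk(node.shadow_root, here + ("#shadow",))
    for child in node.children:
        yield from walk(child, here)


def find_all(root: Node, predicate) -> list:
    """Collect every (path, node) pair whose node satisfies the predicate."""
    return [(path, node) for path, node in walk(root) if predicate(node)]
```

Recording the boundary markers in the path matters in practice, because an element inside a frame or shadow root usually cannot be reached by one selector from the top-level document; the path tells the caller which contexts to enter first.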

[0077] Subsequently, the RPA controller controls the dynamic positioning module to query the complete DOM tree obtained in step 21, filter and accurately match the target elements, and return the identifier of the target element to the RPA controller. The RPA controller then passes this information to the operation simulation module in step 23, providing an accurate element positioning basis for the operation simulation stage. In Figure 2, the optional sub-step indicated by the dashed arrow is that the dynamic positioning module, under the control of the RPA controller, feeds the matched node list back to itself for verifying positioning results or for reuse in subsequent operations, further improving the reliability of element positioning and ensuring the accuracy of subsequent operation simulation.

[0078] By implementing this approach, under the RPA system framework and scheduled by the RPA controller, the target element can be accurately and flexibly located, effectively breaking through the application limitations of traditional static positioning methods. At the same time, it provides reliable technical support for element positioning in complex nested pages, significantly reduces the element positioning failure rate, provides stable and accurate positioning basis for subsequent operation simulation, and improves the scenario adaptability and execution reliability of the entire automation process.

[0079] Step 23: Based on the target element and its identifier, simulate user interaction behavior, execute corresponding operation events, and monitor changes in the webpage state in real time;

[0080] Specifically, the RPA controller calls the operation simulation module, which, based on the target element and its identifier obtained in step 22, simulates user interaction behavior, executes corresponding operation events, and monitors webpage state changes in real time to ensure that operation execution is synchronized with the webpage state. These operation events include, but are not limited to, click operations, form filling operations, and / or data submission operations. The RPA controller controls the operation simulation module to simulate user behavior based on the WebDriver protocol, supports custom event sequences (such as key combinations and drag-and-drop operations), and uses requestAnimationFrame to control the operation frame rate, avoiding triggering webpage anti-scraping mechanisms and improving the security and compatibility of the operation. The operation simulation module is the core functional module in the RPA system responsible for simulating human user webpage interaction behavior, enabling the simulation of various webpage interaction actions such as clicking, inputting, dragging, and submitting, with the operation process being fed back to the RPA controller in real time.

[0081] To address the technical challenge of uncontrolled automated processes in asynchronous scenarios, the RPA controller's operation simulation module employs an event-operation linkage mechanism. Through the MutationObserver interface and AJAX request interception technologies such as XMLHttpRequest monitoring, it dynamically synchronizes the operation execution sequence with the webpage state. Upon receiving operation event trigger commands from the RPA controller, the operation simulation module simulates user behavior to execute corresponding interactive operations on the target webpage. Simultaneously, relying on the aforementioned event-operation linkage mechanism, it monitors webpage state changes in real time. These changes include, but are not limited to, page redirects, pop-up windows, and / or AJAX request completions. The monitoring results are fed back to the RPA controller in real time. Furthermore, the RPA controller's operation simulation module integrates a visual verification plugin. Through screenshot comparison (such as using a pixel matching library), it verifies the visual correctness of the operation results, ensuring the accuracy of operation execution, promptly identifying operation deviations, and feeding them back to the RPA controller for subsequent adjustments.
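The screenshot comparison used for visual verification can be illustrated with a minimal sketch. It assumes both screenshots have already been decoded into equally sized grids of RGB tuples; the function name mismatch_ratio and the tolerance parameter are hypothetical, and a real deployment would more likely rely on a dedicated pixel matching library, as the text notes.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def mismatch_ratio(expected: Image, actual: Image, tolerance: int = 0) -> float:
    """Return the fraction of pixels whose channels differ by more than tolerance."""
    same_shape = len(expected) == len(actual) and all(
        len(row_e) == len(row_a) for row_e, row_a in zip(expected, actual)
    )
    if not same_shape:
        raise ValueError("screenshots must have identical dimensions")
    total = sum(len(row) for row in expected)
    if total == 0:
        return 0.0
    differing = sum(
        1
        for row_e, row_a in zip(expected, actual)
        for pixel_e, pixel_a in zip(row_e, row_a)
        if max(abs(c_e - c_a) for c_e, c_a in zip(pixel_e, pixel_a)) > tolerance
    )
    return differing / total


if __name__ == "__main__":
    before = [[(0, 0, 0), (255, 255, 255)], [(10, 10, 10), (20, 20, 20)]]
    after = [[(0, 0, 0), (255, 255, 255)], [(10, 10, 10), (20, 20, 60)]]
    print(mismatch_ratio(before, after))  # 0.25
```

The controller could treat a ratio above some configured limit as an operation deviation; the limit itself is not specified in this application.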

[0082] Here, AJAX (Asynchronous JavaScript and XML) requests are asynchronous requests, typically issued through XMLHttpRequest, and are the core technology for loading data without a full page refresh. They are widely used in scenarios such as dynamic tables and pop-up loading. Monitoring changes in the page state in real time ensures that operation execution stays synchronized with the page state. After the operation simulation module completes the preset operation, it generates the event handle corresponding to the operation event and returns it to the RPA controller. The event handle is a unique identifier for the operation event; the RPA controller can use it to track the execution status of the operation event, listen for subsequent asynchronous events triggered by it, and provide the execution basis for the process closing stage.

[0083] In the optional sub-step indicated by the dashed arrow in Figure 2, the RPA controller registers the aforementioned event handles in the event listening queue. The event listening queue is a functional queue used to monitor the execution status of operation events. It detects the completion status of asynchronous events in real time and feeds the detection results back to the RPA controller to ensure the continuous execution of the automated process.
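The relationship between event handles and the event listening queue can be sketched as follows. The data model is an assumption made for illustration: the EventHandle and EventListeningQueue classes and their method names are hypothetical, and the application does not specify how handles are represented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EventHandle:
    handle_id: int       # unique identifier of the operation event
    operation: str       # e.g. "click#load-more"
    done: bool = False   # set once the triggered asynchronous event completes


class EventListeningQueue:
    """Tracks handles whose asynchronous follow-up events are still pending."""

    def __init__(self) -> None:
        self._pending: Dict[int, EventHandle] = {}

    def register(self, handle: EventHandle) -> None:
        """Called by the controller after the simulation module returns a handle."""
        self._pending[handle.handle_id] = handle

    def complete(self, handle_id: int) -> EventHandle:
        """Mark a handle as finished and remove it from the pending set."""
        handle = self._pending.pop(handle_id)
        handle.done = True
        return handle

    def pending(self) -> List[EventHandle]:
        return list(self._pending.values())

    def all_done(self) -> bool:
        """True when no registered event is still outstanding."""
        return not self._pending


if __name__ == "__main__":
    queue = EventListeningQueue()
    queue.register(EventHandle(1, "click#load-more"))
    print(queue.all_done())  # False
    queue.complete(1)
    print(queue.all_done())  # True
```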

[0084] By implementing this approach, within the RPA system framework, the RPA controller coordinates and schedules operations, enabling dynamic synchronization between user actions and webpage states. This effectively solves the technical problem of uncontrolled automated processes in asynchronous scenarios, avoids operation failures caused by asynchronous page loading or state changes, circumvents webpage anti-scraping mechanisms, verifies the accuracy of operations, ensures the continuity, accuracy, and security of automated operations, and enhances the anti-interference capability of automated processes.

[0085] Step 24: Fault tolerance (anomaly adaptation);

[0086] Specifically, during the execution of step 22 (element location) and step 23 (operation simulation), the RPA controller receives real-time feedback on the execution status from each module. If a target element failure or a change in the webpage structure is detected, the fault tolerance and adaptive module is immediately invoked to trigger an adaptive fault tolerance strategy, preventing process interruption and ensuring the continuous progress of the automated task. Target element failure includes situations such as DOM node removal or element attribute changes. The fault tolerance and adaptive module is the core functional module in the RPA system responsible for handling operational anomalies and ensuring stable process execution. It can automatically respond to various abnormal scenarios such as webpage element changes, operation failures, and network latency. Under the control of the RPA controller, it employs a three-level fault tolerance strategy to achieve comprehensive anomaly adaptation, and the fault tolerance process is fed back to the RPA controller in real time.

[0087] The adaptive fault-tolerance strategy is a dynamic relocation strategy. Under the scheduling of the RPA controller, the fault tolerance and adaptive module applies a three-level fault-tolerance strategy, specifically: Level 1, DOM tree backtracking: when element positioning fails, the fault tolerance and adaptive module searches for similar nodes layer by layer upward along the parent nodes, determines the matching node based on a similarity score over the tag name and attributes, and feeds the result back to the RPA controller; Level 2, fuzzy matching: the module uses the Levenshtein distance to calculate the similarity of node text or attribute values, matches similar elements against a set threshold (e.g., 80%), adapts to subtle changes in element attributes, and feeds the result back to the RPA controller; Level 3, retry mechanism: the module uses an exponential backoff algorithm (retry-after) to handle element loading delays caused by network latency, avoiding operation failures due to network fluctuations, with the retry process controlled by the RPA controller. This dynamic relocation strategy rests on the dual mechanism of fuzzy matching and DOM tree backtracking, which can automatically repair a failed target element without human intervention, improving the anti-interference capability of the automated process. In addition, under the scheduling of the RPA controller, and combined with the cross-level structure traversal technology of step 22 and the DOM parsing engine's ability to parse nested structures in step 21, when a target element in a nested page such as an iframe or Shadow DOM fails, the fault tolerance and adaptive module can relocate it through the recursive parsing algorithm, further ensuring the integrity and reliability of element positioning. The relocation result is fed back to the RPA controller for subsequent operations.
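Levels 2 and 3 can be illustrated with a short sketch. It assumes that similarity is defined as one minus the Levenshtein distance divided by the longer string length, and that backoff delays double from a base value up to a cap; neither formula is stated in this application, and the function names and default values are hypothetical.

```python
from typing import Iterable, List, Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                     # deletion
                            curr[j - 1] + 1,                 # insertion
                            prev[j - 1] + (ch_a != ch_b)))   # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalise the distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(target: str, candidates: Iterable[str],
               threshold: float = 0.8) -> Optional[str]:
    """Return the most similar candidate if it reaches the threshold."""
    scored = [(similarity(target, cand), cand) for cand in candidates]
    score, cand = max(scored, key=lambda item: item[0], default=(0.0, None))
    return cand if score >= threshold else None


def backoff_delays(retries: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))                                  # 3
    print(best_match("submit-order", ["cancel-order", "submit-orders"]))     # submit-orders
    print(backoff_delays(5))                                                 # [0.5, 1.0, 2.0, 4.0, 8.0]
```

With the 80% threshold from the text, a one-character change in a 13-character attribute value still matches, while an unrelated value returns None and the failure can escalate to the next level.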

[0088] Specifically, under the adaptive fault-tolerance strategy the RPA controller controls the fault tolerance and adaptive module to re-parse the complete DOM tree obtained in step 21, adjust the positioning expression in step 22, or switch the positioning strategy, so that element positioning in step 22 and operation simulation in step 23 proceed smoothly until the preset operation is completed. The fault-tolerance strategy can be adjusted dynamically according to the type of exception. Under the overall coordination of the RPA controller, the different levels of the three-level fault-tolerance strategy adapt to scenarios such as webpage structure changes, element failures, and network latency, reducing the failure rate of automated operations and ensuring the continuity and stability of automated tasks.
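One way the strategy might be adjusted by exception type is sketched below. The mapping from exception types to preferred levels, the escalation order, and all names (LEVELS, PREFERRED, plan_levels, relocate) are assumptions made for this example; the application states only that the strategy is adjusted dynamically.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Level names follow the three-level strategy; the identifiers are illustrative.
LEVELS = ["dom_backtracking", "fuzzy_matching", "retry"]

# Hypothetical mapping from exception type to the level tried first.
PREFERRED = {
    "structure_change": "dom_backtracking",
    "attribute_change": "fuzzy_matching",
    "network_latency": "retry",
}

Strategy = Callable[[], Optional[str]]  # returns an element identifier or None


def plan_levels(exception_type: str) -> List[str]:
    """Put the preferred level first, then fall back to the remaining levels."""
    first = PREFERRED.get(exception_type, LEVELS[0])
    return [first] + [level for level in LEVELS if level != first]


def relocate(exception_type: str,
             strategies: Dict[str, Strategy]) -> Tuple[Optional[str], List[str]]:
    """Run levels in planned order until one yields an element identifier.

    The list of attempted levels is returned so that it can be written to
    the execution log alongside the result.
    """
    attempted: List[str] = []
    for level in plan_levels(exception_type):
        attempted.append(level)
        element_id = strategies[level]()
        if element_id is not None:
            return element_id, attempted
    return None, attempted


if __name__ == "__main__":
    demo = {
        "dom_backtracking": lambda: None,
        "fuzzy_matching": lambda: "node-42",
        "retry": lambda: None,
    }
    print(relocate("structure_change", demo))
    # ('node-42', ['dom_backtracking', 'fuzzy_matching'])
```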

[0089] By implementing this approach, under the RPA system framework and scheduled by the RPA controller, automatic repair of target element failures can be achieved without manual intervention. This effectively addresses various anomalies such as network latency and element changes, enhances the anti-interference capability and operational stability of automated processes, significantly reduces the probability of process interruptions caused by changes in webpage structure, target element failures, and network fluctuations, ensures the smooth completion of automated tasks, and improves the practicality of automated processes.

[0090] Step 25: Process Confirmation and Log Retention;

[0091] Specifically, the RPA controller summarizes the execution status of step 23 (operation simulation) and, where it was required, step 24 (fault tolerance processing). After confirming the completion of the automated interactive process, it controls the RPA system to generate detailed execution logs for subsequent process optimization and troubleshooting. If the optional asynchronous event listening sub-step is executed in step 23, the event listening queue detects the completion of the asynchronous event and sends an event completion notification to the RPA controller, which confirms process completion based on this notification. If the optional sub-step is not executed, then after the operation simulation in step 23 is completed, the RPA controller uses the visual verification results from the operation simulation module to confirm that the operation is correct and directly confirms process completion. If step 24 triggers the three-level fault tolerance strategy, the RPA controller controls the RPA system to record the fault tolerance process, the fault tolerance level used, and the processing results in the logs, providing a basis for subsequent optimization of the fault tolerance strategy.

[0092] After the process execution is confirmed to be complete, the RPA controller controls the RPA system to automatically generate detailed execution logs. These execution logs are textual information recording the complete execution of the automated process and serve as important data for subsequent troubleshooting and strategy optimization. Specifically, the execution logs record the scheduling process of the RPA controller, the complete operation path from steps 21 to 24, the parsing results of the DOM parsing engine, the positioning process of the dynamic positioning module, the execution details and visual verification results of the operation simulation module, the fault tolerance handling records of the fault tolerance and adaptive module, the execution time of each stage, and abnormal events, including but not limited to positioning failure, operation timeout, and network latency. This provides reliable data support for subsequent automation task optimization, adjustment of the positioning strategy in step 22, and improvement of the fault tolerance strategy in step 24, and constitutes the final stage of the entire automated process.
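A structured form of such an execution log could look like the following sketch. The field names, the JSON Lines output, and the set of anomaly event names are assumptions for illustration; the application describes the logged content but not a format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Event names treated as anomalies in this sketch (taken from the examples in the text).
ANOMALY_EVENTS = {"positioning_failure", "operation_timeout", "network_latency"}


@dataclass
class LogEntry:
    step: int                          # step number, e.g. 21 to 25
    module: str                        # module that produced the entry
    event: str                         # what happened
    duration_ms: float                 # execution time of this stage
    fault_level: Optional[str] = None  # fault tolerance level used, if any
    detail: dict = field(default_factory=dict)


class ExecutionLog:
    def __init__(self) -> None:
        self.entries: List[LogEntry] = []

    def record(self, entry: LogEntry) -> None:
        self.entries.append(entry)

    def anomalies(self) -> List[LogEntry]:
        """Entries a later optimisation pass would examine first."""
        return [entry for entry in self.entries if entry.event in ANOMALY_EVENTS]

    def to_json_lines(self) -> str:
        """Serialise one JSON object per line, in execution order."""
        return "\n".join(json.dumps(asdict(entry)) for entry in self.entries)


if __name__ == "__main__":
    log = ExecutionLog()
    log.record(LogEntry(22, "dynamic_positioning", "positioning_failure", 12.5))
    log.record(LogEntry(24, "fault_tolerance", "relocated", 30.0,
                        fault_level="fuzzy_matching"))
    print(log.to_json_lines())
```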

[0093] By implementing this approach, the execution results of the RPA system's automated processes can be clearly defined through the overall confirmation and scheduling of the RPA controller. At the same time, the process execution logs are fully retained, providing reliable data support for subsequent process optimization and troubleshooting. This facilitates the continuous iteration and optimization of automation strategies, further improving the operational stability and execution efficiency of automated processes.

[0094] In scenarios involving dynamically loaded table data, the full table content is not displayed during the initial page load, and new data is loaded asynchronously only after actions such as scrolling the page or clicking a load button. During element positioning in step 22, relying on the complete DOM tree obtained in step 21, the MutationObserver interface of the event-operation linkage mechanism used in step 23 is employed to position the target element precisely. MutationObserver is a browser-provided API for monitoring DOM tree changes; it reports changes such as node additions, deletions, and attribute modifications, which makes it suitable for capturing dynamically loaded webpage elements. In this way, newly added table elements are captured as they appear and the corresponding operations are executed without waiting for the entire webpage to load, improving the efficiency of automated operations.

[0095] Iframe-nested page scenarios embed another independent webpage within a webpage and are common in multi-system integration, pop-up embedded pages, and similar cases. After the complete DOM tree is obtained in step 21, it is recursively parsed layer by layer using the cross-level structure traversal technique in step 22. First, the parent iframe element is located and the context is switched to that iframe. Then the element positioning operation in step 22 is executed, achieving precise positioning and manipulation of the target element within the nested page.

[0096] Micro-frontend architecture scenarios consist of multiple independently developed and deployed sub-applications, each with its own independent DOM structure. During element positioning in step 22, based on the complete DOM tree obtained in step 21, cross-level traversal is used to traverse the Web Components tags in the webpage and parse the Shadow DOM structure of each sub-application. Web Components is a set of technologies for building reusable web components and is commonly used in micro-frontend architectures to encapsulate sub-applications. The Shadow DOM is an independent DOM structure encapsulated within a web component that does not conflict with the external DOM structure, and positioning its internal elements requires a dedicated parsing method. This approach overcomes the limitations of traditional DOM operations, enabling element positioning and interaction across sub-applications.

[0097] In scenarios involving forms generated dynamically from JSON Schema, such as Ant Design forms, JSON Schema is a specification for defining data structures. Dynamic forms are generated automatically from it, and the number and types of form elements can change according to actual needs. During element positioning in step 22, the semantic-driven dynamic positioning model is used, and the DOM parsing engine extracts the `data-schema` attribute of the form elements. The `data-schema` attribute marks the data structure of dynamic form elements and can be used to identify the types and rules of form fields. From the extracted attribute, the corresponding element positioning expression is generated automatically, enabling automated filling and submission of the dynamic form.
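Generating a positioning expression from an extracted `data-schema` attribute can be sketched as follows. The example assumes that the attribute holds a plain field path such as user.email, which is a guess made for illustration since the application does not define the attribute's value format; the function name expressions_for is likewise hypothetical.

```python
from typing import Dict


def expressions_for(tag: str, attrs: Dict[str, str]) -> Dict[str, str]:
    """Build a CSS selector and an XPath expression from a data-schema value."""
    schema = attrs.get("data-schema")
    if schema is None:
        raise KeyError("element has no data-schema attribute")
    if '"' in schema or "'" in schema:
        # Quote escaping differs between CSS and XPath; it is left out of this sketch.
        raise ValueError("quotes in data-schema are not handled by this sketch")
    return {
        "css": f'{tag}[data-schema="{schema}"]',
        "xpath": f"//{tag}[@data-schema='{schema}']",
    }


if __name__ == "__main__":
    print(expressions_for("input", {"data-schema": "user.email", "type": "text"}))
    # {'css': 'input[data-schema="user.email"]', 'xpath': "//input[@data-schema='user.email']"}
```

Because the expression depends only on the business-level attribute, it remains valid when the form is regenerated with different element order or auto-generated class names.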

[0098] Meanwhile, in any of the above application scenarios, if the target element is detected to be invalid, such as a dynamic form field refresh or a nested page element change, the fault tolerance processing step 24 is triggered. Through an adaptive fault tolerance dynamic relocation strategy, the complete DOM tree obtained in step 21 is re-parsed or the positioning expression in step 22 is adjusted to ensure the smooth execution of the automated task. Finally, through the log retention step in step 25, the entire automated operation process is recorded to provide a reference for subsequent technical optimization.

[0099] Referring to Figure 3, embodiments of this application provide a processing apparatus 300, comprising:

[0100] RPA controller 301 is used to obtain the complete DOM tree of the target webpage;

[0101] The dynamic positioning module 302 is used to obtain the target element and its identifier based on the complete DOM tree.

[0102] The operation simulation module 303 is used to simulate user interaction behavior, execute corresponding operation events, and monitor changes in the webpage state in real time based on the target element and its identifier.

[0103] In one possible implementation, obtaining the complete DOM tree of the target webpage includes:

[0104] Send a page loading command to the target webpage to drive the target webpage to complete page rendering, and obtain the complete DOM tree after rendering;

[0105] The complete DOM tree contains the tag information, attribute parameters, and hierarchical relationships of all elements on the target webpage, and / or the complete DOM tree supports parsing inline frames, shadow DOM, and / or dynamically generated element content.

[0106] In one possible implementation, the target element and its identifier are obtained from the complete DOM tree, including:

[0107] Parse the element location expression according to the preset task requirements, filter and match target elements in the complete DOM tree, and obtain the identifier of the target element;

[0108] Among them, the preset task requirements are used to represent the automated operation goals pre-configured by the user.

[0109] In one possible implementation, based on the target element and its identifier, user interaction behavior is simulated, corresponding operation events are executed, and changes in the webpage state are monitored in real time, including:

[0110] Based on the target element and its identifier, simulate user interaction behavior and execute corresponding operation events;

[0111] Real-time monitoring of webpage status changes is achieved through an event-action linkage mechanism.

[0112] In one possible implementation, the apparatus 300 further includes:

[0113] The fault tolerance and adaptive module is used to re-parse the complete DOM tree and / or adjust the element positioning expression if the target element is detected to be invalid or the web page structure has changed during the process of obtaining the target element and / or simulating user interaction behavior.

[0114] In one possible implementation, the apparatus 300 further includes:

[0115] The log generation module is used to generate an execution log after simulating user interaction operations. The execution log records the operation path, execution details and / or exception handling of each step.

[0116] The apparatus provided in this embodiment can implement the processes of the method embodiment shown in Figure 1 and achieve the same technical effects, which are not described again here to avoid repetition.

[0117] Referring to Figure 4, the RPA system comprises an RPA controller and four collaborative functional modules: a DOM parsing engine, a dynamic positioning module, an operation simulation module, and a fault-tolerant and adaptive module. The RPA controller, as the core scheduling unit, coordinates the startup, operation, and interaction of these four modules. Under the scheduling of the RPA controller, the modules work collaboratively to support the complete execution of steps 21-25 in Figure 2. The specific composition and functions of each module are as follows:

[0118] (1) RPA controller

[0119] The RPA controller is the core scheduling and control module of the RPA system, serving as the central hub of the entire RPA webpage automation interaction system. It is responsible for coordinating the collaborative operation of the DOM parsing engine, dynamic positioning module, operation simulation module, and fault-tolerant and adaptive module, and for scheduling the entire process of steps 21-25 in Figure 2. Its core functions include: receiving user-preset automation task requirements and issuing operation instructions to each functional module; receiving real-time feedback on execution status, data, and exception information from each module and dynamically adjusting the execution sequence of the modules; controlling the pace of process advancement and handling data transfer and instruction interaction between modules at each stage of steps 21-25; and triggering the exception handling mechanism of the fault-tolerant and adaptive module, confirming the process execution result, and controlling the RPA system to generate and retain logs, ensuring the continuous and stable execution of the entire automation process. These functions are consistent with the scheduling function of the RPA controller in Example 1.

[0120] (2) DOM parsing engine

[0121] The DOM parsing engine is the basic data acquisition module of the RPA system. Its core function is to acquire and parse the complete webpage DOM tree under the scheduling of the RPA controller, which corresponds to the core operation of step 21 in Figure 2 and provides core data support for the entire system. The DOM parsing engine uses a headless browser (for example, one driven through Puppeteer or Playwright) to obtain the complete DOM tree of the target webpage in real time after rendering. It supports parsing Shadow DOM, iframe nesting structures, and dynamically generated Script tag content, ensuring that the acquired DOM tree is complete and avoiding subsequent positioning and operation errors caused by a missing or incompletely parsed DOM tree.

[0122] Meanwhile, the DOM parsing engine extends the DOM metadata storage function, attaching rendering attributes (such as visibility and coordinate boundaries) and business tags to each DOM node. These business tags inject business semantics through custom `data-*` attributes, providing richer reference data for element positioning in the dynamic positioning module and operation execution in the operation simulation module. Furthermore, the DOM parsing engine works with the browser developer tools interface to further improve the completeness and accuracy of DOM tree acquisition, and feeds the acquired DOM tree and metadata back to the RPA controller. This ensures that subsequent modules operate on complete and accurate DOM data, so that the engine's functions fully match the core requirement of step 21 in Figure 2, namely obtaining the DOM tree.

[0123] (3) Dynamic positioning module

[0124] The dynamic positioning module is the core module for element positioning in the RPA system. Its core function is to accurately locate target elements based on the complete DOM tree obtained by the DOM parsing engine, under the scheduling of the RPA controller. This corresponds to the core operation of step 22 in Figure 2 and provides precise element location data for the operation simulation module. The dynamic positioning module employs a composite positioning strategy, prioritizing XPath expressions with context constraints (such as //div[@class='container']/descendant::button[contains(@class, 'submit')]), combined with CSS selector pseudo-classes (:nth-child, :last-of-type), to achieve precise matching of target elements and improve positioning accuracy.
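The effect of the context-constrained XPath expression quoted above can be illustrated with the Python standard library. Because xml.etree.ElementTree supports only a limited subset of XPath, this sketch evaluates the container constraint with findall and applies the descendant and contains() conditions in Python; the sample markup and the function name find_submit_texts are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

SAMPLE = """
<html><body>
  <div class="container">
    <form><button class="btn submit primary">Send</button></form>
    <button class="btn cancel">Cancel</button>
  </div>
  <div class="sidebar"><button class="btn submit">Other</button></div>
</body></html>
"""


def find_submit_texts(markup: str) -> List[str]:
    """Emulate //div[@class='container']/descendant::button[contains(@class, 'submit')]."""
    root = ET.fromstring(markup.strip())
    texts: List[str] = []
    # Context constraint: only search inside div elements with class "container".
    for container in root.findall(".//div[@class='container']"):
        # iter() visits every descendant button, at any depth.
        for button in container.iter("button"):
            # Substring test, like XPath contains(@class, 'submit').
            if "submit" in button.get("class", ""):
                texts.append(button.text or "")
    return texts


if __name__ == "__main__":
    print(find_submit_texts(SAMPLE))  # ['Send']
```

The button in the sidebar also carries the submit class but is excluded by the container constraint, which is the point of adding context to the expression.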

[0125] This module integrates a semantically driven dynamic positioning model, combining business tags (custom data-* attributes) injected by the DOM parsing engine. It deeply integrates the hierarchical relationship of the DOM tree, element attribute features, and business semantics to build a context-aware positioning expression generation algorithm. This effectively breaks through the limitations of traditional static attribute matching methods and improves the flexibility and scenario adaptability of element positioning. At the same time, it supports dynamic attribute parsing. For attribute values containing variables (such as class="button-{{timestamp}}"), it extracts pattern features (class=~'button-') through regular expressions for fuzzy matching to adapt to dynamically changing element attributes.

[0126] Furthermore, this module incorporates cross-level structure traversal technology, combined with the DOM parsing engine's ability to parse Shadow DOM and iframe nested structures. It uses a recursive parsing algorithm over the iframe, Shadow DOM, and main document, enabling positioning of webpage elements in deeply nested scenarios without blind spots. This fills a gap in existing RPA technologies for positioning elements within complex webpage structures. Its functions fully match the positioning requirement of step 22 in Figure 2: under the scheduling of the RPA controller, it parses the positioning expression, filters and matches target elements, and feeds back the positioning results.

[0127] (4) Operation simulation module

[0128] The operation simulation module is the core interactive execution module of the RPA system. Its core function is, under the scheduling of the RPA controller, to simulate human user interaction behavior and verify the operation results based on the unique identifier of the target element returned by the dynamic positioning module. This corresponds to the core operation of step 23 in Figure 2 and ensures the accuracy, security, and consistency of the operation. The operation simulation module simulates user behavior based on the WebDriver protocol, supports custom event sequences (such as key combinations and drag-and-drop operations), and controls the operation frame rate through requestAnimationFrame to avoid triggering webpage anti-scraping mechanisms, thereby improving the security and compatibility of the operation.

[0129] This module integrates an event-operation linkage mechanism. Through the MutationObserver interface and AJAX request interception technology (such as XMLHttpRequest listening), it achieves dynamic synchronization between operation execution sequence and webpage state, solving the technical problem of uncontrolled automated processes in asynchronous scenarios. It can monitor changes in webpage state in real time (including but not limited to page jumps, pop-ups, AJAX request completion, etc.) and feed back the monitoring results to the RPA controller to ensure that operation execution and webpage state remain synchronized.

[0130] This module also integrates a visual verification plugin, which verifies the visual correctness of operation results through screenshot comparison (such as using a pixel matching library), promptly identifying operation deviations and feeding them back to the RPA controller. After the operation is completed, it generates the event handle corresponding to the operation event and returns it to the RPA controller. This allows the RPA controller to track the execution status of the operation event, listen for subsequent asynchronous events, and register the event handle with the event listening queue, ensuring the continuous execution of the automated process. Its functions fully match the operation simulation and status monitoring requirements of step 23 in Figure 2.

[0131] (5) Fault-tolerant and adaptive module

[0132] The fault-tolerance and adaptive module is the core module for ensuring the stability of the RPA system. Its core function is to handle anomalies during element positioning and operation simulation under the scheduling of the RPA controller, ensuring that the automated process is not interrupted. This corresponds to the core operation of step 24 in Figure 2 and enhances the system's anti-interference capability and stability. The module can automatically handle abnormal scenarios such as changes in webpage elements, operation execution failures, and network latency. Under the control of the RPA controller, it adopts a three-level fault tolerance strategy to achieve comprehensive anomaly adaptation. The fault tolerance process is fed back to the RPA controller in real time, fully matching the core requirements of fault tolerance processing in Figure 2.

[0133] As shown in Figure 5, this application embodiment also provides an electronic device 500, which can be a terminal or another electronic device. The electronic device 500 includes a processor 501, a memory 502, and a program or instructions stored in the memory 502 and executable on the processor 501. When the program or instructions are executed by the processor 501, the processes of the method embodiment shown in Figure 1 are implemented and the same technical effects are achieved. To avoid repetition, they are not described again here.

[0134] This application embodiment also provides a readable storage medium storing a program or instructions that, when executed by a processor, implement the processes of the method embodiment shown in Figure 1 and achieve the same technical effects, which are not described again here to avoid repetition.

[0135] The processor mentioned above is the processor in the terminal described in the above embodiments. The readable storage medium includes computer-readable storage media, such as computer read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk.

[0136] The RPA web page automation processing method, device, system, and medium provided in this application are based on the core principle of obtaining the complete DOM tree of the target web page and performing dynamic parsing. By combining element semantic attributes, the target element and its identifier are located, user interaction behavior is simulated, and the web page status is monitored in real time. It is also equipped with an adaptive fault tolerance mechanism, supports modern Web technologies such as AJAX asynchronous loading, iframe nesting, and dynamic content loading. Overall, it has the advantages of high adaptability, accurate operation, efficient handling of complex scenarios, and strong robustness. It can effectively cope with dynamic changes in web page structure, reduce the maintenance cost of automation scripts, avoid coordinate offset or image recognition errors, significantly reduce the probability of automation process interruption, improve operation accuracy and automation success rate, adapt to modern complex web page scenarios, and solve the problem of positioning failure caused by dynamic changes in web page structure in traditional RPA.

[0137] The steps of the methods or algorithms described in this application can be implemented in hardware or by executing software instructions on a processor. The software instructions can consist of corresponding software modules, which can be stored in RAM, flash memory, ROM, EPROM, EEPROM, registers, hard disk, portable hard disk, read-only optical disk, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor, enabling the processor to read information from and write information to the storage medium. Of course, the storage medium can also be a component of the processor. The processor and storage medium can be housed in an ASIC. Alternatively, the ASIC can be housed in a core network interface device. Of course, the processor and storage medium can also exist as discrete components in the core network interface device.

[0138] Those skilled in the art will recognize that, in one or more of the examples above, the functions described in this application can be implemented using hardware, software, firmware, or any combination thereof. When implemented in software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, wherein communication media include any medium that facilitates the transfer of a computer program from one place to another. Storage media can be any available medium accessible to a general-purpose or special-purpose computer.

[0139] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of this application. It should be understood that the above description is only a specific embodiment of this application and is not intended to limit the scope of protection of this application. Any modifications, equivalent substitutions, improvements, etc., made on the basis of the technical solution of this application should be included within the scope of protection of this application.

[0140] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, embodiments of this application can take the form of entirely hardware embodiments, entirely software embodiments, or embodiments combining software and hardware aspects. Furthermore, embodiments of this application can take the form of computer program products implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0141] This application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It should be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0142] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0143] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0144] Obviously, those skilled in the art can make various modifications and variations to the embodiments of this application without departing from the spirit and scope of this application. Therefore, if these modifications and variations to the embodiments of this application fall within the scope of the claims of this application and their equivalents, this application also intends to include these modifications and variations.

Claims

1. A processing method, characterized in that the method comprises: obtaining the complete Document Object Model (DOM) tree of a target webpage; obtaining a target element and its identifier based on the complete DOM tree; and, based on the target element and its identifier, simulating user interaction behavior, executing corresponding operation events, and monitoring changes in the webpage state in real time.

2. The method according to claim 1, characterized in that obtaining the complete DOM tree of the target webpage comprises: sending a page loading command to the target webpage to drive the target webpage to complete page rendering, and obtaining the complete DOM tree after rendering; wherein the complete DOM tree contains the tag information, attribute parameters, and hierarchical relationships of all elements on the target webpage.

3. The method according to claim 1, characterized in that obtaining the target element and its identifier based on the complete DOM tree comprises: parsing an element positioning expression according to preset task requirements, filtering and matching target elements in the complete DOM tree, and obtaining the identifier of the target element; wherein the preset task requirements represent the automated operation goals pre-configured by the user.

4. The method according to claim 1, characterized in that simulating user interaction behavior, executing corresponding operation events, and monitoring changes in the webpage state in real time based on the target element and its identifier comprises: simulating user interaction behavior and executing corresponding operation events based on the target element and its identifier; and monitoring changes in the webpage state in real time through an event-action linkage mechanism.

5. The method according to claim 1, characterized in that the method further comprises: if the target element is detected to be invalid or the webpage structure is detected to have changed while obtaining the target element and/or simulating user interaction behavior, re-parsing the complete DOM tree and/or adjusting the element positioning expression.

6. The method according to claim 1, characterized in that the method further comprises: after simulating the user interaction behavior, generating an execution log that records the operation path, execution details, and/or exception handling of each step.

7. A processing apparatus, characterized in that the apparatus comprises: an RPA controller, used to obtain the complete DOM tree of a target webpage; a dynamic positioning module, used to obtain a target element and its identifier based on the complete DOM tree; and an operation simulation module, used to simulate user interaction behavior based on the target element and its identifier, execute corresponding operation events, and monitor changes in the webpage state in real time.

8. A robotic process automation system, characterized in that the system comprises the processing apparatus as described in claim 7.

9. An electronic device, characterized in that the electronic device comprises a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the method as described in any one of claims 1 to 6.

10. A readable storage medium, characterized in that the readable storage medium stores a program or instructions that, when executed by a processor, implement the steps of the method as described in any one of claims 1 to 6.
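For illustration only, the fault tolerance of claim 5 and the execution log of claim 6 could be combined as in the following sketch. It is an assumption-laden example, not the claimed implementation: a DOM snapshot is modeled as a plain list of attribute dictionaries, and every name (`StaleElementError`, `StepLog`, `by_attr`, `run_step`) is hypothetical. Each round re-fetches the snapshot (re-parsing), tries the positioning expressions in order (adjusting the expression), and records what happened.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Element = Dict[str, str]          # hypothetical: one element's attributes
Snapshot = List[Element]          # hypothetical: a flattened DOM snapshot
Locator = Callable[[Snapshot], Optional[Element]]


class StaleElementError(Exception):
    """Raised by an action when the located element is no longer valid."""


@dataclass
class StepLog:
    """Execution log for one automation step."""
    step: str
    entries: List[str] = field(default_factory=list)
    succeeded: bool = False


def by_attr(key: str, value: str) -> Locator:
    """Build a locator that matches the first element whose attribute equals value."""
    def find(snapshot: Snapshot) -> Optional[Element]:
        return next((e for e in snapshot if e.get(key) == value), None)
    return find


def run_step(
    step: str,
    fetch_dom: Callable[[], Snapshot],
    locators: Sequence[Tuple[str, Locator]],
    action: Callable[[Element], None],
    max_rounds: int = 3,
) -> StepLog:
    log = StepLog(step)
    for round_no in range(1, max_rounds + 1):
        snapshot = fetch_dom()  # always work on a freshly parsed snapshot
        log.entries.append(f"round {round_no}: fetched {len(snapshot)} elements")
        for name, locator in locators:  # fall back to the next expression on a miss
            element = locator(snapshot)
            if element is None:
                log.entries.append(f"round {round_no}: locator '{name}' found nothing")
                continue
            try:
                action(element)
            except StaleElementError as exc:
                log.entries.append(
                    f"round {round_no}: locator '{name}' stale ({exc}); re-parsing")
                break  # structure changed: start a new round with a new snapshot
            log.entries.append(f"round {round_no}: locator '{name}' succeeded")
            log.succeeded = True
            return log
    log.entries.append(f"gave up after {max_rounds} rounds")
    return log
```

A caller would pass an ordered list such as `[("id", by_attr("id", "btn-1")), ("label", by_attr("aria-label", "Submit"))]`, so that a stable semantic attribute serves as the fallback when a generated identifier changes between page loads. The returned `StepLog` can then be persisted as the per-step record.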