Methods, apparatus, systems and storage media for reproducing cross-terminal graphical user interface operation behavior
During cross-terminal reproduction of user interface operations, the response chain path of the view tree is extracted and the visual content fingerprint is compared and verified. This addresses the accuracy and security problems of cross-terminal operation reproduction and enables efficient reproduction across different devices and system versions.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHEZHI HULIAN BEIJING SCI & TECH CO LTD
- Filing Date
- 2026-04-23
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies suffer from inaccurate positioning and insufficient security in reproducing cross-terminal user interface behavior. In particular, traditional methods cannot achieve accurate and secure operation reproduction when there are differences in screen size, resolution and operating system version of different devices.
The host intercepts a user interface operation and extracts the response chain path stack sequence of the target view in the system view tree. It determines whether the target view is in a scroll view container and, if so, generates location information comprising the view hierarchy branch sequence and a relative coordinate index matrix. It then extracts the functional semantic identifier and visual content fingerprint, generates a topology reproduction instruction, and sends it to the slave side, which performs hierarchical matching and comparison verification to ensure that the operation is reproduced accurately and securely.
The method reproduces user interface operations across terminals accurately and securely, adapts to different devices and system versions, and improves the reproduction success rate and cross-platform adaptability.
[Figure: abstract drawing CN122309381A_ABST]
Description
Technical Field
[0001] This invention relates to the field of computer technology, and in particular to a method, apparatus, system, and storage medium for reproducing the operation behavior of a cross-terminal graphical user interface.
Background Technology
[0002] In scenarios such as mobile application testing, remote assistance, and multi-device control, it is often necessary to reproduce the user interface operations of one device (the host) in real time on one or more other devices (the slaves). For example, in automated mobile application testing, testers need to execute the same test cases on devices of different brands and models; in remote customer service assistance, customer service personnel need to guide users to operate specific interface elements; and in multi-device control, operators need to control multiple devices simultaneously to perform the same task.
[0003] Due to differences in screen size, resolution, operating system version, and application version across different devices, traditional methods for reproducing operations based on absolute coordinates often fail to accurately locate the target view, especially in complex scenarios such as scrolling view containers and dynamically loaded content, resulting in a low success rate. Furthermore, existing technologies lack effective security verification mechanisms, which could lead to operations being incorrectly reproduced on non-target views, posing security risks.
[0004] In this context, how to achieve accurate, secure, and adaptive reproduction of user interface operation behavior across terminals has become a key focus in the field of mobile application development and testing.
Summary of the Invention
[0005] The purpose of this invention is to provide a method, apparatus, system, and storage medium for reproducing the operation behavior of a cross-terminal graphical user interface, thereby solving the aforementioned problems existing in the prior art.
[0006] To achieve the above objectives, the technical solution adopted by the present invention is as follows:
[0007] In a first aspect, embodiments of the present invention provide a method for reproducing the operation behavior of a cross-terminal graphical user interface, comprising the following steps:
[0008] Intercept user interface operations on the host side and extract the response chain path stack sequence of the target view that triggered the operation in the system view tree;
[0009] Determine if the target view is within a scroll view container;
[0010] If the target view is within a scroll view container, generate location information comprising the view hierarchy branch sequence of the target view and a relative coordinate index matrix;
[0011] Extract the functional semantic identifier and visual content fingerprint of the target view;
[0012] Serialize and encapsulate the response chain path stack sequence, location information, functional semantic identifier, and visual content fingerprint according to the preset multi-level delimiter rules to generate the target topology reproduction instruction, and send it to the slave device;
[0013] The slave device receives and parses the target topology reproduction instruction, and performs hierarchical matching in the system view tree on the slave device according to the response chain path stack sequence to lock the candidate view area;
[0014] If the target topology reproduction instruction contains location information, then the candidate coordinate points are locked in the scroll container within the candidate view area based on the relative coordinate index matrix.
[0015] Extract the real-time functional semantic identifier and real-time visual content fingerprint of the view corresponding to the candidate coordinate point, and compare and verify them with the functional semantic identifier and visual content fingerprint carried in the target topology reproduction instruction.
[0016] If the comparison matches, the location is confirmed to be successful and the reproduction operation corresponding to the user interface operation behavior is executed;
[0017] If the comparison is inconsistent, the abnormal suspension mechanism will be triggered and the reproduction operation will be stopped.
[0018] In one possible implementation, extracting the response chain path stack sequence of the target view that triggered the operation from the system view tree includes:
[0019] Obtain the hierarchical information of the target view in the system event response chain;
[0020] Trace upward along the hierarchical information and extract, in sequence, the navigation controller class name and view controller class name to which the current view belongs;
[0021] Arrange the extracted class names in backtracking order to generate the response chain path stack sequence.
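The backtracking described in [0019] to [0021] can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Responder` class, its `kind` field, and the `next_responder` link are hypothetical stand-ins for whatever the platform's event response chain exposes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Responder:
    """Hypothetical stand-in for one node in the event response chain."""
    class_name: str
    kind: str  # "view", "view_controller", or "navigation_controller"
    next_responder: Optional["Responder"] = None


def response_chain_path(target: Responder) -> List[str]:
    """Walk upward from the target view and collect controller class names
    in the order they are met (backtracking order)."""
    path: List[str] = []
    node: Optional[Responder] = target
    while node is not None:
        if node.kind in ("view_controller", "navigation_controller"):
            path.append(node.class_name)
        node = node.next_responder
    return path


# Illustrative chain: cell -> table -> HomeViewController -> MainNavigationController
nav = Responder("MainNavigationController", "navigation_controller")
home = Responder("HomeViewController", "view_controller", nav)
table = Responder("UITableView", "view", home)
cell = Responder("UITableViewCell", "view", table)

path = response_chain_path(cell)
print(path)  # ['HomeViewController', 'MainNavigationController']
```

Reversing the returned list gives a root-first ordering, which may be more convenient when the slave matches the path from the top of its view tree downward.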
[0022] In one possible implementation, generating the location information comprising the view hierarchy branch sequence of the target view and the relative coordinate index matrix includes:
[0023] When it is determined that the target view is in a scroll view container, obtain the node index position sequence of the target view in the multi-level child nodes of the scroll view container, and use it as the view hierarchy branch sequence;
[0024] Get the current scroll offset of the scroll view container and calculate the scale quadrant value of the target view relative to the current visible screen area;
[0025] The node index position sequence is combined with the scale quadrant values to generate a relative coordinate index matrix.
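One way to read [0023] to [0025] is sketched below. The text does not define how the scale quadrant value is computed, so this sketch assumes it is the position of the view's centre inside the visible area, expressed as a fraction of the viewport width and height after subtracting the scroll offset. The function names and the two-row layout of the matrix are likewise assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def scale_quadrant(center: Tuple[float, float],
                   scroll_offset: Tuple[float, float],
                   viewport: Tuple[float, float]) -> Tuple[float, float]:
    """Position of the view centre inside the visible area, as width/height ratios.

    center is given in scroll-content coordinates; subtracting the scroll
    offset converts it to visible-area coordinates (assumed interpretation).
    """
    x = (center[0] - scroll_offset[0]) / viewport[0]
    y = (center[1] - scroll_offset[1]) / viewport[1]
    return (x, y)


def relative_coordinate_index_matrix(index_path: Sequence[int],
                                     quadrant: Tuple[float, float]) -> List[list]:
    """Row 0: child indices from the scroll container down to the target view.
    Row 1: scale quadrant values relative to the visible screen area."""
    return [list(index_path), list(quadrant)]


quadrant = scale_quadrant(center=(200.0, 1500.0),
                          scroll_offset=(0.0, 1200.0),
                          viewport=(400.0, 800.0))
matrix = relative_coordinate_index_matrix([0, 2, 1], quadrant)
print(matrix)  # [[0, 2, 1], [0.5, 0.375]]
```

Because the second row holds ratios rather than pixels, the same matrix can be applied to a slave whose viewport has a different size.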
[0026] In one possible implementation, extracting the functional semantic identifier and visual content fingerprint of the target view includes:
[0027] Obtain at least one of the text content currently rendered in the target view and the name of the bound response method as the functional semantic identifier;
[0028] Capture the currently rendered image of the target view and extract local image resource identifiers or network image links as the visual content fingerprint;
[0029] If the target view is in a scrolling state, synchronously record the border size information of the target view and its ratio information relative to the screen pixels, and incorporate the ratio information into the visual content fingerprint as a supplementary dimension.
[0030] In one possible implementation, performing serialization and encapsulation according to the preset multi-level delimiter rules includes:
[0031] Different information categories are divided into modules using the first type of delimiter;
[0032] The second type of delimiter is used to divide different attribute fields under the same major information category module;
[0033] The response chain path stack sequence, location information, functional semantic identifier, and visual content fingerprint are treated as independent string segments and concatenated into a single compact string instruction using the first type of delimiter and the second type of delimiter.
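The two-level delimiter scheme in [0031] to [0033] might look like the sketch below. The concrete delimiter characters (`#` and `|`), the field contents, and the function names are assumptions; the text only states that a first type of delimiter separates information categories and a second type separates fields within a category.

```python
from typing import List, Sequence

MODULE_SEP = "#"  # assumed first-type delimiter: separates information categories
FIELD_SEP = "|"   # assumed second-type delimiter: separates fields within a category


def encode_instruction(path: Sequence[str], location: Sequence[str],
                       semantics: Sequence[str], fingerprint: Sequence[str]) -> str:
    """Join fields with FIELD_SEP, then join the four modules with MODULE_SEP."""
    modules = (path, location, semantics, fingerprint)
    return MODULE_SEP.join(FIELD_SEP.join(str(f) for f in m) for m in modules)


def decode_instruction(text: str) -> List[List[str]]:
    """Split back into modules and fields; an empty module decodes to []."""
    return [m.split(FIELD_SEP) if m else [] for m in text.split(MODULE_SEP)]


instruction = encode_instruction(
    ["MainNavigationController", "HomeViewController"],
    ["0.2.1", "0.5,0.375"],          # illustrative location fields
    ["Buy now", "onBuyTapped"],      # illustrative text and response method name
    ["icon_cart"],                   # illustrative image resource identifier
)
print(instruction)
print(decode_instruction(instruction))
```

The sketch does not escape delimiter characters that occur inside field values, so a practical encoder would need an escaping rule or delimiters that cannot appear in the data. An empty location module (target view not in a scroll container) round-trips to an empty list, which the slave can treat as "no location information".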
[0034] In one possible implementation, the method also includes:
[0035] Listen for foreground / background switching events, system pop-up events, page return events, and nested webpage loading events on the host side, and generate corresponding system-level event identifiers;
[0036] Encapsulate the system-level event identifiers into specified fields of the target topology reproduction instruction;
[0037] Send the encapsulated target topology reproduction instruction to the slave device to instruct the slave device to complete the corresponding system-level state switching synchronization before performing view positioning.
[0038] In one possible implementation, performing hierarchical matching in the system view tree on the slave side according to the response chain path stack sequence to lock the candidate view area includes:
[0039] According to the hierarchical order of the response chain path stack sequence, match the navigation controller class name and view controller class name level by level in the system view tree on the slave side;
[0040] If all class names match successfully, determine the view area managed by the lowest-level view controller as the candidate view area;
[0041] If the class name at any level fails to match, trigger the abnormal suspension mechanism directly.
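The level-by-level matching in [0039] to [0041] can be sketched as follows, assuming the slave's controller hierarchy is available as a simple tree and the path is ordered root-first. The `Controller` class and the `ReproductionAborted` exception are hypothetical names used to stand in for the slave's view tree and the abnormal suspension mechanism.

```python
from dataclasses import dataclass, field
from typing import List


class ReproductionAborted(Exception):
    """Hypothetical signal for the abnormal suspension mechanism."""


@dataclass
class Controller:
    """Hypothetical node in the slave's controller hierarchy."""
    class_name: str
    children: List["Controller"] = field(default_factory=list)


def lock_candidate_region(root: Controller, path: List[str]) -> Controller:
    """Match a root-first class-name path level by level.

    Returns the lowest-level matched controller; aborts on the first mismatch.
    """
    if not path or root.class_name != path[0]:
        raise ReproductionAborted("root level does not match")
    current = root
    for name in path[1:]:
        match = next((c for c in current.children if c.class_name == name), None)
        if match is None:
            raise ReproductionAborted(f"no controller named {name!r} at this level")
        current = match
    return current


slave_tree = Controller("MainNavigationController", [
    Controller("HomeViewController"),
    Controller("SettingsViewController"),
])
region = lock_candidate_region(
    slave_tree, ["MainNavigationController", "HomeViewController"])
print(region.class_name)  # HomeViewController
```

Aborting on the first mismatch mirrors [0041]: no attempt is made to search elsewhere in the tree once a level fails.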
[0042] In one possible implementation, extracting the real-time functional semantic identifier and real-time visual content fingerprint of the view corresponding to the candidate coordinate point includes:
[0043] Obtain at least one of the real-time text content currently rendered in the view corresponding to the candidate coordinate point and the name of the bound real-time response method as the real-time functional semantic identifier;
[0044] Capture the current real-time rendered screen of the view corresponding to the candidate coordinate point, and extract the real-time local image resource identifier or real-time network image link as the real-time visual content fingerprint.
[0045] In a second aspect, embodiments of the present invention provide a device for reproducing the operation behavior of a cross-terminal graphical user interface, applied to a host device, comprising:
[0046] The path extraction module is used to intercept user interface operations on the host side and extract the response chain path stack sequence of the target view that triggered the operation in the system view tree.
[0047] The location generation module is used to determine whether the target view is within a scrollable view container; if it is, it generates location information including the view hierarchy branch sequence and the relative coordinate index matrix of the target view.
[0048] The feature fusion module is used to extract the functional semantic identifiers and visual content fingerprints of the target view;
[0049] The instruction encapsulation module is used to serialize and encapsulate the response chain path stack sequence, location information, functional semantic identifier and visual content fingerprint according to the preset multi-level delimiter rules, generate the target topology reproduction instruction and send it to the slave end.
[0050] The slave device receives and parses the target topology reproduction command. Based on the response chain path stack sequence, it performs hierarchical matching in the slave device's system view tree to lock the candidate view area. If the target topology reproduction command contains location information, it locks the candidate coordinate point in the scroll container within the candidate view area based on the relative coordinate index matrix. It extracts the real-time functional semantic identifier and real-time visual content fingerprint of the view corresponding to the candidate coordinate point and compares and verifies them with the functional semantic identifier and visual content fingerprint carried in the target topology reproduction command. If the comparison matches, it confirms successful positioning and executes the reproduction operation corresponding to the user interface operation behavior. If the comparison does not match, it triggers an abnormal suspension mechanism and stops the reproduction operation.
[0051] In a third aspect, embodiments of the present invention provide a multi-control system, including a host, a server, and at least one slave device:
[0052] The host is used to execute the method of the first aspect and generate the target topology reproduction instruction;
[0053] The server is used to establish persistent connections with the host and each slave device, and to distribute the target topology reproduction instructions generated by the host to each slave device;
[0054] The slave device receives the target topology reproduction command and performs hierarchical matching in the system view tree on the slave side according to the response chain path stack sequence to lock the candidate view area. If the target topology reproduction command contains location information, the slave device locks the candidate coordinate point in the scroll container within the candidate view area according to the relative coordinate index matrix. The slave device extracts the real-time functional semantic identifier and real-time visual content fingerprint of the view corresponding to the candidate coordinate point and compares and verifies them with the functional semantic identifier and visual content fingerprint carried in the target topology reproduction command. If the comparison matches, the slave device confirms successful positioning and executes the reproduction operation corresponding to the user interface operation behavior. If the comparison does not match, the slave device triggers the abnormal suspension mechanism and stops the reproduction operation.
[0055] The beneficial effects of this invention are:
[0056] The technical solution of this invention can flexibly perform cross-terminal view positioning based on the response chain path of the system view tree. Through dual-dimensional comparison and verification of functional semantic identifiers and visual content fingerprints, it ensures the accuracy and security of operation behavior reproduction. When the view state of the slave device is inconsistent with that of the host device, the technical solution of this invention can detect and trigger an abnormal suspension mechanism in real time to prevent erroneous operations, thereby enabling this operation behavior reproduction method to adapt to changes in different devices, different system versions, and different application scenarios.
[0057] Furthermore, because the technical solution of this invention uses a relative coordinate index matrix to locate the target view within the scroll view container, the scale quadrant value adapts to changes in screen size and scroll offset. The associated functional semantic identifiers and visual content fingerprints cover multi-dimensional features such as text content, response methods, image resources, and border dimensions. Therefore, each reproduction operation achieves the current optimal solution while ensuring accurate view positioning. In this way, the reproduction of the entire operation behavior can be guaranteed to be accurate and secure, while possessing high cross-platform adaptability and scrolling state adaptability, significantly improving the reproduction success rate.
[0058] The above overview is for illustrative purposes only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the invention will become readily apparent from the accompanying drawings and the following detailed description.
Description of the Drawings
[0059] Figure 1 is a flowchart of a method for reproducing the operation behavior of a cross-terminal graphical user interface provided in an embodiment of the present invention.
[0060] Figure 2 is a schematic diagram of a device for reproducing the operation behavior of a cross-terminal graphical user interface provided in an embodiment of the present invention.
[0061] Figure 3 is a schematic diagram of a multi-control system provided in an embodiment of the present invention.
[0062] Figure 4 is a block diagram of an electronic device used to implement embodiments of the present invention.
Detailed Implementation
[0063] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.
[0064] Figure 1 is a flowchart of a method 100 for reproducing cross-terminal graphical user interface operations according to an embodiment of the present invention. This method is applied to the process of reproducing user interface operations of a host device on a slave device, and may include the following steps:
[0065] Step S101: Intercept the user interface operation behavior on the host side and extract the response chain path stack sequence of the target view that triggered the operation behavior in the system view tree.
[0066] Step S102: Determine whether the target view is in a scroll view container; if it is, generate location information comprising the view hierarchy branch sequence of the target view and a relative coordinate index matrix.
[0067] Step S103: Extract the functional semantic identifier and visual content fingerprint of the target view.
[0068] Step S104: The response chain path stack sequence, the location information, the functional semantic identifier, and the visual content fingerprint are serialized and encapsulated according to the preset multi-level delimiter rules to generate a target topology reproduction instruction and send it to the slave device.
[0069] Step S105: The slave device receives and parses the target topology reproduction instruction, and performs hierarchical matching in the system view tree of the slave device according to the response chain path stack sequence to lock the candidate view area; if the instruction contains the location information, the candidate coordinate point is locked in the scroll container in the candidate view area according to the relative coordinate index matrix.
[0070] Step S106: Extract the real-time functional semantics and visual features of the view corresponding to the candidate coordinate point, and compare and verify them with the functional semantic identifier and visual content fingerprint carried in the instruction; if the comparison is consistent, confirm that the positioning is successful and execute the reproduction operation corresponding to the user interface operation behavior; if the comparison is inconsistent, trigger the abnormal suspension mechanism and stop the reproduction operation.
[0071] The operation behavior reproduction method provided in this embodiment of the invention can flexibly achieve cross-terminal view positioning based on the hierarchical structure of the system view tree, thereby guiding the slave device to reproduce the operation behavior of the host device. When the device model, screen size, or system version of the slave device differs from that of the host device, the method performs hierarchical matching in real time based on the response chain path stack sequence and dynamically adjusts and locks the candidate view area, so that the reproduction adapts to device differences. The reproduction operation is then guided by the dual-dimensional comparison and verification of functional semantic identifiers and visual content fingerprints, which ensures the accuracy and security of operation behavior reproduction.
[0072] Furthermore, since the operation behavior reproduction method provided in this embodiment of the invention uses a relative coordinate index matrix to locate the target view in the scrolling view container, the set scale quadrant value can adapt to changes in screen size and scroll offset. Related functional semantic identifiers and visual content fingerprints cover multi-dimensional features such as text content, response methods, image resources, and border size. Therefore, each reproduction operation achieves the current optimal solution that is adaptive across terminals while ensuring the accuracy of view positioning. Under these circumstances, the entire operation behavior reproduction can be guaranteed to have high cross-platform adaptability and scroll state adaptability while ensuring accuracy and security, significantly improving the reproduction success rate.
[0073] It should be noted that the execution subject of the operation behavior reproduction method provided in this embodiment of the invention can be a hardware device with an operation behavior reproduction function, such as a smartphone, tablet computer, personal computer, or server. The host device can be the device operated by the tester, and the slave devices can be a batch of devices to be tested. This embodiment of the invention does not specifically limit the execution subject of the method.
[0074] User interface (UI) actions refer to the interactive operations performed by a user on a graphical user interface (GUI). Common UI actions include clicking, swiping, long-pressing, dragging, and zooming. Common target views include UI elements such as buttons, text boxes, images, list items, switches, and sliders. This embodiment of the invention does not specifically limit the type and number of UI actions or target views.
[0075] The system view tree refers to the view hierarchy maintained by the operating system. In iOS, the system view tree consists of UIViewControllers and UIViews, forming a tree-like hierarchy, where UINavigationController manages the navigation stack of the view controllers. In Android, the system view tree has a similar hierarchy composed of Activities, Fragments, and Views, where Activities manage the lifecycle of the interface, and Fragments are used to implement modularity of the interface.
[0076] The response chain path stack sequence includes the hierarchical information of the target view in the system event response chain, together with the sequence formed by tracing back along that hierarchical information, in which the navigation controller class names and view controller class names are arranged in backtracking order.
[0077] When setting the response chain path stack sequence, in addition to extracting the navigation controller class name and view controller class name, the subclass names of the navigation controller and view controller can also be further extracted based on the system architecture. That is to say, the content of the response chain path stack sequence is not specifically limited in this embodiment of the invention. Correspondingly, when setting functional semantic identifiers, in addition to obtaining the currently rendered text content of the target view and the name of the bound response method, the view's tag identifier, accessibility identifier, etc., can also be further obtained based on the application type.
[0078] In this embodiment of the invention, the structural data of the target view includes, but is not limited to: view type, view size, view position, view background color, view transparency, parent view relationship of the view, list of child views of the view, constraint layout information of the view, etc.
[0079] Device status includes, but is not limited to: device model, operating system version, screen resolution, screen pixel density, current screen orientation, device language settings, device region settings, and device accessibility settings.
[0080] The view environment includes, but is not limited to: the current page title, the page loading status, the network connection status, the system theme mode (light / dark), and the system font size settings.
[0081] In this embodiment of the invention, multiple preset events can be configured in advance. It is then detected whether at least one of the preset events has occurred; if so, it is determined that the operation behavior needs to be reproduced.
[0082] The preset events include: receiving an operation reproduction start command, a change in the device status on the host exceeding the set device status change threshold, and a change in the view environment on the host exceeding the set view environment change threshold.
[0083] When initial operation behavior reproduction is required, an operation reproduction start command can be generated in response to an operation by the relevant personnel, thereby triggering the initial operation behavior reproduction process. In addition, when the device is switched or after the reproduction of each critical operation is completed, an operation reproduction start command can also be generated in response to an operation by the relevant personnel, thereby triggering the reconfiguration process for operation behavior reproduction.
[0084] In this embodiment of the invention, reproducing the operation behavior includes performing an initial reproduction of the operation behavior and reconfiguring the operation behavior for reproduction.
[0085] In this embodiment of the invention, a change in the device status on the host that exceeds the set device status change threshold can refer to a change in screen resolution from the original standard resolution to the current high resolution, a change in screen orientation from the original portrait to the current landscape, or a change in the device language setting from the original Chinese to the current English. These changes indicate that the device status has changed significantly and may affect the accuracy of view positioning.
[0086] In this embodiment of the invention, a change in the view environment on the host that exceeds the set view environment change threshold can refer to the page title changing from the original homepage to the current details page, the network connection status changing from the original Wi-Fi connection to the current mobile data connection, or the system theme mode changing from the original light mode to the current dark mode.
[0087] In one possible implementation, when the response chain path stack sequence is used for cross-terminal view positioning based on the target view's structural data, the host's current device status data, and the current view environment data, at least one candidate view region that meets the set hierarchical matching requirements can be generated from the structural data, the current device status data, the current view environment data, and the constraints. The hierarchical matching score corresponding to each candidate view region is then determined.
[0088] For each candidate view area, if the target view is in the scroll view container, based on the structural data, current device status data, current view environment data and constraints, at least one candidate coordinate point that meets the set scroll positioning requirements is generated, and the scroll positioning score corresponding to each candidate coordinate point is determined.
[0089] For each candidate coordinate point, extract the real-time functional semantic identifier and real-time visual content fingerprint that meet the set feature comparison requirements, and determine the feature comparison score corresponding to each real-time feature.
[0090] At least one candidate reproduction operation is generated, each candidate reproduction operation including a specific candidate view region, a specific candidate coordinate point corresponding to the specific candidate view region, and a specific real-time feature corresponding to the specific candidate coordinate point.
[0091] Based on the hierarchical matching score corresponding to the specific candidate view area, the scroll positioning score corresponding to the specific candidate coordinate point, and the feature comparison score corresponding to the specific real-time feature, the operation score corresponding to each candidate reproduction operation is determined, and based on the operation score, the target preferred reproduction operation is selected from the at least one candidate reproduction operation.
[0092] In one example, the target view is a list item.
[0093] One candidate view region is the region matched by the response chain path stack sequence "MainNavigationController|HomeViewController|UITableView".
[0094] Another candidate view region is the region matched by the response chain path stack sequence "MainNavigationController|HomeViewController|UICollectionView".
[0095] In one possible implementation, the hierarchical matching score for each candidate view region is determined as follows: First, the class name matching score and hierarchical depth score for each candidate view region are determined. Then, the pre-set weights for the class name matching score and hierarchical depth score are obtained. Finally, the hierarchical matching score is calculated based on the class name matching score, hierarchical depth score, class name matching score weight, and hierarchical depth score weight.
[0096] In this embodiment of the invention, the class name matching score is obtained by comparing the class name extracted on the host with the class name actually matched on the slave. A score of 1 is given for a complete match, 0.5 for a partial match (such as a parent class match), and 0 for no match. The hierarchy depth score is obtained by calculating the difference in view hierarchy depth between the host and the slave; the smaller the difference, the higher the score.
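The scoring in [0095] and [0096] can be illustrated with the sketch below. The 1 / 0.5 / 0 class name scores come from the text; everything else is an assumption made for illustration: the `slave_ancestors` parameter used to detect a parent class match, the `1 / (1 + difference)` form of the depth score, the weighted sum, and the default weights 0.7 and 0.3.

```python
from typing import Sequence


def class_name_score(host_name: str, slave_name: str,
                     slave_ancestors: Sequence[str] = ()) -> float:
    """1 for an exact match, 0.5 for a parent-class match, 0 otherwise."""
    if host_name == slave_name:
        return 1.0
    if host_name in slave_ancestors:  # assumed meaning of "partial match"
        return 0.5
    return 0.0


def depth_score(host_depth: int, slave_depth: int) -> float:
    """Assumed form: equals 1 when depths agree and shrinks as they diverge."""
    return 1.0 / (1 + abs(host_depth - slave_depth))


def hierarchical_matching_score(name_score: float, d_score: float,
                                w_name: float = 0.7, w_depth: float = 0.3) -> float:
    """Assumed combination: weighted sum with illustrative preset weights."""
    return w_name * name_score + w_depth * d_score


name = class_name_score("HomeViewController", "HomeViewController")
depth = depth_score(host_depth=4, slave_depth=5)
print(name, depth, hierarchical_matching_score(name, depth))
```

With an exact class name match and a depth difference of one level, the combined score under these assumed weights is approximately 0.85.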
[0097] In this embodiment of the invention, the scroll positioning score is obtained by calculating the similarity between the scale quadrant value calculated by the host and the scale quadrant value calculated by the slave, while also considering the difference in scroll offset.
[0098] In one example, there are two candidate view regions: candidate view region A and candidate view region B. For candidate view region A, two candidate coordinate points are generated: candidate coordinate point A1 and candidate coordinate point A2. For candidate view region B, two candidate coordinate points are generated: candidate coordinate point B1 and candidate coordinate point B2.
[0099] In one possible implementation, the scrolling positioning score for each candidate coordinate point is determined as follows: First, the proportional quadrant similarity score and the scroll offset difference score corresponding to each candidate coordinate point are determined. Then, the pre-set weights for the proportional quadrant similarity score and the scroll offset difference score are obtained. Finally, the scrolling positioning score is calculated based on the proportional quadrant similarity score, the scroll offset difference score, and their respective weights.
[0100] The proportional quadrant similarity score and the scroll offset difference score are obtained by comparing the proportional quadrant values and the scroll offsets of the host and slave sides, respectively.
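One way to read the two sub-scores and their weighted combination is sketched below. The representation of a proportional quadrant value as an `(x, y)` pair of ratios in [0, 1], the normalization of scroll offsets to each device's own scrollable range, the similarity formulas, and the weights (0.7 and 0.3) are assumptions made for illustration; the text does not fix any of them.

```python
def quadrant_similarity(host_q: tuple[float, float],
                        slave_q: tuple[float, float]) -> float:
    """Assumed similarity: 1 minus the mean absolute difference of the two ratios."""
    dx = abs(host_q[0] - slave_q[0])
    dy = abs(host_q[1] - slave_q[1])
    return max(0.0, 1.0 - (dx + dy) / 2.0)


def offset_difference_score(host_offset: float, slave_offset: float) -> float:
    """Assumed score for offsets already normalized to [0, 1] on each device."""
    return max(0.0, 1.0 - abs(host_offset - slave_offset))


def scrolling_positioning_score(q_score: float, o_score: float,
                                w_q: float = 0.7, w_o: float = 0.3) -> float:
    """Weighted sum of the two sub-scores; the weights are hypothetical presets."""
    return w_q * q_score + w_o * o_score


if __name__ == "__main__":
    q = quadrant_similarity((0.5, 0.25), (0.5, 0.25))
    o = offset_difference_score(0.25, 0.75)
    print(scrolling_positioning_score(q, o))
```

Normalizing the offsets before comparison is a design choice that keeps the score meaningful when the host and slave have different screen sizes.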
[0101] In one example, there are four candidate coordinate points: A1, A2, B1, and B2. Two real-time features are extracted from candidate coordinate point A1: a real-time functional semantic identifier (A1) and a real-time visual content fingerprint (A1). Similarly, two real-time features are extracted from candidate coordinate point A2: a real-time functional semantic identifier (A2) and a real-time visual content fingerprint (A2). The same applies to candidate coordinate points B1 and B2.
[0102] In this scenario, four candidate reproduction operations are generated. Candidate reproduction operation 1 includes: candidate view region A, candidate coordinate point A1, real-time functional semantic identifier A1, and real-time visual content fingerprint A1. Candidate reproduction operation 2 includes: candidate view region A, candidate coordinate point A2, real-time functional semantic identifier A2, and real-time visual content fingerprint A2. Candidate reproduction operation 3 includes: candidate view region B, candidate coordinate point B1, real-time functional semantic identifier B1, and real-time visual content fingerprint B1. Candidate reproduction operation 4 includes: candidate view region B, candidate coordinate point B2, real-time functional semantic identifier B2, and real-time visual content fingerprint B2.
[0103] In one possible implementation, the feature comparison score corresponding to each real-time feature is determined as follows: First, a preset feature comparison standard is obtained, including a functional semantic similarity threshold and a visual fingerprint matching degree threshold. Then, for each real-time functional semantic identifier and real-time visual content fingerprint, the feature comparison score is determined according to the feature comparison standard. If the functional semantic similarity is greater than the functional semantic similarity threshold (e.g., 0.8) and the visual fingerprint matching degree is greater than the visual fingerprint matching degree threshold (e.g., 0.9), the feature comparison score is 1; if only one of them exceeds its threshold, the feature comparison score is 0.5; if neither exceeds its threshold, the feature comparison score is 0 and the abnormal suspension mechanism is triggered.
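The threshold rule above maps directly onto a small function. The sketch below assumes that the two similarity values have already been computed elsewhere and lie in [0, 1]; the function name and the choice to return a `(score, suspend)` pair are illustrative, while the example thresholds 0.8 and 0.9 come from the text.

```python
def feature_comparison_score(semantic_similarity: float,
                             fingerprint_match: float,
                             semantic_threshold: float = 0.8,
                             fingerprint_threshold: float = 0.9) -> tuple[float, bool]:
    """Return (score, suspend), where suspend signals the abnormal suspension mechanism."""
    # Count how many of the two features exceed their thresholds.
    passed = int(semantic_similarity > semantic_threshold) \
        + int(fingerprint_match > fingerprint_threshold)
    if passed == 2:
        return 1.0, False
    if passed == 1:
        return 0.5, False
    # Neither feature exceeds its threshold: score 0 and request suspension.
    return 0.0, True


if __name__ == "__main__":
    print(feature_comparison_score(0.85, 0.95))
    print(feature_comparison_score(0.85, 0.50))
    print(feature_comparison_score(0.10, 0.20))
```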
[0104] In one possible implementation, when determining the operation score for each candidate reproduction operation based on the hierarchical matching score corresponding to a specific candidate view area, the scrolling positioning score corresponding to a specific candidate coordinate point, and the feature comparison score corresponding to a specific real-time feature, pre-set hierarchical matching score weights, scrolling positioning score weights, and feature comparison score weights can be obtained first. Then, the operation score is calculated based on the hierarchical matching score corresponding to the specific candidate view area, the scrolling positioning score corresponding to the specific candidate coordinate point, the feature comparison score corresponding to the specific real-time feature, the hierarchical matching score weights, the scrolling positioning score weights, and the feature comparison score weights.
[0105] In one possible implementation, when selecting the target preferred reproduction operation from at least one candidate reproduction operation based on the operation score, the candidate reproduction operation with the highest operation score can be selected as the target preferred reproduction operation. If the highest operation score is lower than a set minimum threshold (e.g., 0.6), the abnormal suspension mechanism is triggered, the reproduction operation is stopped, and exception information is reported.
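The scoring and selection steps can be illustrated with the four candidate reproduction operations from the example. This is a sketch under assumptions: the `Candidate` type, the `ReproductionSuspended` exception, the weights (0.3, 0.3, 0.4), and the sample scores are hypothetical; only the minimum threshold of 0.6 is taken from the text.

```python
from dataclasses import dataclass


class ReproductionSuspended(Exception):
    """Stands in for the abnormal suspension mechanism (hypothetical name)."""


@dataclass(frozen=True)
class Candidate:
    name: str
    hierarchy: float   # hierarchical matching score of the candidate view area
    scrolling: float   # scrolling positioning score of the candidate coordinate point
    feature: float     # feature comparison score of the real-time features


def operation_score(c: Candidate, w_h: float = 0.3,
                    w_s: float = 0.3, w_f: float = 0.4) -> float:
    """Weighted sum of the three scores; the weights are hypothetical presets."""
    return w_h * c.hierarchy + w_s * c.scrolling + w_f * c.feature


def select_target(candidates: list[Candidate], min_score: float = 0.6) -> Candidate:
    """Pick the highest-scoring candidate, or suspend if it is below the minimum."""
    if not candidates:
        raise ReproductionSuspended("no candidate reproduction operation")
    best = max(candidates, key=operation_score)
    if operation_score(best) < min_score:
        raise ReproductionSuspended(f"best score below {min_score}")
    return best


if __name__ == "__main__":
    ops = [Candidate("A1", 0.9, 0.8, 1.0), Candidate("A2", 0.9, 0.3, 0.5),
           Candidate("B1", 0.5, 0.7, 0.5), Candidate("B2", 0.5, 0.4, 0.5)]
    print(select_target(ops).name)
```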
[0106] In this embodiment of the invention, based on the structural data of the target view, the current device status data of the host, and the current view environment data, the response chain path stack sequence, the relative coordinate index matrix, the functional semantic identifier, and the visual content fingerprint are used to perform cross-terminal view positioning and operation reproduction. Besides being used in sequence to locate the candidate view area, lock the candidate coordinate point, and perform feature comparison, these items can also be used selectively: feature comparison positioning can be performed directly, or only the response chain path stack sequence can be used for hierarchical matching positioning.
[0107] Corresponding to the operation behavior reproduction method provided in the embodiments of the present invention, the embodiments of the present invention also provide an operation behavior reproduction device, which is applied to the process of reproducing, on a slave device, user interface operation behavior performed on a host device. Figure 2 is a structural block diagram of an operation behavior reproduction device 200 according to an embodiment of the present invention. The device 200 may include:
[0108] The path extraction module 201 is used to intercept user interface operation behavior on the host side and extract the response chain path stack sequence of the target view that triggered the operation behavior in the system view tree.
[0109] The location generation module 202 is used to determine whether the target view is in a scrolling view container; if it is in a scrolling view container, it generates location information containing the view hierarchy branch sequence and the relative coordinate index matrix of the target view.
[0110] The feature fusion module 203 is used to extract the functional semantic identifier and visual content fingerprint of the target view.
[0111] The instruction encapsulation module 204 is used to serialize and encapsulate the response chain path stack sequence, the location information, the functional semantic identifier and the visual content fingerprint according to the preset multi-level delimiter rules, generate the target topology reproduction instruction and send it to the slave terminal, so that the slave terminal can perform parsing, lock the candidate coordinate point, perform real-time comparison and verification, and perform the reproduction operation when the comparison is consistent, and trigger the abnormal suspension mechanism when the comparison is inconsistent.
[0112] In one possible implementation, the device further includes:
[0113] The event listening module 205 is used to listen for foreground / background switching events, system pop-up events, page return events, and nested webpage loading events on the host side, and generate corresponding system-level event identifiers; the system-level event identifiers are encapsulated in the specified fields of the target topology reproduction instruction and sent to the slave side to instruct the slave side to complete the corresponding system-level state switching synchronization before performing view positioning.
[0114] In one possible implementation, the path extraction module 201 includes:
[0115] The operation interception submodule is used to intercept user interface operation behaviors through system-provided accessibility APIs or code injection methods.
[0116] The hierarchical analysis submodule is used to traverse the response chain of the target view and extract the navigation controller class name and view controller class name.
[0117] The sequence generation submodule is used to generate a response chain path stack sequence arranged in backtracking order.
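The three submodules above amount to walking the responder chain upward from the target view and collecting controller class names. The sketch below models that walk with a hypothetical `Responder` type; the `kind` labels and the class names in the usage example are assumptions, and a real host would obtain this chain from the platform's own view system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Responder:
    class_name: str
    kind: str  # assumed labels: "view", "view_controller", "navigation_controller"
    next_responder: Optional["Responder"] = None


def response_chain_path(target: Responder) -> list[str]:
    """Collect controller class names while tracing back up from the target view."""
    path: list[str] = []
    node: Optional[Responder] = target
    while node is not None:
        if node.kind in ("view_controller", "navigation_controller"):
            path.append(node.class_name)
        node = node.next_responder
    # The list is in backtracking order: lowest-level controller first.
    return path


if __name__ == "__main__":
    nav = Responder("MainNavigationController", "navigation_controller")
    vc = Responder("DetailViewController", "view_controller", nav)
    container = Responder("UIView", "view", vc)
    button = Responder("UIButton", "view", container)
    print(response_chain_path(button))
```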
[0118] In one possible implementation, the location generation module 202 includes:
[0119] The container detection submodule is used to traverse the parent view chain of the target view and identify the type of the scroll view container.
[0120] The index calculation submodule is used to calculate the sequence of node index positions of the target view within the scroll view container.
[0121] The proportion calculation submodule is used to calculate the proportional quadrant value based on the scroll offset and the visible area of the screen.
[0122] The matrix generation submodule is used to combine the node index position sequence and the proportional quadrant value to generate a relative coordinate index matrix.
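A possible shape for this computation is sketched below. The text does not define the formula for the proportional quadrant value or the layout of the matrix, so both are assumptions here: the value is taken as the target's center position relative to the visible area, expressed as ratios, and the "matrix" is represented as two rows holding the node index sequence and the ratio pair.

```python
def proportional_quadrant(center_x: float, center_y: float,
                          offset_x: float, offset_y: float,
                          visible_w: float, visible_h: float) -> tuple[float, float]:
    """Assumed formula: target center relative to the visible area, as ratios."""
    if visible_w <= 0 or visible_h <= 0:
        raise ValueError("visible area must have positive size")
    return ((center_x - offset_x) / visible_w,
            (center_y - offset_y) / visible_h)


def relative_coordinate_index_matrix(index_path: list[int],
                                     quadrant: tuple[float, float]) -> list[list[float]]:
    """Assumed layout: row 0 is the node index sequence, row 1 the ratio pair."""
    return [[float(i) for i in index_path], list(quadrant)]


if __name__ == "__main__":
    # Target centered at (160, 1200) in content coordinates, scrolled by 1000 points,
    # inside a 320 x 800 visible area.
    q = proportional_quadrant(160, 1200, 0, 1000, 320, 800)
    print(relative_coordinate_index_matrix([0, 3, 1], q))
```

Expressing the position as a ratio rather than in pixels is what lets a slave with a different screen size or resolution recover a comparable coordinate.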
[0123] In one possible implementation, the feature fusion module 203 includes:
[0124] The semantic extraction submodule is used to extract the rendered text content and response method name of the target view.
[0125] The visual extraction submodule is used to capture the rendered screen of the target view and extract image resource identifiers or network image links.
[0126] The dynamic compensation submodule is used to record border size and proportion information during scrolling to generate an enhanced visual fingerprint.
[0127] In one possible implementation, the instruction encapsulation module 204 includes:
[0128] The delimiter management submodule is used to manage the definition and configuration of the first type of delimiter and the second type of delimiter.
[0129] The serialization submodule is used to concatenate various types of information into a compact string according to delimiter rules.
[0130] The transmission submodule is used to send the target topology reproduction command to the slave device via network protocol.
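The two-level delimiter scheme handled by these submodules can be sketched as a join and split over string segments. The delimiter characters (`|` and `,`), the function names, and the sample field values are hypothetical; the text only says that a first type of delimiter separates information categories and a second type separates attribute fields within a category.

```python
FIRST_DELIMITER = "|"   # assumed: separates the four information categories
SECOND_DELIMITER = ","  # assumed: separates attribute fields within one category


def serialize(path: list[str], location: list[str],
              semantic: list[str], fingerprint: list[str]) -> str:
    """Concatenate the four categories into one compact string instruction."""
    segments = [path, location, semantic, fingerprint]
    for segment in segments:
        for field in segment:
            # This sketch has no escaping, so fields must not contain a delimiter.
            if FIRST_DELIMITER in field or SECOND_DELIMITER in field:
                raise ValueError(f"field contains a delimiter: {field!r}")
    return FIRST_DELIMITER.join(SECOND_DELIMITER.join(s) for s in segments)


def parse(instruction: str) -> list[list[str]]:
    """Inverse of serialize, as a slave might apply it."""
    parts = instruction.split(FIRST_DELIMITER)
    if len(parts) != 4:
        raise ValueError("expected four information categories")
    return [p.split(SECOND_DELIMITER) if p else [] for p in parts]


if __name__ == "__main__":
    text = serialize(["DetailViewController", "MainNavigationController"],
                     ["0", "3", "0.5", "0.25"],
                     ["Buy now", "onBuyTapped"],
                     ["img_buy_button"])
    print(text)
    print(parse(text))
```

An empty location segment (a target view outside any scroll view container) serializes to an empty string between two first-type delimiters and parses back to an empty list.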
[0131] The functions of each module in each device of the embodiments of the present invention can be found in the corresponding description in the above method, and have corresponding beneficial effects, which will not be repeated here.
[0132] Corresponding to the operation behavior reproduction method provided in the embodiments of the present invention, the embodiments of the present invention also provide a multi-control system. Figure 3 is a structural block diagram of a multi-control system 300 according to an embodiment of the present invention. As shown in Figure 3, the system 300 may include: a host 301, a server 302, and a slave 303.
[0133] Specifically, host 301 is used to execute the method provided in any embodiment of the present invention to generate a target topology reproduction instruction. Host 301 includes a user interface layer, an operation capture layer, an instruction generation layer, and a communication layer.
[0134] Server 302 is used to establish long-term communication channels with host 301 and each slave 303, and to distribute the target topology reproduction instructions generated by host 301 to each slave 303. Server 302 includes a connection management module, an instruction routing module, and a status synchronization module.
[0135] Slave unit 303 is used to receive target topology reproduction instructions and perform view positioning, real-time comparison and verification, and reproduction operations based on the target topology reproduction instructions. Slave unit 303 includes a communication layer, an instruction parsing layer, a view positioning layer, a comparison and verification layer, an operation execution layer, and an exception handling layer.
[0136] In one possible implementation, the operation capture layer of host 301 includes the aforementioned path extraction module 201, location generation module 202, and feature fusion module 203.
[0137] In one possible implementation, the instruction generation layer of host 301 includes the aforementioned instruction encapsulation module 204 and event listening module 205.
[0138] In one possible implementation, the view positioning layer of slave 303 is used to perform hierarchical matching based on the response chain path stack sequence to lock the candidate view area; if the instruction contains location information, the candidate coordinate point is locked in the scroll container within the candidate view area based on the relative coordinate index matrix.
[0139] In one possible implementation, the comparison and verification layer of slave device 303 is used to extract real-time functional semantic identifiers and real-time visual content fingerprints, and compare and verify them with the features carried in the instructions.
[0140] In one possible implementation, the operation execution layer of slave 303 is used to perform the reproduction operation when the comparison is consistent, and to trigger the abnormal suspension mechanism when the comparison is inconsistent.
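The slave-side layers described above can be strung together in a simplified sketch. It covers level-by-level class name matching, feature comparison, and execute-or-suspend, and it omits locking a candidate coordinate point inside a scroll container. The dictionary-based view tree, the exact-equality feature comparison, and all names are assumptions for illustration.

```python
from typing import Callable


class ReproductionSuspended(Exception):
    """Stands in for the abnormal suspension mechanism (hypothetical name)."""


def lock_candidate_view_area(path_stack: list[str], slave_root: dict) -> dict:
    """Match controller class names level by level, from the top of the tree down."""
    if not path_stack:
        raise ReproductionSuspended("empty response chain path")
    candidates = [slave_root]
    node = None
    # path_stack is in backtracking order, so reverse it to start at the top level.
    for class_name in reversed(path_stack):
        node = next((c for c in candidates if c["class_name"] == class_name), None)
        if node is None:
            raise ReproductionSuspended(f"no match for {class_name}")
        candidates = node["children"]
    return node


def reproduce(instruction: dict, slave_root: dict,
              extract_features: Callable[[dict], tuple],
              execute: Callable[[dict], None]) -> dict:
    """Position, verify, then execute; suspend on any mismatch."""
    area = lock_candidate_view_area(instruction["path"], slave_root)
    if extract_features(area) != instruction["features"]:
        raise ReproductionSuspended("real-time features do not match the instruction")
    execute(area)
    return area


if __name__ == "__main__":
    tree = {"class_name": "MainNavigationController", "children": [
        {"class_name": "DetailViewController", "children": [],
         "features": ("Buy now", "img_buy_button")}]}
    order = {"path": ["DetailViewController", "MainNavigationController"],
             "features": ("Buy now", "img_buy_button")}
    reproduce(order, tree, lambda a: a["features"], lambda a: print("tap", a["class_name"]))
```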
[0141] The functions of each module in each system of the embodiments of the present invention can be found in the corresponding descriptions in the above methods, and they have corresponding beneficial effects, which will not be repeated here.
[0142] Figure 4 is a block diagram of an electronic device used to implement embodiments of the present invention. As shown in Figure 4, the electronic device includes a memory 401 and a processor 402. The memory 401 stores a computer program that can run on the processor 402. When the processor 402 executes the computer program, it implements the method described in the above embodiments. The number of memories 401 and processors 402 can be one or more.
[0143] The electronic device also includes:
[0144] Communication interface 403 is used to communicate with external devices and perform data exchange and transmission.
[0145] If the memory 401, processor 402, and communication interface 403 are implemented independently, they can be interconnected via a bus to communicate with each other. This bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, etc. This bus can be divided into address bus, data bus, control bus, etc. For ease of representation, the bus is represented by a single thick line in Figure 4, but this does not mean that there is only one bus or one type of bus.
[0146] Optionally, in a specific implementation, if the memory 401, processor 402, and communication interface 403 are integrated on a single chip, then the memory 401, processor 402, and communication interface 403 can communicate with each other through an internal interface.
[0147] This invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method provided in this invention.
[0148] This invention also provides a chip, which includes a processor for calling and executing instructions stored in a memory, causing a communication device on which the chip is installed to perform the method provided in this invention.
[0149] This invention also provides a chip, including: an input interface, an output interface, a processor, and a memory. The input interface, output interface, processor, and memory are connected through an internal connection path. The processor is used to execute code in the memory. When the code is executed, the processor is used to execute the method provided in the embodiments of the present invention.
[0150] It should be understood that the aforementioned processor can be a Central Processing Unit (CPU), or other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. General-purpose processors can be microprocessors or any conventional processor. It is worth noting that the processor can be a processor supporting the Advanced RISC Machines (ARM) architecture.
[0151] Further, optionally, the aforementioned memory may include read-only memory and random access memory, and may also include non-volatile random access memory. The memory may be volatile or non-volatile, or may include both. Non-volatile memory may include read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. Volatile memory may include random access memory (RAM), which serves as an external cache. Many forms of RAM are available by way of example, but not limitation. Examples include Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced Synchronous DRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
[0152] In the above embodiments, implementation can be achieved, in whole or in part, by software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product. A computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the flow or function according to the present invention is generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another.
[0153] In the description of this specification, references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of those different embodiments or examples.
[0154] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified.
[0155] Any process or method description in the flowchart or otherwise herein can be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing a particular logical function or process. Furthermore, the scope of the preferred embodiments of the invention includes additional implementations in which functions may be performed not in the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved.
[0156] The logic and/or steps represented in the flowchart or otherwise described herein can, for example, be considered a sequenced list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus or device and execute them).
[0157] It should be understood that various parts of the present invention can be implemented using hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented using software or firmware stored in memory and executed by a suitable instruction execution system. All or part of the steps of the methods in the above embodiments can be implemented by a program instructing related hardware, the program being stored in a computer-readable storage medium, which, when executed, includes one or a combination of the steps of the method embodiments.
[0158] Furthermore, the functional units in the various embodiments of the present invention can be integrated into a processing module, or each unit can exist physically separately, or two or more units can be integrated into a module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium. This storage medium can be a read-only memory, a disk, or an optical disk, etc.
[0159] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.
Claims
1. A method for reproducing the operation behavior of a cross-terminal graphical user interface, characterized in that, Includes the following steps: Intercept user interface operation behaviors on the host side and extract the response chain path stack sequence of the target view that triggered the operation behavior in the system view tree; Determine whether the target view is within a scroll view container; If the target view is in a scroll view container, then location information containing the view hierarchy branch sequence and the relative coordinate index matrix of the target view is generated; Extract the functional semantic identifier and visual content fingerprint of the target view; The response chain path stack sequence, the location information, the functional semantic identifier and the visual content fingerprint are serialized and encapsulated according to the preset multi-level delimiter rules to generate a target topology reproduction instruction and send it to the slave device;
The slave device receives and parses the target topology reproduction instruction, and performs hierarchical matching in the system view tree of the slave device according to the response chain path stack sequence to lock the candidate view area; If the target topology reproduction instruction contains the location information, then the candidate coordinate point is locked in the scroll container within the candidate view area according to the relative coordinate index matrix; Extract the real-time functional semantic identifier and real-time visual content fingerprint of the view corresponding to the candidate coordinate point, and compare and verify them with the functional semantic identifier and visual content fingerprint carried in the target topology reproduction instruction; If the comparison matches, the location is confirmed to be successful and the reproduction operation corresponding to the user interface operation behavior is executed; If the comparison is inconsistent, the abnormal suspension mechanism is triggered and the reproduction operation is terminated.
2. The method according to claim 1, characterized in that, The extraction of the response chain path stack sequence of the target view that triggered the operation in the system view tree includes: Obtain the hierarchical information of the target view in the system event response chain; Tracing back up along the hierarchy information, the navigation controller class name and view controller class name to which the current view belongs are extracted sequentially; The response chain path stack sequence is generated by arranging the paths in the backtracking order.
3. The method according to claim 1, characterized in that, The generation of location information, including the view hierarchy branch sequence and relative coordinate index matrix of the target view, includes: When it is determined that the target view is in a scrolling view container, the node index position sequence of the target view in the multi-level child nodes of the scrolling view container is obtained as the view hierarchy branch sequence; Obtain the current scroll offset of the scroll view container and calculate the proportional quadrant value of the target view relative to the current visible screen area; The relative coordinate index matrix is generated by combining the node index position sequence with the proportional quadrant value.
4. The method according to claim 1, characterized in that, The extraction of the functional semantic identifier and visual content fingerprint of the target view includes: Obtain at least one of the currently rendered text content of the target view and the name of the bound response method as the functional semantic identifier; Capture the current rendered screen of the target view and extract the local image resource identifier or network image link as the visual content fingerprint; If the target view is in a scrolling state, the border size information and the ratio information relative to the screen pixels of the target view are recorded synchronously, and the ratio information is incorporated into the visual content fingerprint as a supplementary dimension.
5. The method according to claim 1, characterized in that, The serialization and encapsulation according to the preset multi-level delimiter rules includes: Different information categories are divided into modules using the first type of delimiter; The second type of delimiter is used to divide different attribute fields under the same major information category module; The response chain path stack sequence, the location information, the functional semantic identifier, and the visual content fingerprint are sequentially treated as independent string segments, and concatenated into a single compact string instruction using the first type of delimiter and the second type of delimiter.
6. The method according to claim 1, characterized in that, The method further includes: Listen for foreground / background switching events, system pop-up events, page return events, and nested webpage loading events on the host side, and generate corresponding system-level event identifiers; The system-level event identifier is encapsulated into a specified field of the target topology reproduction instruction; The encapsulated target topology reproduction command is sent to the slave device to instruct the slave device to complete the corresponding system-level state switching synchronization before performing view positioning.
7. The method according to any one of claims 1 to 6, characterized in that, The step of performing hierarchical matching in the slave-side system view tree based on the response chain path stack sequence to lock the candidate view area includes: According to the hierarchical order of the response chain path stack sequence, the navigation controller class name and view controller class name are matched level by level in the system view tree on the slave side; If all class names match successfully, the view area managed by the lowest-level view controller is determined as the candidate view area; If the class name at any level fails to match, the abnormal suspension mechanism is triggered directly.
8. The method according to any one of claims 1 to 6, characterized in that, The extraction of the real-time functional semantic identifier and real-time visual content fingerprint of the view corresponding to the candidate coordinate point includes: Obtain at least one of the real-time text content currently rendered in the view corresponding to the candidate coordinate point and the name of the bound real-time response method, as the real-time functional semantic identifier; Capture the current real-time rendered screen of the view corresponding to the candidate coordinate point, and extract the real-time local image resource identifier or real-time network image link as the real-time visual content fingerprint.
9. A device for reproducing the operation behavior of a cross-terminal graphical user interface, applied to a host computer, characterized in that, include: The path extraction module is used to intercept user interface operation behaviors on the host side and extract the response chain path stack sequence of the target view that triggered the operation behavior in the system view tree; The location generation module is used to determine whether the target view is within a scroll view container; If it is in a scroll view container, location information containing the view hierarchy branch sequence and the relative coordinate index matrix of the target view is generated; The feature fusion module is used to extract the functional semantic identifier and visual content fingerprint of the target view; The instruction encapsulation module is used to serialize and encapsulate the response chain path stack sequence, the location information, the functional semantic identifier and the visual content fingerprint according to the preset multi-level delimiter rules, generate the target topology reproduction instruction and send it to the slave device;
The slave device is used to receive and parse the target topology reproduction instruction, and perform hierarchical matching in the system view tree of the slave device according to the response chain path stack sequence to lock the candidate view area; if the target topology reproduction instruction contains the location information, then the candidate coordinate point is locked in the scroll container in the candidate view area according to the relative coordinate index matrix; the real-time functional semantic identifier and real-time visual content fingerprint of the view corresponding to the candidate coordinate point are extracted and compared with the functional semantic identifier and visual content fingerprint carried in the target topology reproduction instruction; if the comparison is consistent, the positioning is confirmed to be successful and the reproduction operation corresponding to the user interface operation behavior is executed; If the comparison is inconsistent, the abnormal suspension mechanism is triggered and the reproduction operation is terminated.
10. A multi-control system, characterized in that, Includes a host, a server, and at least one slave device: The host is configured to execute the method as described in any one of claims 1 to 6 to generate the target topology reproduction instruction; The server is used to establish a long-term communication channel with the host and each of the slave devices, and to distribute the target topology reproduction command generated by the host to each of the slave devices. The slave device is used to receive the target topology reproduction instruction, and perform hierarchical matching in the system view tree on the slave device side according to the response chain path stack sequence to lock the candidate view area; if the target topology reproduction instruction contains the location information, then lock the candidate coordinate point in the scroll container in the candidate view area according to the relative coordinate index matrix; extract the real-time functional semantic identifier and real-time visual content fingerprint of the view corresponding to the candidate coordinate point, and compare and verify them with the functional semantic identifier and visual content fingerprint carried in the target topology reproduction instruction; if the comparison is consistent, then the positioning is confirmed to be successful and the reproduction operation corresponding to the user interface operation behavior is executed; If the comparison is inconsistent, the abnormal suspension mechanism is triggered and the reproduction operation is terminated.