A mobile terminal automatic testing method and device

By generating and running automated test scripts using a large language model, combined with pre-built knowledge base constraints and automatically generated patch scripts, the problem of non-standard generation of automated test scripts for mobile devices is solved, achieving efficient and stable test iteration and maintenance.

CN122309384A · Pending · Publication Date: 2026-06-30 · BEIJING SOHU NEW MEDIA INFORMATION TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: BEIJING SOHU NEW MEDIA INFORMATION TECH
Filing Date: 2026-06-03
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

Existing methods for generating automated test scripts for mobile devices suffer from high labor costs, non-standardized script generation, high execution error rates, and long repair cycles, making it difficult to meet the needs for efficient, stable, and automated test iterations.

Method used

Automated test scripts are generated and run using a large language model. Combined with pre-built knowledge base constraints, the scripts are ensured to conform to the project's predefined page encapsulation methods and underlying operation interfaces. Patch scripts are automatically generated when tests fail, and the process continues until the tests pass, at which point manual review and updates are performed.

Benefits of technology

It improves the standardization and maintainability of script generation, shortens the development and repair cycle, adapts to the rapid updates and iterations of mobile applications, enhances the stability and executability of tests, and reduces maintenance costs.



Abstract

This application discloses a mobile terminal automated testing method and apparatus, relating to the field of automated testing. It retrieves test knowledge related to test requirements from a pre-built knowledge base. Based on the test requirements and test knowledge, it generates automated test scripts executable at the mobile terminal automation driver layer using a large language model. The automated test scripts are then run to execute the tests. If a test fails, a patch script is generated based on the failure evidence package collected during script execution. This patch script is then used as a temporary script to re-execute the test until the test termination condition is met. If the test passes when the termination condition is met, the passed patch script is compared with the automated test script, and the comparison result is submitted for manual review. If the review is successful, the passed patch script is updated in the knowledge base. This application achieves intelligent script generation and automatic patch repair while ensuring script standardization and executability.

Description

Technical Field

[0001] This application relates to the field of automated testing technology, and in particular to a mobile terminal automated testing method and apparatus.

Background Technology

[0002] With the rapid development of the mobile internet, the types of mobile applications are increasing, and their business logic and interactive functions are becoming more complex, which places higher demands on the efficiency, stability, and scalability of mobile automated testing.

[0003] Currently, the generation of automated test scripts for mobile devices mainly falls into two categories: manual writing and automatic generation based on large language models. Manually writing test scripts requires significant manpower and time investment, resulting in long development cycles, poor reusability, and difficulty in adapting to rapidly iterating project requirements. On the other hand, generating test scripts directly from large language models based on testing requirements is limited by the model's own generation capabilities and project adaptability. The resulting automated test scripts often suffer from execution anomalies and high failure rates. Furthermore, the generated scripts typically do not adhere to the project's established page encapsulation rules and underlying interface standards, leading to poor universality and maintainability, making large-scale deployment in real-world projects difficult.

[0004] Furthermore, when test scripts fail to execute, existing solutions typically rely on manual troubleshooting and writing patch scripts, resulting in high debugging costs and long repair cycles. This further reduces the overall efficiency of automated testing and fails to meet the requirements for efficient, stable, and automated test iterations.

[0005] Therefore, how to achieve intelligent generation of test scripts and automatic patching in failure scenarios while ensuring the standardization and executability of test scripts has become an urgent technical problem to be solved in the field of mobile automated testing.

Summary of the Invention

[0006] In view of the above problems, this application provides a mobile automated testing method and apparatus to achieve the goal of automatically generating and repairing mobile automated test scripts using a large language model, while ensuring the standardization and executability of test scripts. The specific solution is as follows:

[0007] The first aspect of this application provides a mobile automated testing method, including:

[0008] Obtain test requirement information, retrieve test knowledge related to the test requirement information from a pre-built knowledge base, and the test knowledge is used at least to constrain the large language model to generate scripts using a project-predefined unified page encapsulation method and standardized underlying operation interface;

[0009] Based on the test requirements information and the test knowledge, an automated test script that can be executed in the mobile terminal automation driver layer is generated through the large language model, and the automated test script is run to perform the test;

[0010] If the test fails, a patch script corresponding to the automated test script is generated based on the failure evidence package containing the failure step trajectory collected during the script execution. The patch script is then used as a temporary script to re-execute the test until the test termination condition is met.

[0011] If the test passes when the test termination condition is met, the patch script that passes the test is compared with the automated test script, and the comparison result is submitted for manual review.

[0012] If the review is approved, the patch script that passed the test will be updated to the knowledge base.
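As a non-limiting illustration, the run, patch, and compare flow of the first aspect can be sketched as follows. The callables `run_test` and `generate_patch` and the attempt limit `max_attempts` are hypothetical placeholders (the application does not fix the termination condition or the model interface), and a unified diff is only one possible form of the comparison result.

```python
import difflib
from typing import Callable, Optional, Tuple


def run_with_patching(
    script: str,
    run_test: Callable[[str], Tuple[bool, dict]],
    generate_patch: Callable[[str, dict], str],
    max_attempts: int = 3,  # assumed termination condition: a fixed attempt budget
) -> Tuple[bool, Optional[str]]:
    """Run `script`; on failure, retry with temporary patch scripts.

    Returns (passed, diff). `diff` is a unified diff of the original script
    against the passing patch script, intended for manual review. It is None
    when the original script passed unchanged or when no attempt passed.
    """
    current = script
    for attempt in range(max_attempts):
        passed, evidence = run_test(current)
        if passed:
            if current == script:
                return True, None
            diff = "".join(difflib.unified_diff(
                script.splitlines(keepends=True),
                current.splitlines(keepends=True),
                fromfile="original", tofile="patch"))
            return True, diff
        # Only build another temporary script if there is budget left to run it.
        if attempt + 1 < max_attempts:
            current = generate_patch(current, evidence)
    return False, None
```

In this sketch the original script is never overwritten; the patch exists only as a temporary script until a reviewer approves the diff, after which it could be written back to the knowledge base.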

[0013] In one possible implementation, running the automated test script to execute the test includes:

[0014] The automated test script is run, and a screenshot of the interface during the script's execution is taken. In the screenshot, the target element to be located is located using multimodal recognition to obtain the target pixel coordinates. Based on a preset scaling factor, device resolution, and device pixel density, the target pixel coordinates are converted into the screen coordinates of the current test device. Under preset waiting conditions, the target action corresponding to the target element is executed according to the screen coordinates.
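The coordinate conversion described above is not given as a formula in this application. The following is a minimal sketch under stated assumptions: `scale` is taken to mean screenshot pixels per physical device pixel, `density` is taken to mean physical pixels per logical unit expected by the driver, and the result is clamped to the device resolution. All three interpretations, and the function name, are hypothetical.

```python
from typing import Tuple


def to_screen_coords(
    px: float,
    py: float,
    scale: float,                 # assumed: screenshot pixels per device pixel
    resolution: Tuple[int, int],  # device (width, height) in physical pixels
    density: float = 1.0,         # assumed: physical pixels per logical unit
) -> Tuple[int, int]:
    """Map a point found in a (possibly resized) screenshot to device coordinates."""
    if scale <= 0 or density <= 0:
        raise ValueError("scale and density must be positive")
    width, height = resolution
    # Undo the screenshot scaling, then keep the point inside the screen.
    x = min(max(px / scale, 0), width - 1)
    y = min(max(py / scale, 0), height - 1)
    # Convert physical pixels to the logical units the driver is assumed to use.
    return round(x / density), round(y / density)
```

For example, a point at (270, 480) in a half-size screenshot of a 1080 x 1920 device maps to (540, 960) in physical pixels, or (180, 320) when the driver works in logical units at a density of 3.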

[0015] In one possible implementation, the step of performing multimodal recognition and localization on the target element to be located in the screenshot interface to obtain the target pixel coordinates includes:

[0016] Obtain the semantic tag of the target element, and determine the positioning strategy corresponding to the target element based on the semantic tag. The positioning strategy includes at least one of the target detection algorithm and the optical character recognition algorithm.

[0017] The target element is identified and located in the screenshot interface according to the positioning strategy described above, and the identification and positioning results are obtained.

[0018] If the recognition and positioning result includes multiple positioning boxes and the position coordinates corresponding to the multiple positioning boxes, then the target algorithm is used to perform deduplication and box merging processing on the multiple positioning boxes to obtain at least one remaining positioning box. If the positioning box is a detection box obtained by the target detection algorithm, then the target algorithm is a non-maximum suppression algorithm. If the positioning box is a text box obtained by the optical character recognition algorithm, then the target algorithm is a preset box merging rule.

[0019] Target positioning boxes are selected from the at least one positioning box, and the target pixel coordinates are obtained based on the position coordinates corresponding to the target positioning boxes.
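For detection boxes, the deduplication step can be illustrated with a plain greedy non-maximum suppression. This is a generic sketch, not the application's specific implementation: the box layout `(x1, y1, x2, y2, confidence)`, the IoU threshold of 0.5, and the use of the box center as the target pixel coordinates are assumptions. The preset merging rule for OCR text boxes is not specified in the application and is not sketched here.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, confidence


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_threshold: float = 0.5) -> List[Box]:
    """Greedy NMS: keep the most confident box, drop boxes that overlap it too much."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept


def center(box: Box) -> Tuple[float, float]:
    """Assumed choice of target pixel coordinates: the center of the box."""
    return (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
```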

[0020] In one possible implementation, the step of identifying and locating the target element in the screenshot interface according to the positioning strategy to obtain the identification and positioning result includes:

[0021] Obtain positioning assistance information, which is used to assist in locating the target element;

[0022] Based on the positioning assistance information, determine the region of interest containing the target element from the screenshot interface;

[0023] The target element is identified and located within the region of interest to obtain the identification and location result.

[0024] One possible implementation also includes:

[0025] If the preset self-healing rollback conditions are met, target remediation measures are adopted to re-identify and locate the target element. The target remediation measures include at least one of the following measures: changing the positioning strategy, enabling the control tree positioning strategy, and enabling the anchor point positioning strategy. The anchor point positioning strategy refers to first locating a stable anchor point element that has a known positional relationship with the target element, and then locating the target element based on the stable anchor point element. The self-healing rollback conditions include: the identification and positioning result does not include the positioning box, the confidence of the target positioning box is lower than the confidence threshold, and the target pixel coordinates cause no action result to be generated after the target action is executed.
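One way to picture the self-healing rollback is as an ordered chain of positioning strategies that is walked until one produces an acceptable result. In the sketch below, the `LocateResult` fields, the confidence threshold of 0.6, and the treatment of the three rollback conditions as alternatives (any one of them triggers a fallback) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class LocateResult:
    box: Optional[Tuple[float, float, float, float]]  # None when nothing was found
    confidence: float = 0.0
    action_had_effect: bool = True  # False when the action produced no result


def needs_fallback(result: LocateResult, min_confidence: float = 0.6) -> bool:
    """True if any of the three rollback conditions holds (assumed to be alternatives)."""
    return (
        result.box is None
        or result.confidence < min_confidence
        or not result.action_had_effect
    )


def locate_with_fallback(
    strategies: List[Callable[[], LocateResult]],
    min_confidence: float = 0.6,
) -> Optional[LocateResult]:
    """Try each strategy in order, e.g. detection, OCR, control tree, anchor point."""
    for strategy in strategies:
        result = strategy()
        if not needs_fallback(result, min_confidence):
            return result
    return None
```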

[0026] In one possible implementation, filtering the target positioning box from the at least one positioning box includes:

[0027] Each element defined by the at least one positioning box is taken as a candidate element, and a scoring feature vector corresponding to the candidate element is constructed. The scoring feature vector includes at least one of the following dimensions: text similarity, category consistency, relative distance error, size ratio error and visual similarity between the candidate element and the target element.

[0028] Based on the scoring feature vector corresponding to the candidate element, a comprehensive score is determined for the candidate element, and the comprehensive score represents the comprehensive similarity between the candidate element and the target element.

[0029] From at least one candidate element, select the candidate element with the highest comprehensive score. If the comprehensive score of the selected candidate element is greater than a preset score threshold, then the positioning box corresponding to the selected candidate element is taken as the target positioning box.
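The application does not state how the comprehensive score is computed from the scoring feature vector. A weighted average is one simple possibility and is sketched below. The weights, the score threshold of 0.7, the assumption that every feature is normalized to [0, 1], and the inversion of the two error features (so that a smaller error gives a higher score) are all illustrative choices.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical weights; a real project would tune these.
WEIGHTS = {
    "text_similarity": 0.4,
    "category_consistency": 0.2,
    "relative_distance_error": 0.15,
    "size_ratio_error": 0.1,
    "visual_similarity": 0.15,
}
ERROR_FEATURES = {"relative_distance_error", "size_ratio_error"}


def composite_score(features: Dict[str, float]) -> float:
    """Weighted average over the features that are present, each clamped to [0, 1]."""
    total = weight_sum = 0.0
    for name, value in features.items():
        weight = WEIGHTS[name]
        value = min(max(value, 0.0), 1.0)
        if name in ERROR_FEATURES:
            value = 1.0 - value  # a smaller error means a better match
        total += weight * value
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def select_target(
    candidates: List[Tuple[str, Dict[str, float]]],
    threshold: float = 0.7,
) -> Optional[str]:
    """Return the id of the best candidate, or None if it does not exceed the threshold."""
    if not candidates:
        return None
    best_id, best_features = max(candidates, key=lambda c: composite_score(c[1]))
    return best_id if composite_score(best_features) > threshold else None
```

Returning None when the best score does not exceed the threshold leaves room for the caller to fall back to another positioning strategy instead of acting on a weak match.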

[0030] One possible implementation also includes:

[0031] The automated test script and the patch script are respectively used as target scripts;

[0032] Before the target script is run, at least one of the following checks is performed on the target script: syntax check, static check, dependency check, and dry run check. The syntax check is used to verify whether the script conforms to the syntax rules of the programming language. The static check is used to analyze the abstract syntax tree and code structure without running the script to check for pre-existing potential problems in the script. The dependency check is used to check whether the external resources required for the script to run are complete and whether the versions are compatible. The dry run check is used to simulate the running of the script without actually performing any operations to verify the logical correctness of the script and the feasibility of the execution path.
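Assuming, purely for illustration, that the target script is written in Python, three of the four checks can be approximated with the standard library. The list of forbidden calls is a hypothetical static rule, the dependency check only verifies that imported top-level modules can be found (it does not compare versions), and the dry run check is omitted because it depends on the project's driver layer.

```python
import ast
import importlib.util
from typing import List, Sequence


def syntax_check(source: str) -> List[str]:
    """Report a syntax error, if any, without executing the script."""
    try:
        ast.parse(source)
        return []
    except SyntaxError as exc:
        return [f"syntax error at line {exc.lineno}: {exc.msg}"]


def static_check(source: str, forbidden_calls: Sequence[str] = ("eval", "exec")) -> List[str]:
    """Walk the abstract syntax tree and flag calls to forbidden built-ins."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in forbidden_calls):
            problems.append(f"line {node.lineno}: call to {node.func.id}()")
    return problems


def dependency_check(source: str) -> List[str]:
    """Return top-level modules imported by the script that cannot be found."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if importlib.util.find_spec(top) is None:
                missing.append(top)
    return missing


def pre_run_checks(source: str) -> List[str]:
    """Run the checks in order; stop early if the script does not even parse."""
    problems = syntax_check(source)
    if problems:
        return problems
    return static_check(source) + [f"missing dependency: {m}" for m in dependency_check(source)]
```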

[0033] In one possible implementation, generating the automated test script executable in the mobile automation driver layer based on the test requirement information and the test knowledge using the large language model includes:

[0034] Obtain a pre-built script generation prompt instruction template, which includes at least a requirement slot and an external knowledge slot. The script generation prompt instruction template is used to instruct the large language model to generate an automated test script that conforms to the requirement content in the requirement slot, based on the knowledge content in the external knowledge slot.

[0035] The test requirement information is filled into the requirement slot, and the test knowledge is filled into the external knowledge slot to obtain the script generation prompt instruction;

[0036] The script generation prompt instruction is input into the large language model to obtain the automated test script.
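A slot-based prompt template of this kind can be sketched with Python's `string.Template`. The template wording, the slot names `requirement` and `knowledge`, and the helper `build_prompt` are hypothetical; the call to the large language model itself is left out because the application does not name a model or interface.

```python
from string import Template
from typing import List

# Hypothetical template with one requirement slot and one external knowledge slot.
SCRIPT_PROMPT = Template(
    "You write mobile UI test scripts.\n"
    "Use only the interfaces and page objects described under KNOWLEDGE.\n\n"
    "REQUIREMENT:\n$requirement\n\n"
    "KNOWLEDGE:\n$knowledge\n"
)


def build_prompt(requirement: str, knowledge_items: List[str]) -> str:
    """Fill both slots and return the script generation prompt instruction."""
    knowledge = "\n".join(f"- {item}" for item in knowledge_items)
    return SCRIPT_PROMPT.substitute(requirement=requirement, knowledge=knowledge)
```

`substitute` raises `KeyError` if a slot is left unfilled, which helps catch an incomplete prompt before it is sent to the model.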

[0037] One possible implementation also includes:

[0038] Based on the failure evidence package, attribution analysis is performed to obtain the failure type, and the failure count corresponding to the failure type is updated; so as to obtain the failure count corresponding to each failure type on the current test device.

[0039] If the number of failures corresponding to the target failure type reaches a preset failure threshold, the current test device is replaced and / or a degradation process is triggered. The degradation process includes: downgrading from full-map positioning to region of interest positioning during the multimodal recognition and positioning process, and / or directly replacing the multimodal recognition and positioning process with an anchor point positioning strategy.
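The per-device failure counting and threshold trigger can be sketched with a counter keyed by device and failure type. The class name, the threshold of 3, and the three-step degradation order are assumptions; the application allows the region-of-interest downgrade and the anchor point replacement to be used separately or together.

```python
from collections import Counter

# Assumed degradation ladder: full-screen positioning, then ROI, then anchor point.
DEGRADATION_ORDER = ["full_screen", "region_of_interest", "anchor"]


class FailureTracker:
    """Counts failures per (device, failure type) and reports when a threshold is hit."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.counts: Counter = Counter()

    def record(self, device_id: str, failure_type: str) -> bool:
        """Record one failure; return True if the threshold is reached for this pair."""
        key = (device_id, failure_type)
        self.counts[key] += 1
        return self.counts[key] >= self.threshold


def degrade(mode: str) -> str:
    """Return the next, more conservative positioning mode (stays at the last one)."""
    index = DEGRADATION_ORDER.index(mode)
    return DEGRADATION_ORDER[min(index + 1, len(DEGRADATION_ORDER) - 1)]
```

When `record` returns True, the caller could switch to another test device, call `degrade` on the current positioning mode, or do both.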

[0040] A second aspect of this application provides a mobile terminal automated testing device, comprising:

[0041] The data acquisition unit is used to acquire test requirement information and retrieve test knowledge related to the test requirement information from a pre-built knowledge base. The test knowledge is at least used to constrain the large language model to generate scripts using a project-predefined unified page encapsulation method and standardized underlying operation interface.

[0042] The script testing unit is used to generate automated test scripts that can be executed in the mobile terminal automation driver layer based on the test requirement information and the test knowledge through the large language model, and to run the automated test scripts to perform tests.

[0043] The script patching unit is used to generate a patch script corresponding to the automated test script based on the failure evidence package containing the failure step trajectory collected during the script execution if the test fails, and to re-execute the test as a temporary script until the test end condition is met.

[0044] The script comparison unit is used to compare the patch script that passed the test with the automated test script if the test passes when the test end condition is met, and submit the comparison result for manual review.

[0045] The knowledge base update unit is used to update the patch script that passed the test into the knowledge base if the review is approved.

[0046] A fourth aspect of this application provides an electronic device, including at least one processor and a memory connected to the processor, wherein:

[0047] The memory is used to store computer programs;

[0048] The processor is used to execute the computer program so that the electronic device can implement the mobile terminal automated testing method of the first aspect or any implementation thereof.

[0049] The fifth aspect of this application provides a computer storage medium carrying one or more computer programs, which, when executed by an electronic device, enable the electronic device to implement the mobile terminal automated testing method described in the first aspect or any implementation thereof.

[0050] By employing the above technical solution, the mobile automated testing method provided in this application obtains test requirement information, retrieves test knowledge related to the test requirement information from a pre-built knowledge base, and uses the test knowledge to constrain the large language model to generate scripts using a project-predefined unified page encapsulation method and standardized underlying operation interface. Based on the test requirement information and test knowledge, the large language model generates automated test scripts that can be executed in the mobile terminal automation driver layer. Therefore, this application introduces relevant constraints of test knowledge into the generation process of automated test scripts, guiding the large language model to use a project-predefined unified page encapsulation method and standardized underlying operation interface to generate automated test scripts. This ensures that the scripts conform to the project architecture specifications and can be stably executed directly in the mobile terminal automation driver layer, significantly improving the standardization, universality, and maintainability of script generation.

[0051] Furthermore, the automated test script is run to execute tests. If the test fails, a patch script corresponding to the automated test script is generated based on the failure evidence package collected during the script's execution, which includes the failure step trajectory. This patch script is then used as a temporary script to re-execute the test until the test termination condition is met. If the test passes when the termination condition is met, the patch script that passed the test is compared with the automated test script, and the comparison result is submitted for manual review. This application can automatically generate a patch script based on the failure evidence package collected during the script's execution when the automated test script fails, eliminating the need for manual intervention in script generation and patch debugging. This effectively shortens the development and repair cycle of the automated test script and is more suitable for the project requirements of rapid updates and iterations in mobile applications. At the same time, to ensure the safety of engineering testing and avoid script repair risks caused by hallucinations of the large language model, the patch script is used as a temporary script for iterative testing until the test termination condition is met, ensuring that the patch script is executable. Manual review of the comparison result further verifies the rationality and adaptability of the patch script, preventing incorrect patches from overwriting the original script and making the automated test script increasingly disorganized, thus improving the stability of automated testing.

[0052] Finally, if the review is approved, the patch script that passed the test will be updated to the knowledge base. This allows the effective experience gained from each script repair to be transformed into reusable test knowledge. When generating or repairing scripts in the future, relevant knowledge can be retrieved directly from the knowledge base, reducing the recurrence of similar test failures, lowering test maintenance costs, and further improving the executability of automated test scripts.

Attached Figure Description

[0053] The above and other features, advantages, and aspects of the embodiments of this disclosure will become more apparent from the accompanying drawings and the following detailed description. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic, and the components and elements are not necessarily drawn to scale.

[0054] Figure 1 is a schematic diagram of a system architecture provided in this application;

[0055] Figure 2 is a flowchart of a mobile terminal automated testing method provided in this application;

[0056] Figure 3 is a schematic diagram of the structure of a mobile terminal automated testing device provided in this application;

[0057] Figure 4 is a schematic diagram of the structure of an electronic device provided in this application.

Detailed Implementation

[0058] The embodiments of this application are described below with reference to the accompanying drawings. The terminology used in the implementation section of this application is for explaining specific embodiments only and is not intended to limit the scope of this application.

[0059] The embodiments of this application will now be described with reference to the accompanying drawings. Those skilled in the art will recognize that, with technological advancements and the emergence of new scenarios, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.

[0060] The terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such terms are interchangeable where appropriate; this is merely a way of distinguishing objects with the same attributes in the embodiments of this application. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion, so that a process, method, system, product, or apparatus that comprises a series of elements is not necessarily limited to those elements, but may include other elements not explicitly listed or inherent to those processes, methods, products, or apparatuses.

[0061] Optionally, the mobile automated testing method provided in this application can be applied to, for example, the system architecture shown in Figure 1, which includes a terminal 100 and a server 200. The server 200 may include one or more servers (Figure 1 uses one server as an illustration).

[0062] Either terminal 100 or server 200 can be used independently to execute the mobile automated testing method provided in the embodiments of this application. Alternatively, terminal 100 and server 200 can also be used collaboratively to execute the mobile automated testing method provided in the embodiments of this application.

[0063] The following describes the product form of the terminal 100 in Figure 1:

[0064] The terminal 100 in this application embodiment can be a mobile phone, tablet computer, wearable device, vehicle device, augmented reality (AR) / virtual reality (VR) device, laptop computer, ultra-mobile personal computer (UMPC), netbook, personal digital assistant (PDA), etc., and this application embodiment does not impose any restrictions on it.

[0065] To enable those skilled in the art to better understand this application, the mobile terminal automated testing method of this application embodiments will be described in detail below with reference to the accompanying drawings.

[0066] Referring to Figure 2, Figure 2 is a flowchart of a mobile terminal automated testing method provided in an embodiment of this application. As shown in Figure 2, this mobile automated testing method may include:

[0067] Step S201: Obtain test requirement information and retrieve test knowledge related to the test requirement information from the pre-built knowledge base.

[0068] Here, test requirement information refers to requirement information related to the functional testing of the application under test. In this embodiment, test requirement information may include scenario step description information and assertion requirements.

[0069] The scenario step description information describes, in natural language or a semi-structured language, the series of interactive actions to be performed on the application under test, following the business sequence of a user's actual operations. The assertion requirements describe the expected results of performing this series of interactive actions and can be used to determine whether the script has passed the test.

[0070] Large language models are prone to hallucination during generation. To limit its influence, this embodiment can retrieve test knowledge related to the test requirement information from a pre-built knowledge base. Here, the test knowledge is at least used to constrain the large language model to use the project's predefined unified page encapsulation method and standardized underlying operation interface to generate scripts.

[0071] In other words, test knowledge should at least include action interface descriptions and page object invocation rules.

[0072] The action interface description describes the name, call prototype, parameter constraints, and usage examples of the standardized underlying operation interface that needs to be called to execute the action. Based on this, the large language model can be constrained to generate scripts according to the standardized underlying operation interface.

[0073] The purpose of page object invocation rules is to standardize how scripts reference and locate page elements. It encapsulates specific page elements (such as login buttons and search input boxes) and business processes (such as login and sharing) into easily maintainable object paths. In subsequent script generation, the large language model can generate standard code that conforms to the project architecture based on the constraints generated by these page object invocation rules. For example, for "login," the path rule (e.g., LoginPage.gotlogin) is retrieved from the knowledge base's Page Object (PO) library. During script generation, the login code is generated based on this path rule, instead of creating a custom method like page.click_login or redundantly generating a series of basic method implementations for login.
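The page object idea can be illustrated with a small sketch. Only the path `LoginPage.gotlogin` comes from the example above; its parameters, the element identifiers, and the `FakeDriver` class (a stand-in for the project's standardized underlying operation interface) are hypothetical.

```python
from typing import List, Tuple


class FakeDriver:
    """Hypothetical stand-in for the standardized underlying operation interface."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, ...]] = []

    def tap(self, element_id: str) -> None:
        self.log.append(("tap", element_id))

    def type_text(self, element_id: str, text: str) -> None:
        self.log.append(("type", element_id, text))


class LoginPage:
    """Page object: the login flow is encapsulated once and reused by every script."""

    def __init__(self, driver: FakeDriver) -> None:
        self.driver = driver

    def gotlogin(self, user: str, password: str) -> None:
        # Parameters and element ids are assumed for illustration.
        self.driver.type_text("username_input", user)
        self.driver.type_text("password_input", password)
        self.driver.tap("login_button")


# A generated script calls the page object path instead of re-implementing the steps.
driver = FakeDriver()
LoginPage(driver).gotlogin("alice", "secret")
```

Because the script refers only to `LoginPage.gotlogin`, a change to the login screen is absorbed in one place rather than in every generated script.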

[0074] Optionally, the test knowledge can also include assertion specifications, historical test cases, historical page screenshots, etc., to help improve the accuracy of the subsequently generated automated test scripts.

[0075] Step S202: Based on the test requirements and test knowledge, generate automated test scripts that can be executed in the mobile terminal automation driver layer through the large language model, and run the automated test scripts to perform tests.

[0076] Here, the Unified Driver Adaptation Layer (UDAL) is a low-level driver service used to interact with mobile device systems and implement automated operations such as clicking, swiping, text input, and element retrieval. It is used to convert operation instructions in automated test scripts into actual operations that can be executed by the current test device.

[0077] In this embodiment, by introducing action interface descriptions and page object call rules into the test knowledge to constrain the process of generating scripts from the large language model, the automated test scripts generated by the large language model can be executed in the mobile automation driver layer. Since one of the core functions of the mobile automation driver layer is to smooth out the underlying differences between different devices, allowing the same script to run normally on multiple devices, this application generates automated test scripts that can be executed in the mobile automation driver layer. This gives the automated test scripts cross-device adaptability, enabling them to adapt to various different test devices, such as Android and HarmonyOS devices.

[0078] It should be understood that automated test scripts are generated based on test requirements information and test knowledge. Test requirements information includes scenario step descriptions as described above. Therefore, automated test scripts contain specific instructions that can drive a series of interactive actions. When the automated test script is run, it is equivalent to generating the series of interactive actions according to the specific instructions in the script, such as the action of "clicking the login button". The actual action result generated by the interactive action can determine whether the test passes or fails.

[0079] In one optional embodiment, the process of "generating automated test scripts that can be executed in the mobile automation driver layer by means of a large language model based on test requirement information and test knowledge" may include: obtaining a pre-built script generation prompt instruction template, which includes at least requirement slots and external knowledge slots. The script generation prompt instruction template is used to instruct the large language model to generate automated test scripts that conform to the requirement content in the requirement slots, using the knowledge content in the external knowledge slots as constraints; filling the requirement slots with test requirement information; filling the external knowledge slots with test knowledge; obtaining script generation prompt instructions; and inputting the script generation prompt instructions into the large language model to obtain the automated test script.

[0080] As mentioned earlier, when test knowledge includes action interface descriptions, page object invocation rules, assertion specifications, historical test cases, and historical page screenshots, the external knowledge slots include: action slots for filling action interface descriptions, page object invocation slots for filling page object invocation rules, assertion specification slots for filling assertion specifications, test case slots for filling historical test cases, and page screenshot slots for filling historical page screenshots.

[0081] Of course, external knowledge slots can be other types as well, without specific limitations here.

[0082] For example, optionally, the script generation prompt instruction template may include the following core content: only methods exposed by the page object call rules filled in the page object call slot may be called; the assertions filled in the assertion specification slot may only come from the assertion library (which is part of the knowledge base) or be generated according to the assertion library format; each step of the script execution must record evidence points (used to form the failure evidence package described below); and the script must contain pre- and post-hooks such as "startup, login, cleanup".

[0083] By configuring the script generation prompt instruction template and filling in the relevant content, this embodiment ensures that the generated automated test scripts conform to the business logic and strictly adhere to the technical code standards, thus avoiding the problem of invalid instructions generated by large language models.

[0084] Step S203: If the test fails, a patch script corresponding to the automated test script is generated based on the failure evidence package containing the failure step trajectory collected during the script execution. The patch script is then used as a temporary script to re-execute the test until the test termination condition is met.

[0085] As mentioned earlier, test requirements can include assertion requirements, and test knowledge can include assertion specifications. Based on this, we can check whether the actual action result is consistent with the expected action result. If they are consistent, the test passes; otherwise, the test fails.

[0086] In this embodiment, during script execution, process data such as the execution step trajectory can be collected in real time, including but not limited to: screenshots, logs, video recordings, device status, error stacks, etc.

[0087] Therefore, when a test fails, a failure evidence package containing the failure steps can be obtained. After obtaining the failure evidence package, this embodiment can generate a patch script corresponding to the automated test script based on the failure evidence package to fix errors in the automated test script or temporary script.

[0088] In one possible implementation, attribution analysis can be performed based on the failure evidence package to obtain the failure type and generate corresponding repair suggestions. Then, based on the failure evidence package, failure type, repair suggestions, and other information, a patch script can be generated.

[0089] Optionally, the aforementioned failure types can include: recoverable failure types (such as test failures caused by environmental issues like device offline, driver unavailability, network unreachability, animation obscuring UI elements, and short loading times) and unrecoverable failure types (such as incompatibility between the automation driver version and the script version, syntax errors in script test cases, dependency errors, etc.). Furthermore, in the recoverable failure scenarios, the cause of the test failure is not a script issue; in this case, a patch script need not be generated, and the test can be re-executed directly. In the unrecoverable failure scenarios, the cause of the test failure is a script issue; therefore, a patch script can be generated based on the failure evidence package, and the test can be re-executed based on the patch script.

[0090] To prevent an erroneous patch script from directly overwriting the original automated test script, which would make the script increasingly messy with each modification, this embodiment preferably stores the patch script as a temporary script, and subsequent script execution and patch repair are performed on the temporary script.

[0091] The test termination conditions can include: the test passing, a preset number of retests being reached, etc.; the specific conditions can be determined based on the actual scenario.
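The repair loop of step S203 can be sketched as follows. The callables `run_test`, `classify`, and `generate_patch`, as well as the default retest budget, are hypothetical placeholders for the test runner, the attribution analysis, and the large language model call.

```python
from typing import Callable, Optional

RECOVERABLE = "recoverable"      # environment issue: re-run without patching
UNRECOVERABLE = "unrecoverable"  # script issue: generate a patch script


def repair_loop(
    script: str,
    run_test: Callable[[str], Optional[dict]],  # None on pass, else evidence package
    classify: Callable[[dict], str],
    generate_patch: Callable[[str, dict], str],
    max_retests: int = 3,
) -> tuple[bool, str]:
    """Re-run the test until it passes or the retest budget is spent."""
    # Patches live in a temporary script; the original is never overwritten.
    temp_script = script
    for _ in range(max_retests + 1):  # one initial run plus the retests
        evidence = run_test(temp_script)
        if evidence is None:
            return True, temp_script
        if classify(evidence) == UNRECOVERABLE:
            temp_script = generate_patch(temp_script, evidence)
    return False, temp_script
```

The function returns the temporary script together with the pass flag, so a passing patch can later be compared against the original script for manual review.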

[0092] Step S204: If the test passes when the test end conditions are met, the patch script that passes the test is compared with the automated test script, and the comparison result is submitted for manual review.

[0093] In this embodiment, if the test ends due to the patch script passing the test, the patch script (i.e., the temporary script) that passed the test can be compared with the automated test script generated in step S202 to obtain the comparison results, which are then submitted for manual review. This allows the reviewer to see the differences between the patch script and the automated test script more intuitively and clearly based on the comparison results, and to review whether the requirements of the project are met.

[0094] Step S205: If the review is approved, update the patch script that has passed the test to the knowledge base.

[0095] Optionally, in addition to updating the patch scripts that have passed the test to the knowledge base, you can also summarize the business logic that has passed the test and store the business logic, scenario steps description, assertion requirements, and other related content in the knowledge base for subsequent knowledge retrieval.

[0096] It is evident that this application forms a closed loop of "knowledge constraint generation - automatic failure repair - knowledge accumulation through review", which continuously improves the intelligence level of automated testing.

[0097] The mobile automated testing method provided in this application obtains test requirement information, retrieves test knowledge related to the test requirement information from a pre-built knowledge base, and uses this test knowledge to constrain the large language model to generate scripts using a project-predefined unified page encapsulation method and standardized underlying operation interfaces. Based on the test requirement information and test knowledge, the large language model generates automated test scripts that can be executed in the mobile automation driver layer. Therefore, this application introduces relevant constraints of test knowledge into the generation process of automated test scripts, guiding the large language model to use a project-predefined unified page encapsulation method and standardized underlying operation interfaces to generate automated test scripts. This ensures that the scripts conform to the project architecture specifications and can be stably executed directly in the mobile automation driver layer, significantly improving the standardization, universality, and maintainability of script generation.

[0098] Furthermore, the automated test script is run to execute tests. If the test fails, a patch script corresponding to the automated test script is generated based on the failure evidence package collected during the script's execution, which includes the failure step trajectory. This patch script is then used as a temporary script to re-execute the test until the test termination condition is met. If the test passes when the termination condition is met, the patch script that passed the test is compared with the automated test script, and the comparison result is submitted for manual review. This application can automatically generate a patch script based on the failure evidence package collected during the script's execution when the automated test script fails, eliminating the need for manual intervention in script generation and patch debugging. This effectively shortens the development and repair cycle of the automated test script and is more suitable for the project requirements of rapid updates and iterations in mobile applications. At the same time, to ensure the security of engineering testing and avoid the script repair risks caused by large language model hallucinations, the patch script is used as a temporary script for iterative testing until the test termination condition is met, ensuring that the patch script is executable. Manual review of the comparison results further verifies the rationality and adaptability of the patch script, avoiding the problem of incorrect patches overwriting the original script and causing the automated test script to become increasingly messy, thus improving the stability of automated testing.

[0099] Finally, if the review is approved, the patch script that passed the test will be updated to the knowledge base. This allows the effective experience gained from each script repair to be transformed into reusable test knowledge. When generating or repairing scripts in the future, relevant knowledge can be retrieved directly from the knowledge base, reducing the recurrence of similar test failures, lowering test maintenance costs, and further improving the executability of automated test scripts.

[0100] One possible implementation of the process of "running automated test scripts to execute tests" described above is introduced below.

[0101] Understandably, the process of automated testing is essentially based on executing a series of interactive actions according to script instructions. The successful execution of these actions depends on accurately locating the corresponding interface elements. Therefore, this embodiment can run an automated test script, take screenshots of the interface during script execution, perform multimodal recognition on the target element to be located within the screenshot, obtain the target pixel coordinates, and convert the target pixel coordinates into the screen coordinates of the current test device based on preset scaling factors, device resolution, and device pixel density. Then, under preset waiting conditions, the target action corresponding to the target element is executed according to the screen coordinates.

[0102] More specifically, this embodiment can start and manage mobile automation services based on the current device type of the test equipment, and run automated test scripts based on the started services.

[0103] To accurately locate the target element corresponding to the target action, this embodiment provides a multimodal recognition and positioning method. First, the semantic label of the target element is obtained, and the positioning strategy corresponding to the target element is determined based on the semantic label. The positioning strategy corresponding to the target element includes at least one of an object detection algorithm and an Optical Character Recognition (OCR) algorithm. The semantic label of the target element can be obtained from the test requirement information.

[0104] For example, if the semantic label of a target element indicates that the target element is text, then the OCR algorithm is used to identify and locate the target element; if the semantic label indicates that the target element is an icon, then the object detection algorithm is used to identify and locate the target element; if the semantic label indicates that the target element is a mixture of text and icon, or an unknown type, then the OCR algorithm and the object detection algorithm are used together to identify and locate the target element.
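The mapping from semantic label to positioning strategy described above can be sketched as a small dispatch function. The label strings "text" and "icon" and the `Strategy` enum are assumed names for illustration.

```python
from enum import Enum


class Strategy(Enum):
    OCR = "ocr"
    DETECTION = "object_detection"


def choose_strategies(semantic_label: str) -> list[Strategy]:
    """Pick the positioning algorithm(s) for a target element."""
    if semantic_label == "text":
        return [Strategy.OCR]
    if semantic_label == "icon":
        return [Strategy.DETECTION]
    # Mixed text-and-icon elements and unknown types use both algorithms.
    return [Strategy.OCR, Strategy.DETECTION]
```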

[0105] Of course, other positioning strategies can also be used, and this application does not impose specific limitations.

[0106] Next, following the positioning strategy established earlier, the target element is identified and located in the screenshot interface to obtain the identification and positioning results.

[0107] If the identification and positioning results include multiple positioning boxes and their corresponding position coordinates, then the target algorithm is used to deduplicate and merge the multiple positioning boxes to obtain at least one remaining positioning box.

[0108] If the positioning box is a detection box obtained by the object detection algorithm, then the Non-Maximum Suppression (NMS) algorithm is used as the target algorithm. Combined with a preset intersection over union (IoU) threshold such as 0.5, the multiple detection boxes are deduplicated to filter out duplicate detection boxes and obtain at least one remaining detection box.
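A minimal sketch of greedy NMS with an IoU threshold of 0.5 is given below. The box layout `(x1, y1, x2, y2)` and the separate list of detection scores are assumptions; the application does not specify a data format.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: list[Box], scores: list[float], iou_threshold: float = 0.5) -> list[int]:
    """Return indices of the boxes kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: list[int] = []
    for i in order:
        # Keep a box only if it does not overlap a higher-scoring kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept
```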

[0109] If the positioning box is a text box obtained by the OCR algorithm, in order to avoid the OCR algorithm splitting the target element into multiple text boxes, a preset box merging rule can be used as the target algorithm to merge multiple text boxes and obtain at least one remaining text box.

[0110] Optionally, the preset box merging rule can be: if the vertical spacing between two adjacent text boxes is less than or equal to the preset box merging threshold, then the two adjacent text boxes are merged into one text box.
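The box merging rule can be sketched as follows, assuming boxes are `(x1, y1, x2, y2)` tuples, that "adjacent" means consecutive when sorted by top edge, and that the threshold is given in pixels. Horizontal overlap is deliberately ignored in this simplified version.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def merge_text_boxes(boxes: list[Box], max_gap: float = 8.0) -> list[Box]:
    """Merge vertically adjacent OCR text boxes whose gap is at most max_gap."""
    merged: list[Box] = []
    for box in sorted(boxes, key=lambda b: b[1]):  # top to bottom
        if merged and box[1] - merged[-1][3] <= max_gap:
            last = merged[-1]
            # Replace the previous box with the bounding box of both.
            merged[-1] = (
                min(last[0], box[0]),
                last[1],
                max(last[2], box[2]),
                max(last[3], box[3]),
            )
        else:
            merged.append(box)
    return merged
```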

[0111] As mentioned earlier, there may be situations where both OCR and object detection algorithms are involved in localization. In this case, the NMS algorithm can be used to deduplicate the detection boxes obtained by the object detection algorithm, while the box merging rule can be used to merge the text boxes obtained by the OCR algorithm.

[0112] After the above deduplication and box merging processes, at least one positioning box can be obtained. In this embodiment, a target positioning box can be further selected from the at least one positioning box, and the target pixel coordinates can be obtained according to the position coordinates corresponding to the target positioning box. For example, one possible implementation is: if the position coordinates corresponding to the target positioning box include the coordinates of the four corner points of the target positioning box, then the coordinates of the center point of the target positioning box are calculated according to the coordinates of the four corner points and used as the target pixel coordinates; or, the coordinates of the four corner points are used as the target pixel coordinates.

[0113] In this embodiment, the target positioning box can be selected from the at least one positioning box based on the semantic features of the element delimited by each positioning box.

[0114] One possible implementation is as follows: each element defined by at least one positioning box is taken as a candidate element to obtain at least one candidate element; a scoring feature vector is constructed for each candidate element, and a comprehensive score is determined for each candidate element based on the scoring feature vector. The comprehensive score represents the comprehensive similarity between the candidate element and the target element. The candidate element with the largest comprehensive score is selected from at least one candidate element. If the comprehensive score of the selected candidate element is greater than a preset score threshold (e.g., 0.75), the positioning box corresponding to the selected candidate element is taken as the target positioning box.

[0115] The scoring feature vector includes at least one of the following dimensions: text similarity between candidate and target elements (e.g., edit distance, word vector similarity), category consistency (e.g., icon type or text type), relative distance error, size ratio error, and visual similarity. Of course, the scoring feature vector may include other dimensions, which are not specifically limited here.

[0116] In order to calculate the comprehensive score from the scoring feature vector, this embodiment can optionally pre-generate the weights corresponding to each dimension. For example, if the text similarity result is the most reliable, then the weight corresponding to text similarity is the highest; if the visual similarity result is the least reliable, then the weight corresponding to visual similarity is the lowest. Then, the scoring feature vector is weighted and summed according to the weights corresponding to each dimension to obtain the comprehensive score.
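The weighted-sum scoring and the threshold check can be sketched as below. The weight values are illustrative assumptions (only their ordering, with text similarity highest and visual similarity lowest, follows the example above), and the two error dimensions are assumed to have been converted to similarity-like values in [0, 1] beforehand.

```python
from typing import Optional

# Hypothetical weights; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {
    "text_similarity": 0.40,
    "category_consistency": 0.20,
    "relative_distance": 0.15,
    "size_ratio": 0.15,
    "visual_similarity": 0.10,
}


def comprehensive_score(features: dict[str, float]) -> float:
    """Weighted sum over the scoring feature vector; missing dimensions count as 0."""
    return sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())


def select_target(
    candidates: dict[str, dict[str, float]], threshold: float = 0.75
) -> Optional[str]:
    """Return the id of the best candidate if its score exceeds the threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda cid: comprehensive_score(candidates[cid]))
    return best if comprehensive_score(candidates[best]) > threshold else None
```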

[0117] Of course, there are other methods for calculating the overall score, which will not be specifically limited here.

[0118] After obtaining the target pixel coordinates, this embodiment can convert the target pixel coordinates into the screen coordinates of the current test device based on a preset scaling factor, device resolution, and device pixel density (such as dots per inch (DPI)), and then execute the target action corresponding to the target element according to the converted screen coordinates.
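This application does not give the conversion formula, so the following is only a sketch under stated assumptions: the screenshot is assumed to be the device frame scaled by `scale` (screenshot pixel = device pixel × scale), the result is clamped to the device resolution, and, if the driver expects density-independent units, the Android convention dp = px × 160 / DPI is applied. The function name and parameters are hypothetical.

```python
def to_screen_coords(
    px: float,
    py: float,
    scale: float,
    resolution: tuple[int, int],
    dpi: float,
    use_dp: bool = False,
) -> tuple[int, int]:
    """Map screenshot pixel coordinates to coordinates on the test device."""
    width, height = resolution
    # Undo the screenshot scaling, then clamp to the visible screen.
    x = min(max(px / scale, 0.0), width - 1)
    y = min(max(py / scale, 0.0), height - 1)
    if use_dp:
        # Android convention: 1 dp equals 1 px at 160 DPI.
        x, y = x * 160.0 / dpi, y * 160.0 / dpi
    return round(x), round(y)
```

For example, a point at (270, 480) in a half-size screenshot of a 1080 × 1920 device maps to the device pixel (540, 960).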

[0119] Considering that the process of converting target pixel coordinates to screen coordinates takes a very short time, there may be a situation where the target element has not yet finished rendering after the screen coordinates are calculated. If the target action is performed according to the screen coordinates at this time, the expected result may not be obtained because the target element has not finished rendering.

[0120] Therefore, in a preferred implementation, the process of "executing the target action corresponding to the target element according to the converted screen coordinates" may include: under the condition of satisfying a preset waiting condition, executing the target action corresponding to the target element according to the screen coordinates.

[0121] Optionally, the waiting conditions may include at least one of explicit waiting, implicit waiting, and page stability conditions. Explicit waiting refers to waiting for page elements to appear on the real-time interface (e.g., detection through visual detection algorithms or control tree detection algorithms, such as OCR recognition algorithms and object detection algorithms). Implicit waiting refers to setting a minimum waiting interval for the target action. The page stability condition is that if the structural similarity between two consecutive screenshots is greater than a preset structural threshold, then the page is determined to be stable. Specifically, after the previous action of the target action is executed, a screenshot is taken, and another screenshot is taken after a set interval. If the structural similarity between the two screenshots is greater than the structural threshold, it means that the page has been loaded, that is, the target element in the page has been rendered. At this time, the page is determined to be stable, and the target action can be executed.
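The page stability condition can be sketched with a simplified structural similarity measure. This version computes a single global SSIM value over two equally sized grayscale frames given as flat lists of pixel values; practical implementations usually compute SSIM over local windows and average the results. The threshold of 0.95 is an assumed value.

```python
def global_ssim(a: list[float], b: list[float], dynamic_range: float = 255.0) -> float:
    """Single-window SSIM over two equally sized grayscale frames."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((p - mu_a) ** 2 for p in a) / n
    var_b = sum((q - mu_b) ** 2 for q in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    # Standard stabilizing constants of the SSIM formula.
    c1, c2 = (0.01 * dynamic_range) ** 2, (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def page_is_stable(prev: list[float], curr: list[float], threshold: float = 0.95) -> bool:
    """The page is stable when two consecutive screenshots are nearly identical."""
    return global_ssim(prev, curr) > threshold
```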

[0122] Understandably, after waiting for a period of time according to the waiting conditions, the target action is executed according to the screen coordinates. Theoretically, if the screen coordinates are accurate, the corresponding action result should be generated. However, due to network fluctuations or other reasons (i.e., the recoverable failure type mentioned earlier), there may be no response when executing the target action; for example, clicking the login button may not lead to the login success interface. To minimize execution failures caused by non-script reasons, this embodiment can set the action-level retry count, such as 2 times. That is, after executing each action, if no action result is generated, the target action is executed again according to the screen coordinates until the action-level retry count is reached.

[0123] Preferably, in addition to the action-level retries mentioned above, this embodiment can also set a business step-level retry count, such as 1. That is, for each business step, each action under that business step can be retried according to the action-level retry count. In addition, if the entire business step fails, all actions under that business step can be restarted from the beginning. For example, the login step includes three actions: entering the account, entering the password, and clicking login. Any one of these actions can be retried twice, and if any action still fails after two retries, the step can be restarted once following the process "enter account - enter password - click login".
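The two retry levels can be sketched as nested loops. Each action is modeled as a hypothetical callable that returns True when an action result is produced; the defaults follow the example values of 2 action-level retries and 1 step-level retry.

```python
from typing import Callable

Action = Callable[[], bool]  # returns True when the action produced a result


def run_action(action: Action, retries: int = 2) -> bool:
    """One initial attempt plus up to `retries` repeat attempts."""
    return any(action() for _ in range(retries + 1))


def run_step(actions: list[Action], action_retries: int = 2, step_retries: int = 1) -> bool:
    """Run all actions of a business step; restart the whole step on failure."""
    for _ in range(step_retries + 1):
        # all() stops at the first failed action, after which the step restarts.
        if all(run_action(a, action_retries) for a in actions):
            return True
    return False
```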

[0124] In this embodiment, by setting a multimodal recognition and localization strategy, it can be ensured that various types of target elements can be accurately located to their corresponding target pixel coordinates. This allows for the execution of target actions based on accurate target pixel coordinates, reducing the probability of test failures due to non-script-related reasons. Furthermore, the conversion from target pixel coordinates to device physical coordinates (i.e., screen coordinates) solves the problem that large language models cannot directly handle hardware physical differences.

[0125] Meanwhile, by setting waiting conditions and the number of retries at the action level and the number of retries at the business step level, it is ensured that when the screen coordinates are accurate, the execution of the target action can produce effective action results, further reducing the probability of test failure caused by non-script reasons.

[0126] It should be understood that the process of “identifying and locating the target element in the screenshot interface according to the positioning strategy and obtaining the identification and positioning result” mentioned above can adopt a global positioning method based on the entire screenshot interface, or a local positioning method can be used.

[0127] Optionally, the local positioning method can be a region of interest (ROI) positioning method. That is, in this embodiment, positioning assistance information can be obtained. This positioning assistance information is used to assist in locating the target element. Based on the positioning assistance information, the ROI containing the target element is determined from the screenshot interface, and the target element is identified and located within the ROI to obtain the identification and positioning result.

[0128] Optionally, the positioning assistance information may be page area division information determined based on test requirements information, the position information of the target element such as the title bar, content area, etc., or the relative position information of the element based on the target action and the previous action, etc., and this application does not impose specific limitations.

[0129] By locally locating the region of interest and then finely locating the target element, we can obtain the target pixel coordinates more quickly and accurately, and improve the utilization rate of related resources.

[0130] It should also be understood that there may be situations where the identification and positioning results do not include the positioning box, the confidence level of the target positioning box is lower than the confidence level threshold, or the target pixel coordinates cause no action result to be generated after the target action is executed (in the case of action-level retries and business step-level retries as mentioned above, here it specifically refers to the last retries not generating an action result). These situations may all be due to the multimodal identification and positioning process failing to accurately locate the target element.

[0131] To further improve the positioning accuracy of target elements, this embodiment can also preset self-healing backtracking conditions, such as the positioning result not including a positioning box, the confidence level of the target positioning box being lower than the confidence level threshold, or the target pixel coordinates causing no action result after the target action is executed. When the preset self-healing backtracking conditions are met, the self-healing backtracking process is triggered. The self-healing backtracking process can be: using target remediation measures to re-identify and locate the target element. Optionally, the target remediation measures include at least one of the following: changing the positioning strategy, enabling a control tree positioning strategy, or enabling an anchor point positioning strategy. The anchor point positioning strategy refers to first locating stable anchor point elements that have a known positional relationship with the target element, and then locating the target element based on the stable anchor point elements.

[0132] For example, the positioning strategy can be changed as follows: if the positioning strategy determined above is the OCR algorithm, it is changed to the object detection algorithm in the self-healing backtracking process; if the positioning strategy determined above is the object detection algorithm, it is changed to the OCR algorithm in the self-healing backtracking process; if the positioning strategy determined above already includes both the OCR algorithm and the object detection algorithm, then changing the positioning strategy is not applicable.

[0133] The control tree positioning strategy can be implemented as follows: if a control tree is available, all elements in the control tree are used as candidate elements. The candidate elements are then filtered according to control type, operability, etc., to obtain candidate elements that are closer to the characteristics of the target element. These candidate elements are then used as the located target elements, and the target pixel coordinates are obtained from their corresponding position information.

[0134] Enabling an anchor point positioning strategy can be done by first using OCR recognition algorithms, object detection algorithms, etc., to locate stable anchor point elements in the screenshot interface that have a known positional relationship with the target element. Here, stable anchor point elements refer to elements that can be reliably recognized by OCR recognition algorithms, object detection algorithms, etc., such as title elements, back button elements, etc.; then, based on the position of the located stable anchor point elements and the known positional relationship, locate the target element and obtain the target pixel coordinates.

[0135] It should be noted that the control tree positioning strategy and the anchor point positioning strategy may locate multiple candidate elements. In this case, the scoring feature vector method described above can be used to select one candidate element as the located target element.

[0136] It should also be noted that the self-healing backtracking process may itself fail to locate the target element. Therefore, optionally, the number of self-healing attempts can be preset. If the result of a self-healing backtracking pass still satisfies the self-healing backtracking conditions (that is, the target element has still not been located reliably), the self-healing backtracking process is triggered again until the preset number of self-healing attempts is reached.
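The bounded self-healing process can be sketched as follows. Each locator is a hypothetical callable that returns pixel coordinates, or None when one of the self-healing backtracking conditions holds; the remediation list stands for measures such as a changed positioning strategy, the control tree strategy, or the anchor point strategy.

```python
from typing import Callable, Optional

Locator = Callable[[], Optional[tuple[int, int]]]


def locate_with_self_healing(
    primary: Locator, remediations: list[Locator], max_attempts: int = 2
) -> Optional[tuple[int, int]]:
    """Try the primary locator, then remediation measures up to max_attempts times."""
    result = primary()
    attempts = 0
    for remedy in remediations:
        if result is not None or attempts >= max_attempts:
            break
        result = remedy()
        attempts += 1
    return result
```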

[0137] Alternatively, this embodiment can also record the mapping relationship from "expected location description" to "alternative location description" and the valid conditions related to the mapping relationship (used to record the applicable conditions of the mapping relationship, such as the applied device, resolution, script version, etc., to avoid incorrectly applying the self-healing experience of one device to another device), and update it to the element mapping table or the self-healing rule base (the self-healing rule base is a higher-level logical strategy library relative to the element mapping table, which not only records the above mapping relationship and valid conditions, but also records more general self-healing methods summarized based on the mapping relationship and valid conditions), providing a reference for the subsequent location process.

[0138] For example, if the desired location description of a button is "text=OK", and after self-healing it is found that the button is "icon=check icon", then a mapping from "text=OK" to "icon=check icon" can be generated. The valid condition can be, for example, "only applicable to Android version 5.0 and above".
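One way to represent an element mapping table entry together with its valid conditions is sketched below. The field names and the `resolve` helper are assumptions; only the example mapping and the Android 5.0 condition come from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementMapping:
    expected: str                    # expected location description
    alternative: str                 # alternative found by self-healing
    platform: str                    # valid condition: platform
    min_os_version: tuple[int, int]  # valid condition: minimum OS version

    def applies_to(self, platform: str, os_version: tuple[int, int]) -> bool:
        return platform == self.platform and os_version >= self.min_os_version


def resolve(table: list[ElementMapping], expected: str,
            platform: str, os_version: tuple[int, int]) -> str:
    """Return the alternative description if a valid mapping exists."""
    for mapping in table:
        if mapping.expected == expected and mapping.applies_to(platform, os_version):
            return mapping.alternative
    return expected  # no applicable mapping: keep the original description
```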

[0139] In this embodiment, by setting self-healing backtracking conditions, abnormal scenarios caused by inaccurate multimodal recognition and positioning can be accurately identified, significantly improving the positioning accuracy and success rate of target elements, avoiding test interruption due to multimodal recognition and positioning failure, and improving the stability and robustness of automated testing under complex interfaces.

[0140] In some other embodiments of this application, considering that both the automated test script and the patch script mentioned above are scripts generated by a large language model, there may be errors in syntax, dependency relationships, etc., which may affect the script test results. Therefore, in one possible implementation, this embodiment can use the automated test script and the patch script as target scripts respectively. Before the target script is run, at least one of the following checks is performed on the target script: syntax check, static check, dependency check, and dry run check.

[0141] Among them, syntax checking is used to verify whether the script conforms to the syntax rules of the programming language; static checking is used to analyze the abstract syntax tree and code structure without running the script to check for potential problems in the script; dependency checking is used to check whether the external resources required by the script are complete and whether the versions are compatible; and dry run verification is used to simulate the running of the script without actually performing any operations to verify the logical correctness of the script and the feasibility of the execution path.

[0142] If the target script fails to be validated, the target script is repaired according to the validation result, and the validation is performed again as described above until the validation count threshold is reached, such as 5 times; if the target script passes the validation, the target script is run according to the process described above to perform the specific test.
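The check-and-repair loop can be sketched for the syntax check alone, assuming for illustration that the generated scripts are Python (this application does not name the script language). The `repair` callable is a hypothetical stand-in for the model-driven repair step.

```python
import ast
from typing import Callable, Optional


def syntax_check(source: str) -> Optional[str]:
    """Return None if the source parses, else a short error description."""
    try:
        ast.parse(source)
        return None
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"


def validate_with_repair(
    source: str, repair: Callable[[str, str], str], max_rounds: int = 5
) -> tuple[bool, str]:
    """Check the script up to max_rounds times, repairing between checks."""
    for round_no in range(max_rounds):
        error = syntax_check(source)
        if error is None:
            return True, source
        if round_no < max_rounds - 1:
            source = repair(source, error)
    return False, source
```

Static, dependency, and dry run checks would slot into the same loop as additional check functions.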

[0143] In this embodiment, the verification process before script execution reduces the proportion of scripts that are generated but cannot be run, thereby improving testing efficiency.

[0144] The above describes the complete process of automating testing of the application under test on the current test device. It is understandable that in some scenarios, the purpose of automated testing is to test the functional operation of the application under test on a specific test device. In this case, the test device cannot be changed. In other scenarios, the purpose of automated testing is not limited to a specific test device. In this case, if the current test device is unavailable, the test device can be changed for retesting. Optionally, in scenarios where the test device can be changed for retesting, this embodiment can perform attribution analysis based on the failure evidence package mentioned above to obtain the failure type and update the failure count corresponding to that failure type, so as to obtain the failure count corresponding to each failure type on the current test device. It is worth noting that the failure count here is not limited to the testing of the current application under test, but can cover the testing of all applications under test.

[0145] Furthermore, if the number of failures corresponding to the target failure type (which can be preset or cover all failure types) reaches the preset failure threshold, the current test device will be replaced.
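The per-device failure counting can be sketched as below. The class name, the default threshold, and the convention that `target_types=None` covers all failure types are assumptions for illustration.

```python
from collections import Counter
from typing import Optional


class DeviceHealth:
    """Tracks failure counts per failure type for one test device."""

    def __init__(self, threshold: int = 3, target_types: Optional[set[str]] = None):
        self.threshold = threshold
        self.target_types = target_types  # None means every failure type counts
        self.counts: Counter = Counter()

    def record(self, failure_type: str) -> bool:
        """Record a failure; return True when the threshold is reached."""
        self.counts[failure_type] += 1
        if self.target_types is not None and failure_type not in self.target_types:
            return False
        return self.counts[failure_type] >= self.threshold
```

A True return value would then trigger replacement of the current test device.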

[0146] As mentioned above, this application can also be applied to scenarios where the test equipment cannot be replaced. In order to successfully complete the test in such scenarios, optionally, this embodiment can trigger a degradation process when the number of failures corresponding to the target failure type reaches a preset failure threshold. That is, in the subsequent iterations after triggering the degradation process, the target element is identified and located according to the degradation strategy.

[0147] Optionally, the degradation process includes: downgrading from global positioning over the entire screenshot to region of interest positioning during the multimodal recognition and positioning process, and/or directly replacing the multimodal recognition and positioning process with the anchor point positioning strategy.

[0148] Of course, the above-mentioned degradation process can also be applied to scenarios where the test device can be replaced for retesting, and no specific limitations are made here.

[0149] This embodiment employs a task degradation mechanism to balance resource consumption and element location accuracy. For example, when a highly complex task fails, it automatically switches to a lightweight and more stable execution method, preventing test process interruptions due to a single failure. Simultaneously, the task degradation mechanism effectively improves the system's robustness in complex interfaces and weakly adaptable environments, ensuring stable execution of test tasks across various devices and scenarios, thereby enhancing the overall efficiency and continuity of automated testing.

[0150] Optionally, to ensure the stability and repeatability of automated testing, this application may also set up a data factory and environment isolation mechanism, that is, configure test account, environment identifier and other information for each automated testing process, so as to generate reusable test results and build an isolated running environment based on this information, and execute each test process in a defined initial state through independent account allocation, cache clearing, data reset and other methods.

[0151] Simultaneously, sensitive information such as accounts, tokens, and internal domains in logs and test reports generated during the testing process can be anonymized to prevent information leakage. By automatically performing data cleanup and state restoration before and after testing, data interference between different test cases can be effectively avoided, ensuring the consistency of test results; and by dividing the environment, test operations are made independent of the online real-world operating environment, preventing test behavior from affecting real user business.
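For illustration only, the anonymization of logs and test reports might be sketched with regular-expression rules as below. The patterns, the placeholder strings, and the domain `internal.example.com` are hypothetical; an actual project would define rules matching its own account, token, and internal domain formats.

```python
import re

# Hypothetical redaction rules: (pattern, replacement). Order matters only
# in that each rule is applied once, in sequence, to the whole line.
REDACTION_RULES = [
    (re.compile(r"(?i)\b(token|authorization)\s*[=:]\s*\S+"), r"\1=<REDACTED>"),
    (re.compile(r"(?i)\b(account|user(?:name)?)\s*[=:]\s*\S+"), r"\1=<REDACTED>"),
    (re.compile(r"\b[\w.-]+\.internal\.example\.com\b"), "<INTERNAL_DOMAIN>"),
]


def anonymize(line: str) -> str:
    """Mask accounts, tokens, and internal domains in one log line."""
    for pattern, replacement in REDACTION_RULES:
        line = pattern.sub(replacement, line)
    return line


if __name__ == "__main__":
    print(anonymize("login account=alice token: abc123 host=api.internal.example.com"))
```

Applying the rules when the log line is written, rather than when the report is assembled, is one way to keep unmasked values from ever reaching disk.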

[0152] The above describes a mobile terminal automated testing method provided by the embodiments of this application. The following describes the apparatus for performing the above-described mobile terminal automated testing method.

[0153] Please refer to Figure 3, which is a schematic diagram of the structure of a mobile terminal automated testing device provided in an embodiment of this application. As shown in Figure 3, the mobile terminal automated testing device may include:

[0154] The data acquisition unit 301 is used to acquire test requirement information and retrieve test knowledge related to the test requirement information from a pre-built knowledge base. The test knowledge is used to constrain the large language model to generate scripts using the project's predefined unified page encapsulation method and standardized underlying operation interface.

[0155] The script testing unit 302 is used to generate automated test scripts that can be executed in the mobile terminal automation driver layer based on test requirement information and test knowledge through a large language model, and to run the automated test scripts to perform tests.

[0156] The script patching unit 303 is used to generate a patch script corresponding to the automated test script based on the failure evidence package containing the failure step trajectory collected during the script execution if the test fails, and to re-execute the test as a temporary script until the test end condition is met.

[0157] The script comparison unit 304 is used to compare the patch script that passed the test with the automated test script if the test passes when the test end condition is met, and submit the comparison result for manual review.

[0158] The knowledge base update unit 305 is used to update the patch script that passed the test into the knowledge base if the review is approved.
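As a non-limiting sketch, the cooperation of units 301 to 305 might be expressed as a single loop. All callables passed in (`retrieve`, `generate`, `run`, `patch`, `review`, `update_kb`) are hypothetical stand-ins for the knowledge base, the large language model, the automation driver layer, and the manual review step, and a maximum number of patch rounds is assumed as the test end condition.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class RunResult:
    passed: bool
    evidence: dict = field(default_factory=dict)  # failure evidence package


def run_pipeline(requirement, retrieve, generate, run, patch, review, update_kb,
                 max_rounds=3):
    knowledge = retrieve(requirement)             # data acquisition unit 301
    original = generate(requirement, knowledge)   # script testing unit 302
    script = original
    result = run(script)

    rounds = 0
    while not result.passed and rounds < max_rounds:  # script patching unit 303
        script = patch(script, result.evidence)       # patch used as temporary script
        result = run(script)
        rounds += 1

    if result.passed and script != original:          # script comparison unit 304
        diff = "\n".join(difflib.unified_diff(
            original.splitlines(), script.splitlines(),
            fromfile="original", tofile="patch", lineterm=""))
        if review(diff):                              # knowledge base update unit 305
            update_kb(script)
    return result.passed, script
```

The patch script reaches the knowledge base only after the review callable approves the diff, which mirrors the manual review gate of this embodiment.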

[0159] Each module in the aforementioned mobile automated testing device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device, or stored in the memory of a computer device as software, so that the processor can call and execute the corresponding operations of each module.

[0160] This application also provides an electronic device, which may include at least one processor and a memory connected to the processor, wherein:

[0161] Memory is used to store computer programs;

[0162] The processor is used to execute computer programs to enable the electronic device to implement any of the mobile terminal automated testing methods provided in the embodiments of this application.

[0163] Referring to Figure 4, a schematic structural diagram of an electronic device suitable for implementing the embodiments of this application is shown. The electronic device in the embodiments of this application may include, but is not limited to, mobile terminals such as mobile phones, laptops, PDAs (personal digital assistants), and PADs (tablet computers), as well as fixed terminals such as desktop computers. The electronic device shown in Figure 4 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of this application.

[0164] As shown in Figure 4, the electronic device may include a processing unit (e.g., a central processing unit, a graphics processing unit, etc.) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. When the electronic device is powered on, the RAM 603 also stores various programs and data required for the operation of the electronic device. The processing unit 601, ROM 602, and RAM 603 are interconnected via a bus 604. An input / output (I / O) interface 605 is also connected to the bus 604.

[0165] Typically, the following devices can be connected to the I / O interface 605: input devices 606 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 607 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices 608 including, for example, memory cards, hard drives, etc.; and communication devices 609. The communication device 609 allows the electronic device to communicate with other devices wirelessly or by wire to exchange data. Although Figure 4 shows an electronic device with various devices, it should be understood that not all of the devices shown are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

[0166] This application also provides a computer program product including computer-readable instructions, which, when executed on an electronic device, cause the electronic device to implement any of the mobile terminal automated testing methods provided in this application.

[0167] This application also provides a computer-readable storage medium that carries one or more computer programs. When the one or more computer programs are executed by an electronic device, the electronic device can implement any of the mobile terminal automated testing methods provided in this application.

[0168] It should also be noted that the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. In addition, in the device embodiment drawings provided in this application, the connection relationship between modules indicates that they have a communication connection, which can be implemented as one or more communication buses or signal lines.

[0169] Through the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by means of software plus necessary general-purpose hardware, or it can be implemented by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memory, special-purpose components, etc. Generally, any function performed by a computer program can be easily implemented by corresponding hardware, and the specific hardware structure used to implement the same function can also be diverse, such as analog circuits, digital circuits, or special-purpose circuits. However, for this application, software program implementation is more often the preferred implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, mobile hard disk, ROM, RAM, magnetic disk, or optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, training equipment, or network device, etc.) to execute the methods described in the various embodiments of this application.

[0170] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product.

[0171] The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a training device or data center, that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid-state drives (SSDs)).

Claims

1. A mobile terminal automated testing method, characterized in that the method comprises: obtaining test requirement information, and retrieving test knowledge related to the test requirement information from a pre-built knowledge base, wherein the test knowledge is used at least to constrain a large language model to generate scripts using the project's predefined unified page encapsulation method and standardized underlying operation interface; generating, through the large language model and based on the test requirement information and the test knowledge, an automated test script executable in the mobile terminal automation driver layer, and running the automated test script to perform a test; if the test fails, generating a patch script corresponding to the automated test script based on a failure evidence package that is collected during script execution and contains the failure step trajectory, and re-executing the test with the patch script as a temporary script until a test termination condition is met; if the test passes when the test termination condition is met, comparing the patch script that passed the test with the automated test script, and submitting the comparison result for manual review; and if the review is approved, updating the patch script that passed the test into the knowledge base.

2. The mobile terminal automated testing method according to claim 1, characterized in that running the automated test script to perform the test comprises: running the automated test script, and capturing a screenshot of the interface during script execution; performing multimodal recognition and positioning on the target element to be located in the screenshot to obtain target pixel coordinates; converting the target pixel coordinates into screen coordinates of the current test device based on a preset scaling factor, the device resolution, and the device pixel density; and, under preset waiting conditions, executing the target action corresponding to the target element according to the screen coordinates.
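As a non-limiting illustration of the coordinate conversion recited in claim 2, the sketch below assumes that the scaling factor is the ratio of screenshot size to physical screen size, that coordinates are clamped to the device resolution, and that the pixel density is only applied when the driver expects logical (density-independent) units. The claim does not fix a formula; `Device` and `to_screen_coords` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    width_px: int    # device resolution, physical pixels
    height_px: int
    density: float   # physical pixels per logical unit, e.g. 3.0


def to_screen_coords(x, y, scale, device, logical=False):
    """Map screenshot pixel coordinates to coordinates on the test device.

    scale   -- assumed ratio screenshot_size / physical_screen_size
    logical -- if True, divide by the pixel density to return logical units
    """
    # Undo the screenshot scaling, then clamp to the physical screen bounds.
    px = min(max(x / scale, 0), device.width_px - 1)
    py = min(max(y / scale, 0), device.height_px - 1)
    if logical:
        px, py = px / device.density, py / device.density
    return round(px), round(py)


if __name__ == "__main__":
    phone = Device(width_px=1080, height_px=2400, density=3.0)
    print(to_screen_coords(270, 600, 0.5, phone))
    print(to_screen_coords(270, 600, 0.5, phone, logical=True))
```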

3. The mobile terminal automated testing method according to claim 2, characterized in that performing multimodal recognition and positioning on the target element to be located in the screenshot interface to obtain the target pixel coordinates comprises: obtaining a semantic tag of the target element, and determining a positioning strategy corresponding to the target element based on the semantic tag, wherein the positioning strategy includes at least one of a target detection algorithm and an optical character recognition algorithm; identifying and locating the target element in the screenshot interface according to the positioning strategy to obtain an identification and positioning result; if the identification and positioning result includes multiple positioning boxes and the position coordinates corresponding to the multiple positioning boxes, performing deduplication and box merging on the multiple positioning boxes using a target algorithm to obtain at least one remaining positioning box, wherein the target algorithm is a non-maximum suppression algorithm if the positioning boxes are detection boxes obtained by the target detection algorithm, and is a preset box merging rule if the positioning boxes are text boxes obtained by the optical character recognition algorithm; and selecting a target positioning box from the at least one positioning box, and obtaining the target pixel coordinates based on the position coordinates corresponding to the target positioning box.
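As a non-limiting illustration of the deduplication step recited in claim 3 for detection boxes, a plain greedy non-maximum suppression might look as follows. Boxes are assumed to be `(x1, y1, x2, y2)` tuples with a confidence score each, and the IoU threshold of 0.5 is an arbitrary example value; the preset box merging rule for text boxes is not shown.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```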

4. The mobile terminal automated testing method according to claim 3, characterized in that identifying and locating the target element in the screenshot interface according to the positioning strategy to obtain the identification and positioning result comprises: obtaining positioning assistance information, which is used to assist in locating the target element; determining, based on the positioning assistance information, a region of interest containing the target element from the screenshot interface; and identifying and locating the target element within the region of interest to obtain the identification and positioning result.
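As a non-limiting illustration of the region of interest positioning recited in claim 4, the sketch below crops the screenshot to the region, runs a locator on the crop, and maps the hit back to full-screenshot coordinates. The screenshot is modeled as a list of pixel rows, and `find_marker` is a toy stand-in for the actual recognition step; both are assumptions made for the example.

```python
def locate_in_roi(screenshot, roi, locate):
    """Run `locate` inside roi = (left, top, right, bottom) and map the hit back.

    Returns (x, y) in full-screenshot coordinates, or None if nothing is found.
    """
    left, top, right, bottom = roi
    crop = [row[left:right] for row in screenshot[top:bottom]]
    hit = locate(crop)
    if hit is None:
        return None
    x, y = hit
    # The locator reports crop-relative coordinates, so add the ROI offset.
    return x + left, y + top


def find_marker(image, marker=1):
    """Toy locator: return (x, y) of the first pixel equal to `marker`."""
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value == marker:
                return x, y
    return None
```

Forgetting to add the offset back is a common source of taps landing in the wrong place, which is why the mapping is kept inside the helper.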

5. The mobile terminal automated testing method according to claim 3 or 4, characterized in that the method further comprises: if a preset self-healing rollback condition is met, adopting target remediation measures to re-identify and locate the target element, wherein the target remediation measures include at least one of the following: changing the positioning strategy, enabling a control tree positioning strategy, and enabling an anchor point positioning strategy; the anchor point positioning strategy refers to first locating a stable anchor point element that has a known positional relationship with the target element, and then locating the target element based on the stable anchor point element; and the self-healing rollback conditions include: the identification and positioning result does not include a positioning box, the confidence of the target positioning box is lower than a confidence threshold, and no action result is produced after the target action is executed at the target pixel coordinates.
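As a non-limiting illustration of the self-healing rollback recited in claim 5, the sketch below tries a primary strategy and then a list of remediation strategies, treating a missing result or a low confidence as a rollback condition. The third condition (no action result after execution) would need feedback from the driver and is not modeled. All names, the `((x, y), confidence)` result shape, and the threshold value are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, int]
Located = Optional[Tuple[Point, float]]  # ((x, y), confidence) or None


def anchor_locate(anchor_center: Point, offset: Point,
                  confidence: float = 1.0) -> Located:
    """Anchor point strategy: stable anchor position plus a known offset."""
    return (anchor_center[0] + offset[0], anchor_center[1] + offset[1]), confidence


def locate_with_self_healing(primary: Callable[[], Located],
                             fallbacks: List[Tuple[str, Callable[[], Located]]],
                             confidence_threshold: float = 0.6):
    """Return (strategy_name, point) from the first strategy that succeeds."""
    for name, strategy in [("primary", primary), *fallbacks]:
        located = strategy()
        if located is None:
            continue  # rollback condition: no positioning box
        point, confidence = located
        if confidence < confidence_threshold:
            continue  # rollback condition: confidence below threshold
        return name, point
    return None
```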

6. The mobile terminal automated testing method according to claim 3, characterized in that selecting the target positioning box from the at least one positioning box comprises: taking each element delimited by the at least one positioning box as a candidate element, and constructing a scoring feature vector corresponding to the candidate element, wherein the scoring feature vector includes at least one of the following dimensions: text similarity, category consistency, relative distance error, size ratio error, and visual similarity between the candidate element and the target element; determining, based on the scoring feature vector corresponding to the candidate element, a comprehensive score corresponding to the candidate element, wherein the comprehensive score represents the comprehensive similarity between the candidate element and the target element; selecting, from the at least one candidate element, the candidate element with the highest comprehensive score; and if the comprehensive score of the selected candidate element is greater than a preset score threshold, taking the positioning box corresponding to the selected candidate element as the target positioning box.
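As a non-limiting illustration of the scoring recited in claim 6, the sketch below combines the five feature dimensions with a weighted sum. The weights, the normalization of all features to the range 0 to 1, and the conversion of the two error dimensions into similarities via `1 - error` are assumptions; the claim does not specify how the comprehensive score is computed.

```python
# Hypothetical weights; they sum to 1 so the score stays within [0, 1].
WEIGHTS = {
    "text_similarity": 0.40,
    "category_consistency": 0.20,
    "relative_distance_error": 0.15,
    "size_ratio_error": 0.10,
    "visual_similarity": 0.15,
}
ERROR_FEATURES = {"relative_distance_error", "size_ratio_error"}


def composite_score(features: dict) -> float:
    """Weighted sum of the features; errors count as (1 - error)."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = features[name]
        if name in ERROR_FEATURES:
            value = 1.0 - min(max(value, 0.0), 1.0)
        total += weight * value
    return total


def select_target(candidates: list, score_threshold: float = 0.7):
    """Return the best candidate if its score exceeds the threshold, else None."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: composite_score(c["features"]))
    if composite_score(best["features"]) > score_threshold:
        return best
    return None
```

Returning `None` when the best score is at or below the threshold lets the caller fall back to another strategy instead of acting on a doubtful match.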

7. The mobile terminal automated testing method according to claim 1, characterized in that the method further comprises: taking the automated test script and the patch script respectively as a target script; and before the target script is run, performing at least one of the following checks on the target script: a syntax check, a static check, a dependency check, and a dry run check, wherein the syntax check is used to verify whether the script conforms to the syntax rules of the programming language; the static check is used to analyze the abstract syntax tree and code structure without running the script, so as to detect potential problems in the script in advance; the dependency check is used to check whether the external resources required for the script to run are complete and whether their versions are compatible; and the dry run check is used to simulate the running of the script without actually performing any operations, so as to verify the logical correctness of the script and the feasibility of the execution path.
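As a non-limiting illustration of the pre-run checks recited in claim 7, the sketch below assumes the target scripts are written in Python and uses the standard `ast` and `importlib` modules. The static rule shown (flagging fixed `sleep` calls) is a hypothetical example of a project rule, the dependency check only tests whether top-level imports can be resolved, and version compatibility and the dry run check are not covered.

```python
import ast
import importlib.util


def syntax_check(source: str) -> list:
    """Verify that the script parses under the language's syntax rules."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error at line {exc.lineno}: {exc.msg}"]
    return []


def static_check(source: str) -> list:
    """Walk the AST without running the script. Hypothetical rule: no fixed sleeps."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "sleep"):
            problems.append(f"line {node.lineno}: fixed sleep call")
    return problems


def dependency_check(source: str) -> list:
    """Report absolute imports whose top-level package cannot be found."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules = [node.module]
        else:
            continue
        for module in modules:
            top_level = module.split(".")[0]
            if importlib.util.find_spec(top_level) is None:
                problems.append(f"missing dependency: {top_level}")
    return problems


def pre_run_checks(source: str) -> list:
    """Run the checks in order; stop early if the script does not even parse."""
    problems = syntax_check(source)
    if problems:
        return problems
    return static_check(source) + dependency_check(source)
```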

8. The mobile terminal automated testing method according to claim 1, characterized in that generating, through the large language model and based on the test requirement information and the test knowledge, the automated test script executable in the mobile terminal automation driver layer comprises: obtaining a pre-built script generation prompt instruction template, which includes at least a requirement slot and an external knowledge slot, wherein the script generation prompt instruction template is used to instruct the large language model to generate, based on the knowledge content in the external knowledge slot, an automated test script that conforms to the requirement content in the requirement slot; filling the test requirement information into the requirement slot and filling the test knowledge into the external knowledge slot to obtain a script generation prompt instruction; and inputting the script generation prompt instruction into the large language model to obtain the automated test script.
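As a non-limiting illustration of the slot filling recited in claim 8, the sketch below uses Python's standard `string.Template`. The template wording, the slot names `requirement` and `external_knowledge`, and the example page object `LoginPage.login` are hypothetical; the call to the large language model is omitted.

```python
from string import Template

# Hypothetical script generation prompt instruction template with two slots.
SCRIPT_PROMPT = Template(
    "You generate mobile UI automation test scripts.\n"
    "Use only the page objects and operation interfaces listed under KNOWLEDGE.\n\n"
    "KNOWLEDGE:\n$external_knowledge\n\n"
    "REQUIREMENT:\n$requirement\n"
)


def build_prompt(requirement: str, knowledge_snippets: list) -> str:
    """Fill the requirement slot and the external knowledge slot."""
    knowledge = "\n".join(f"- {snippet}" for snippet in knowledge_snippets)
    return SCRIPT_PROMPT.substitute(
        requirement=requirement.strip(),
        external_knowledge=knowledge,
    )


if __name__ == "__main__":
    print(build_prompt(
        "Verify that a registered user can log in.",
        ["LoginPage.login(user, password)", "HomePage.is_displayed()"],
    ))
```

`substitute` raises `KeyError` when a slot is left unfilled, so an incomplete prompt cannot silently reach the model.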

9. The mobile terminal automated testing method according to claim 2, characterized in that the method further comprises: performing attribution analysis based on the failure evidence package to obtain a failure type, and updating the failure count corresponding to the failure type, so as to obtain the number of failures corresponding to each failure type on the current test device; and if the number of failures corresponding to a target failure type reaches a preset failure threshold, replacing the current test device and / or triggering a degradation process, wherein the degradation process includes: downgrading from full-image positioning (over the entire screenshot) to region of interest positioning during the multimodal recognition and positioning process, and / or directly replacing the multimodal recognition and positioning process with an anchor point positioning strategy.

10. A mobile terminal automated testing device, characterized in that the device comprises: a data acquisition unit, used to acquire test requirement information and retrieve test knowledge related to the test requirement information from a pre-built knowledge base, wherein the test knowledge is used at least to constrain a large language model to generate scripts using the project's predefined unified page encapsulation method and standardized underlying operation interface; a script testing unit, used to generate, through the large language model and based on the test requirement information and the test knowledge, an automated test script executable in the mobile terminal automation driver layer, and to run the automated test script to perform a test; a script patching unit, used to generate, if the test fails, a patch script corresponding to the automated test script based on a failure evidence package that is collected during script execution and contains the failure step trajectory, and to re-execute the test with the patch script as a temporary script until a test end condition is met; a script comparison unit, used to compare the patch script that passed the test with the automated test script if the test passes when the test end condition is met, and to submit the comparison result for manual review; and a knowledge base update unit, used to update the patch script that passed the test into the knowledge base if the review is approved.