UI automated testing method, device, medium and product based on test images

By building a visual resource management platform and a multimodal positioning strategy, the problems of chaotic image resource management and recognition stability in UI automated testing have been solved, and an efficient and stable enterprise-level UI automated testing solution has been achieved.

CN122220221A · Pending · Publication Date: 2026-06-16 · TUS CLOUD CONTROL (BEIJING) TECH LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
TUS CLOUD CONTROL (BEIJING) TECH LTD
Filing Date
2026-03-04
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

In existing UI automation testing, image resource management is chaotic, resulting in strong coupling between data and code and high maintenance costs; single image recognition strategies have low success rates and poor stability in complex environments, making it difficult to meet the needs of large-scale enterprise-level testing.

Method used

A visual resource management platform is constructed, which stores benchmark images based on a module-level structure. It combines multimodal localization strategies such as image recognition, control tree, and character recognition, and adopts a local caching and remote server collaboration mechanism to dynamically adjust the image matching threshold, thereby achieving adaptive degradation and resource scheduling.

Benefits of technology

It reduces the code size of test projects, improves the robustness and efficiency of automated testing, ensures the continuity and accuracy of the testing process, and reduces maintenance costs and manpower input.

✦ Generated by Eureka AI based on patent content.


Abstract

The embodiments of this application relate to the field of information technology and disclose a UI automated testing method, device, medium and product based on test images. The method comprises the following steps: collecting screenshots of each test scenario in a product under test as benchmark images; starting an automated process, calling an image set management list through test code, and capturing the screen of a device under test in real time; using image recognition technology to perform image processing and matching on the benchmark images and the real-time captured screen to locate a target element; when the target element is not successfully located through the image recognition technology, switching to an element locating mode based on a control tree or a locating mode based on character recognition to continue locating the target element, so as to obtain the position coordinates of the target element on the screen; performing a preset test interaction on the device under test based on the position coordinates, and obtaining the feedback interface after the interaction; and performing assertion judgment according to the feedback interface and generating a test report after the automated process is finished.

Description

Technical Field

[0001] This application relates to the field of information technology, and in particular to a UI automated testing method, device, medium, and product based on test images.

Background Technology

[0002] In the field of software testing, with the rapid development of mobile internet and desktop applications, the iteration frequency of user interfaces (UIs) is accelerating, placing higher demands on the coverage and stability of UI automated testing. Traditional UI automated testing typically relies on control attributes (such as ID, XPath, Name, etc.) for element location. However, in some complex testing scenarios, such as custom controls, game interfaces, or dynamically loaded H5 pages, control attributes are often difficult to obtain or extremely unstable. To address this issue, image recognition-based element location technology is widely used in automated testing, determining the position and state of UI elements by comparing screenshots with expected benchmark images.

[0003] However, existing image recognition-based automated testing solutions have significant shortcomings in image resource management. Typically, the benchmark images required for testing are stored directly as files in the code directory of the automated test project. Developers create different folders to distinguish screenshots from different products or modules. This management approach leads to a high degree of coupling between image data and test scripts. As the number of test cases increases, the number of image files in the project grows dramatically, resulting in a large and difficult-to-maintain code repository. Furthermore, due to the lack of a unified visual management tool, testers cannot intuitively preview and retrieve image resources, and can only rely on file names for searching, which greatly reduces the efficiency of script writing and maintenance.

[0004] On the other hand, in actual image recognition execution, existing technical solutions often employ a single matching strategy, making them highly dependent on the operating environment. For example, when the screen resolution of the device under test changes, rendering colors differ, or network transmission causes image loading delays, a single image matching algorithm is prone to recognition failure or misidentification. Furthermore, existing solutions lack effective degradation mechanisms; once image recognition fails, the entire testing process is interrupted, resulting in poor robustness of automated testing and making it difficult to meet the needs of large-scale enterprise-level automated testing.

Summary of the Invention

[0005] One objective of this application is to provide a UI automated testing method, device, medium, and product based on test images, which at least solves the technical problems in the prior art such as chaotic management of UI automated test image resources, high maintenance costs due to strong coupling between image data and code, and low success rate and poor stability of single image recognition strategies in complex and ever-changing test environments.

[0006] To achieve the above objectives, some embodiments of this application provide the following aspects:

[0007] In a first aspect, some embodiments of this application provide a UI automation testing method based on test images, the method comprising:

[0008] Screenshots of various test scenarios in the product under test are collected as reference images. A unique image name and image description are configured for the reference images in combination with the module hierarchy of the product under test, and an image set management list is generated on the visual resource management platform.

[0009] Initiate the automated testing process, call the image set management list through test code, and capture the screen of the device under test in real time;

[0010] Image recognition technology is used to process and match the reference image with real-time captured screen images in order to locate the target element;

[0011] When the target element is not successfully located using the image recognition technology, the system switches to the element location method based on the control tree or the location method based on character recognition to continue locating the target element, thereby obtaining the position coordinates of the target element on the screen.

[0012] Based on the location coordinates, perform a preset test interaction operation on the device under test and obtain the feedback interface after the interaction;

[0013] Assertions are made based on the feedback interface, and a test report is generated after the automated process is completed.

[0014] In a second aspect, some embodiments of this application also provide an electronic device, comprising: one or more processors; and a memory storing computer program instructions which, when executed, cause the one or more processors to perform the steps of the method described above.

[0015] In a third aspect, some embodiments of this application also provide a computer-readable medium having computer program instructions stored thereon, which can be executed by a processor to implement the method described above.

[0016] In a fourth aspect, some embodiments of this application also provide a computer program product, including a computer program/instructions that, when executed by a processor, implement the steps of the method described above.

[0017] Compared with related technologies, the solution provided in this application firstly achieves independent and unified management of image sets by constructing a visual resource management platform. This method extracts benchmark images from automated test scripts, stores them in a structured manner according to the product and module hierarchy, and configures unique names and descriptions. This design not only significantly reduces the code size of test projects and solves the coupling problem between data and logic, but also enables testers to intuitively preview, retrieve, and maintain test assets through a visual interface. When the UI changes, only the benchmark images need to be updated on the platform side without modifying the underlying code, thereby significantly reducing the maintenance threshold and labor costs of automated scripts.

[0018] Secondly, by introducing multimodal fusion localization and adaptive degradation strategies, the robustness and pass rate of automated testing are greatly improved. During test execution, the system prioritizes image recognition technology for localization, and combines it with screen resolution characteristics for adaptive scaling and image enhancement preprocessing, effectively overcoming interference caused by resolution and rendering differences between multiple devices. More importantly, when image recognition is hindered, the system can automatically switch to a localization method based on control trees or character recognition (OCR) as a fallback solution, ensuring the continuity of business processes. Coupled with a dynamic threshold adjustment mechanism based on historical confidence distribution, the system can set judgment criteria according to the historical performance of each UI element, effectively balancing precision and recall.

[0019] Finally, the system improves test execution efficiency and performance through an optimized resource scheduling mechanism. When retrieving benchmark images, the system employs a collaborative mechanism between the local cache and the remote server, combined with a time decay algorithm and a dynamic expiration strategy, ensuring rapid loading of frequently accessed images. It also addresses performance bottlenecks under multi-user concurrent requests through asynchronous concurrent processing. Furthermore, the assertion phase supports a flexible combination of full-image comparison and region element detection, enabling precise verification of UI layout and details. This results in a highly efficient, stable, and easily maintainable enterprise-level UI automation testing solution.

Attached Figure Description

[0020] One or more embodiments are illustrated by way of example with reference numerals in the accompanying drawings. These illustrations do not constitute a limitation on the embodiments. Elements with the same reference numerals in the drawings are denoted as similar elements. Unless otherwise stated, the figures in the drawings are not to be limited by scale.

[0021] Figure 1 is an exemplary flowchart of a UI automation testing method based on test images provided in some embodiments of this application;

[0022] Figure 2 is an exemplary structural diagram of the electronic device provided in some embodiments of this application.

Detailed Implementation

[0023] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0024] Figure 1 is an exemplary flowchart of a UI automation testing method based on test images provided in some embodiments of this application. The method includes:

[0025] S101. Collect screenshots of various test scenarios in the product under test as reference images, configure unique image names and descriptions for the reference images based on the module hierarchy of the product under test, and generate an image set management list on the visual resource management platform.

[0026] Specifically, step S101 mainly involves the construction and management of test resources. In this embodiment, a cross-platform visual resource management platform was developed based on the PyQt framework, supporting management on both PC and mobile devices. Testers first create product dimensions based on the business logic of the product under test, and then subdivide functional modules under the product, thereby establishing a hierarchical structure of "product-module-image set". Data collectors take screenshots for each test scenario and store these screenshots as baseline images in the corresponding modules. During the data entry process, the system mandates the configuration of unique image names and detailed image descriptions, and automatically generates an image set management list containing product name, module name, test case description, creation time, modification time, and image thumbnails. This list serves as the index basis for subsequent automated calls, realizing the structured and visual management of test assets.
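The "product-module-image set" hierarchy and the mandatory name and description fields can be illustrated with a minimal in-memory sketch. The class names `ImageEntry` and `ImageSetList`, the field names, and the assumption that image names are unique across the whole list (rather than per module) are illustrative and do not come from the application; thumbnails and the PyQt interface are omitted.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ImageEntry:
    """One row of the image set management list (field names are illustrative)."""
    product: str
    module: str
    name: str
    description: str
    created_at: datetime = field(default_factory=datetime.now)
    modified_at: datetime = field(default_factory=datetime.now)


class ImageSetList:
    """In-memory stand-in for the platform's product-module-image index."""

    def __init__(self):
        self._entries = {}  # image name -> ImageEntry

    def add(self, entry):
        # Both the name and the description are mandatory at data entry.
        if not entry.name or not entry.description:
            raise ValueError("image name and description are required")
        # Assumption: names are unique across the whole list.
        if entry.name in self._entries:
            raise ValueError(f"duplicate image name: {entry.name}")
        self._entries[entry.name] = entry

    def get(self, name):
        """Look up an entry by the image name used in test code."""
        return self._entries[name]

    def by_module(self, product, module):
        """Return all entries stored under one product and module."""
        return [e for e in self._entries.values()
                if e.product == product and e.module == module]


images = ImageSetList()
images.add(ImageEntry("shop_app", "login", "login_button", "Login button, default state"))
images.add(ImageEntry("shop_app", "login", "login_error", "Error banner after a bad password"))
print([e.name for e in images.by_module("shop_app", "login")])
```

Test code would then refer to a benchmark image only by its name, for example `images.get("login_button")`, rather than by a file path inside the test project.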

[0027] S102. Start the automated testing process, call the image set management list through the test code, and capture the screen of the device under test in real time.

[0028] Specifically, in step S102, the automated testing process begins. First, the execution environment (such as a test or development environment) is initialized, and the browser or mobile app is launched. At this point, the test code calls the image set management list based on preset configuration information (such as image names). During the acquisition of baseline image data, the system uses Redis as a local high-performance cache database to store image data in binary format. If the local cache misses, the system asynchronously and concurrently downloads the target image from the remote server using a pre-established persistent FTP connection pool and a load balancing strategy, thereby significantly reducing network overhead and improving concurrent processing capabilities.

[0029] S103. Using image recognition technology, the reference image is processed and matched with the real-time captured screen image to locate the target element.

[0030] Specifically, after acquiring the reference image and capturing a real-time screenshot of the current device screen, the system calls the OpenCV image processing library to simultaneously perform image enhancement processing on both the reference image and the real-time captured screen image. After processing, the system uses a template matching algorithm to perform fuzzy matching between the two. If the current similarity value exceeds a set threshold, the system locks the rectangular range of the best matching region in the screen coordinate system and calculates the coordinates of the geometric center point of the rectangular range using a mathematical formula to locate the target element.
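The threshold check and the center-point calculation can be sketched as follows. The sketch assumes the similarity score and the top-left corner of the best match have already been obtained (in OpenCV this is typically done with `cv2.matchTemplate` followed by `cv2.minMaxLoc`); the function name `locate_center` and the default threshold of 0.8 are illustrative assumptions, not values from the application.

```python
def locate_center(score, top_left, template_size, threshold=0.8):
    """Return the (x, y) center of the best-match rectangle.

    Returns None when the score does not exceed the threshold, which is the
    case that hands control to the fallback locating modes.
    """
    if score <= threshold:
        return None
    x, y = top_left          # top-left corner of the matched region on screen
    w, h = template_size     # width and height of the reference image
    return (x + w / 2, y + h / 2)


print(locate_center(0.93, (100, 200), (60, 40)))  # (130.0, 220.0)
print(locate_center(0.55, (100, 200), (60, 40)))  # None
```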

[0031] S104. When the target element is not successfully located using the image recognition technology, switch to the element location method based on the control tree, or switch to the location method based on character recognition to continue locating the target element, thereby obtaining the position coordinates of the target element on the screen.

[0032] Specifically, during automated testing, if the confidence score calculated based on OpenCV image template matching in the aforementioned steps is lower than the preset judgment threshold, the system will not directly throw an exception that interrupts the test. Instead, it will immediately activate a multi-level positioning degradation strategy. The system first attempts to switch to a UI tree-based positioning mode, calling the Poco automation framework interface to analyze the underlying UI rendering tree structure of the current interface under test (such as Android's AccessibilityNodeInfo or iOS's XCUIElement). By traversing the node hierarchy, it retrieves target control objects with specific attributes (such as Name, Text, Type), thereby bypassing visual rendering interference and directly obtaining the physical coordinates of the control. If effective control tree information cannot be obtained in certain special scenarios (such as game engine rendering or non-native H5 pages), the system will further downgrade to an OCR (Optical Character Recognition) positioning mode. It will use text recognition algorithms to extract text and analyze the layout of the current screenshot, searching for keyword areas that match the description of the target element. Once the target is successfully located through any of the above degradation methods, the system will calculate the geometric center coordinates of the target area as the final position coordinates for subsequent interactions.
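The multi-level degradation order (image recognition, then control tree, then OCR) can be sketched as a chain of locator functions tried in sequence. The three locators below are hypothetical stubs that return a bounding box `(x, y, w, h)` or `None`; in a real system they would wrap template matching, a control-tree query, and a text recognition engine respectively, none of which is shown here.

```python
def locate_with_fallback(target, locators):
    """Try each (name, locator) pair in order and stop at the first hit.

    Returns the name of the strategy that succeeded and the geometric
    center of the located area, or (None, None) if every strategy failed.
    """
    for name, locator in locators:
        box = locator(target)
        if box is not None:
            x, y, w, h = box
            return name, (x + w / 2, y + h / 2)
    return None, None


# Hypothetical stand-ins for the three locating modes.
def by_image(target):
    return None  # template-matching confidence below the threshold


def by_control_tree(target):
    return None  # e.g. game-engine rendering exposes no control tree


def by_ocr(target):
    return (40, 300, 120, 48) if target == "Submit" else None


strategy, center = locate_with_fallback(
    "Submit",
    [("image", by_image), ("control_tree", by_control_tree), ("ocr", by_ocr)],
)
print(strategy, center)  # ocr (100.0, 324.0)
```

Keeping the strategies in an ordered list makes the priority explicit and lets the test continue instead of raising an exception on the first failed match.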

[0033] S105. Perform a preset test interaction operation on the device under test based on the location coordinates, and obtain the feedback interface after the interaction.

[0034] Specifically, the system passes the obtained center point coordinates to the underlying automation driver framework (such as Selenium or Appium). Based on the test case definition, it simulates real user actions, such as clicking or long-pressing at the coordinates, or entering text in the located input box. After the operation is completed, the interface of the device under test navigates to another page or changes state, and the system immediately obtains the feedback interface after the interaction, preparing for result verification.

[0035] S106. Based on the feedback interface, make assertion judgments and generate a test report after the automated process is completed.

[0036] Specifically, the assertion logic also obtains reference images by connecting to the image management platform. It can obtain a full-screen screenshot of the current screen and perform a fuzzy match between it and a preset full-screen rendering effect image to determine whether the similarity meets the standard. After all business processes and assertion steps are completed, the system closes the browser or mobile app, automatically collects execution logs and screenshot evidence, generates a detailed test report, and sends the report to designated personnel via email.

[0037] In this embodiment, a visual resource management platform is constructed, addressing the pain points of strong coupling and difficulty in maintaining image resources and code in traditional automated testing, significantly reducing the human cost of script writing. Simultaneously, a multi-level location degradation strategy ensures that the test process can continue running even when image recognition fails, greatly improving the robustness of automated testing. Furthermore, the introduction of a Redis-based intelligent dynamic caching strategy and an FTP connection pool concurrency mechanism significantly improves resource loading speed and execution efficiency under large-scale concurrent testing.

[0038] In one embodiment, after the step of using image recognition technology to perform image processing and matching between the reference image and a real-time captured screen image to locate the target element, the method further includes:

[0039] The historical matching confidence scores of the benchmark image in a preset number of test tasks are statistically analyzed, and the mean and variance of the confidence scores are calculated based on the historical matching confidence scores.

[0040] Based on the mean confidence level and the variance of confidence level, the matching threshold of the benchmark image in subsequent testing processes is dynamically adjusted so that the matching threshold is adaptively adjusted according to the device.

[0041] Specifically, to further enhance the adaptability and accuracy of image recognition, the system introduces a dynamic threshold adjustment mechanism based on historical data. After a single image recognition and localization operation is completed, the system automatically writes the actual confidence score calculated using OpenCV template matching into the log database, archiving it in the historical record of the reference image. The system periodically, or after each execution, reads the historical matching records of the reference image within a preset period (e.g., the last 30 executions) and uses statistical algorithms to calculate the arithmetic mean and variance of these confidence scores. If the calculation results show that the historical matching accuracy of the image is high and the variance is extremely small, it indicates that the rendering performance of the UI element is very stable under different test devices and environments. The system will automatically update the configuration file and increase the matching threshold parameter of the image in the next recognition (for example, set to the mean minus a small safety margin) to strictly filter out interference elements with similar backgrounds and reduce the false alarm rate. Conversely, if the calculated variance is large, it indicates that the element is easily affected by the rendering anti-aliasing algorithm or resolution scaling caused by different devices, resulting in pixel differences. The system will appropriately lower the matching threshold in the next recognition to ensure the recognition pass rate while tolerating a certain amount of image noise, thereby realizing intelligent parameter self-calibration for each reference image.

[0042] Furthermore, in one embodiment, dynamically adjusting the matching threshold of the benchmark image in subsequent testing processes based on the mean confidence level and the variance of the confidence level includes:

[0043] The confidence standard deviation is calculated based on the confidence variance, and the confidence standard deviation is multiplied by a preset sensitivity coefficient to obtain the dynamic offset factor;

[0044] Subtract the dynamic offset factor from the mean confidence level to generate a preliminary adjustment threshold;

[0045] Obtain a preset global threshold boundary. If the preliminary adjustment threshold exceeds the global threshold boundary, then use the global threshold boundary to truncate the preliminary adjustment threshold to obtain the target matching threshold.

[0046] Specifically, the system first takes the square root of the stored historical matching confidence variance to obtain the confidence standard deviation, which reflects the degree of data dispersion. Then, the system introduces a preset sensitivity coefficient (e.g., set to 2.0 or 3.0, representing the number of standard deviations tolerated), multiplies this sensitivity coefficient by the confidence standard deviation, and obtains a dynamic offset factor. This offset factor quantifies the "jitter" range of the current image across different environments. Next, the system subtracts this dynamic offset factor from the historical confidence mean to generate a preliminary adjustment threshold. This calculation (i.e., the confidence mean minus the product of the sensitivity coefficient and the confidence standard deviation) ensures that the generated threshold automatically covers the vast majority of normal historical matching fluctuations. Finally, to prevent threshold runaway due to extreme anomalies in the sample data, the system acquires preset global threshold boundaries (e.g., lower limit 0.70, upper limit 0.99). If the preliminary adjustment threshold falls outside these global boundaries, the system truncates it to the corresponding boundary value and takes the result as the target matching threshold for the benchmark image in subsequent testing processes.
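The threshold calculation described above (mean minus sensitivity coefficient times standard deviation, truncated to the global boundaries) can be sketched with the standard library. The function name `target_threshold` is illustrative, and the use of the population variance is an assumption, since the application does not say whether the population or sample variance is intended.

```python
import math
import statistics


def target_threshold(history, k=2.0, lower=0.70, upper=0.99):
    """Compute the target matching threshold from historical confidence scores.

    history: confidence scores of one benchmark image, e.g. the last 30 runs.
    k:       sensitivity coefficient (number of standard deviations tolerated).
    """
    mean = statistics.fmean(history)
    # Assumption: population variance; the source does not specify which.
    std = math.sqrt(statistics.pvariance(history))
    preliminary = mean - k * std  # preliminary adjustment threshold
    # Truncate to the preset global threshold boundaries.
    return min(max(preliminary, lower), upper)


stable = [0.97, 0.98, 0.97, 0.98]   # consistent rendering: small variance
noisy = [0.95, 0.70, 0.90, 0.60]    # device-dependent rendering: large variance

print(round(target_threshold(stable), 3))  # 0.965
print(target_threshold(noisy))             # 0.7
```

For the stable element the threshold stays close to the mean, while for the noisy element the preliminary value (about 0.50) falls below the lower boundary and is truncated to 0.70.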

[0047] In this embodiment, by introducing a dynamic threshold self-calibration mechanism based on historical statistical data, the robustness and adaptability of the UI automated testing system in complex rendering environments are significantly improved. This mechanism breaks through the limitations of a single fixed threshold in traditional image recognition. By calculating the mean and variance of the confidence score, it can accurately quantify the rendering stability of different UI elements in multiple rounds of testing and intelligently adjust the judgment criteria for the next recognition accordingly. This dynamic strategy not only effectively solves the problem of recognition failure caused by rendering noise and greatly reduces the false alarm rate and false negative rate of the script, but also eliminates the tedious work of testers manually fine-tuning the threshold of each image, greatly reducing the maintenance cost and manpower investment of the automated test script.

[0048] In one embodiment, the method further includes:

[0049] The success rate of positioning using image recognition technology within a preset period is statistically analyzed. If the success rate is lower than a preset health threshold, the benchmark image is marked as needing optimization, and a benchmark image replacement prompt is generated on the visual resource management platform.

[0050] Specifically, during the continuous integration process of automated testing, the system records the execution status (success or failure) of each image recognition operation in the background. When a preset statistical period is reached (e.g., every 50 tests or one week), the background algorithm automatically calculates the recognition success rate of each benchmark image within that period. If the calculated success rate is lower than a preset health threshold (e.g., 60%), the system logic determines that the benchmark image may no longer be applicable due to product UI iteration or poor cropping quality, and marks it as "to be optimized" in the database. Subsequently, the system triggers a notification service, generating a prominent replacement prompt or highlight mark for the image on the interface of the visual resource management platform, proactively reminding test maintenance personnel to promptly re-collect and replace the benchmark image to prevent low-quality image resources from continuously dragging down the overall test pass rate.
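The health check reduces to a success-rate comparison per benchmark image. The following sketch assumes the execution status of each recognition attempt in the period is available as a list of booleans; the function name `needs_optimization` and the handling of an empty period are illustrative assumptions.

```python
def needs_optimization(results, health_threshold=0.60):
    """Return True if the image should be marked "to be optimized".

    results: one boolean per image recognition attempt in the statistical
             period (True = located successfully).
    """
    if not results:
        # Assumption: with no data in the period, leave the image unmarked.
        return False
    success_rate = sum(results) / len(results)
    return success_rate < health_threshold


period = [True] * 25 + [False] * 25   # 50 runs, 50% success
print(needs_optimization(period))     # True
```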

[0051] In one embodiment, the method further includes:

[0052] After successful positioning, the matching coordinate area of the reference image in the current screen is recorded; the matching coordinate area in multiple consecutive test versions is compared, and if the calculated coordinate offset exceeds the preset drift threshold, a UI layout change warning log is generated.

[0053] Specifically, whenever an image is successfully recognized and the target element is located, the system not only extracts the center point for interaction but also records the coordinates of the top-left corner and the length and width of the matching area of the reference image in the current screen. The system compares and analyzes the coordinate data obtained in this test with historical coordinate data recorded in multiple past test versions (e.g., V1.0 to V1.2), calculating the Euclidean distance or relative offset of the coordinate changes. If the calculated coordinate offset exceeds a preset drift threshold, it indicates that although the image can still be recognized, its position in the page layout has changed significantly (possibly due to unexpected UI changes or CSS style errors). At this time, the system automatically generates a UI layout change warning log, along with coordinate comparison data before and after the offset, so that developers can quickly identify potential layout defects.
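The drift check can be sketched as a Euclidean distance between the top-left corners recorded in two test versions. The function name `layout_drift`, the box format `(x, y, w, h)`, the 20-pixel threshold, and the wording of the log message are illustrative assumptions.

```python
import math


def layout_drift(previous_box, current_box, drift_threshold=20.0):
    """Compare two matched areas, each given as (x, y, w, h).

    Returns the Euclidean offset of the top-left corner and a warning
    message, or None for the message when the offset is within tolerance.
    """
    px, py = previous_box[0], previous_box[1]
    cx, cy = current_box[0], current_box[1]
    offset = math.hypot(cx - px, cy - py)
    warning = None
    if offset > drift_threshold:
        warning = (f"UI layout change: top-left moved from ({px}, {py}) "
                   f"to ({cx}, {cy}), offset {offset:.1f}px")
    return offset, warning


offset, warning = layout_drift((100, 200, 60, 40), (130, 240, 60, 40))
print(offset)   # 50.0
print(warning)
```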

[0054] In one embodiment, the method further includes:

[0055] Record the last successful match time for each reference image; if the unmatched duration of a reference image exceeds a preset lifecycle threshold, generate resource cleanup suggestions.

[0056] Specifically, the database table structure maintains a timestamp field for the last successful match. Whenever a baseline image is successfully identified and used in automated testing, the system immediately updates this field to the current system time. The system periodically runs resource inspection tasks, calculating the time difference between the current time and this field to determine the duration of the image's unmatched period. If the unmatched period of a baseline image exceeds a preset lifecycle threshold (e.g., 90 days, meaning the function module may have been taken offline or deprecated), the system generates a resource cleanup suggestion list and pushes it to the administrator on the management platform, recommending the removal of these "zombie" image resources from storage space and cache.
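The periodic inspection task can be sketched as a scan over the last-successful-match timestamps. The dictionary `last_matched` stands in for the database table described above, and the function name `cleanup_candidates` and the sample image names are hypothetical.

```python
from datetime import datetime, timedelta


def cleanup_candidates(last_matched, now, lifecycle=timedelta(days=90)):
    """Return the names of images whose unmatched duration exceeds the lifecycle.

    last_matched: image name -> datetime of the last successful match.
    """
    return sorted(name for name, ts in last_matched.items()
                  if now - ts > lifecycle)


now = datetime(2026, 6, 1)
last_matched = {
    "login_button": datetime(2026, 5, 30),      # matched 2 days ago
    "old_promo_banner": datetime(2026, 1, 15),  # unmatched for 137 days
    "legacy_menu": datetime(2025, 12, 1),       # unmatched for 182 days
}
print(cleanup_candidates(last_matched, now))  # ['legacy_menu', 'old_promo_banner']
```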

[0057] In the above embodiments, by monitoring the recognition success rate and prompting for the replacement of low-quality images, the monitoring of test resources is realized, ensuring that the benchmark image is always in the best usable state and avoiding false negatives caused by outdated images. By monitoring coordinate drift, automated testing is not limited to the verification of functional logic, but can also keenly capture subtle UI layout anomalies, playing the role of UI regression testing. The lifecycle-based resource cleanup mechanism effectively solves the problem of abandoned image accumulation caused by project iteration, frees up server storage space, and reduces traversal overhead during retrieval, thereby ensuring the lightweight and efficient operation of the visual resource management platform.

[0058] In one embodiment, after the step of initiating the automated process and capturing the screen of the device under test in real time, the method further includes:

[0059] The reference image to be processed is retrieved from the remote server using a remote transmission protocol; during the retrieval process, image data is obtained through a collaborative mechanism between local caching and the remote server.

[0060] The coordination mechanism includes:

[0061] The system monitors the update status of images in real time. If the update time interval is less than a preset high-frequency threshold, the effective duration of the image in the local cache is shortened. If it is greater than a preset low-frequency threshold, the effective duration of the image in the local cache is extended.

[0062] Specifically, once the automated testing process starts and the real-time screen capture step is executed, the system immediately triggers the benchmark image retrieval program. This program establishes a communication link with the remote server via a remote transmission protocol (such as FTP), but when actually acquiring data, it prioritizes the collaborative mechanism between the local cache and the remote server. The system uses Redis as a local high-performance cache database to store image data in binary format. When the test code initiates an image request, the system first queries the Redis cache; on a cache miss, it downloads the corresponding image from the remote server through a pre-established persistent FTP connection pool with a load balancing strategy and writes it to the local cache. It also supports multi-threaded asynchronous concurrent requests to address network congestion in large-scale concurrent testing scenarios.
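The cache-first retrieval flow can be sketched with stand-ins for the external services: a plain dictionary replaces Redis, and the hypothetical `fake_remote` function replaces the FTP download. The class name `CachedImageStore` is illustrative, and the connection pool and load balancing are not modeled.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CachedImageStore:
    """Cache-first image lookup; a dict stands in for the Redis local cache."""

    def __init__(self, fetch_remote):
        self._cache = {}
        self._fetch_remote = fetch_remote  # callable: image name -> bytes
        self._lock = threading.Lock()
        self.remote_calls = 0

    def get(self, name):
        with self._lock:
            data = self._cache.get(name)
        if data is not None:
            return data                    # cache hit
        data = self._fetch_remote(name)    # cache miss: go to the remote server
        with self._lock:
            self._cache[name] = data       # write back to the local cache
            self.remote_calls += 1
        return data


def fake_remote(name):
    # Hypothetical stand-in for a download over a pooled FTP connection.
    return f"<bytes of {name}>".encode()


store = CachedImageStore(fake_remote)
names = ["login_button", "cart_icon", "pay_button"]
with ThreadPoolExecutor(max_workers=3) as pool:
    first = list(pool.map(store.get, names))   # three concurrent cache misses
second = [store.get(n) for n in names]         # served from the cache
print(store.remote_calls)  # 3
```

Two threads requesting the same uncached image at the same moment could both reach the remote server in this sketch; a production version would need per-key request coalescing to avoid that.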

[0063] To manage cache validity, this embodiment constructs a dynamic expiration strategy based on update frequency. The system backend monitors the metadata update status of the cloud-based benchmark image in real time and calculates the difference between the current time and the last image update time. If the update interval is less than a preset high-frequency threshold (e.g., 1 hour), the system infers that the image is in a period of frequent iteration (e.g., the development phase) and automatically shortens the image's validity period in the local cache (e.g., set to 5 minutes) to ensure that the test end can obtain the latest version in a timely manner. Conversely, if the update interval is greater than a preset low-frequency threshold, the system determines that the image is becoming stable and extends its validity period in the local cache accordingly (e.g., set to 1 day). In addition, to prevent extreme cases, the system also sets upper and lower limits for the time window (e.g., a minimum of no less than 1 minute and a maximum of no more than 30 days) to achieve an optimal balance between data freshness and server load.
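The dynamic expiration rule can be sketched as a small function. The 1-hour high-frequency threshold, the 5-minute and 1-day validity periods, and the 1-minute and 30-day limits are the example values from the text; the low-frequency threshold and the default validity period used between the two thresholds are assumptions, since the text gives no figures for them. The function name `cache_ttl` is likewise hypothetical.

```python
from datetime import timedelta

HIGH_FREQ_THRESHOLD = timedelta(hours=1)  # example value from the text
LOW_FREQ_THRESHOLD = timedelta(days=7)    # assumption: not specified in the text
SHORT_TTL = timedelta(minutes=5)          # example value from the text
LONG_TTL = timedelta(days=1)              # example value from the text
DEFAULT_TTL = timedelta(hours=1)          # assumption: used between the thresholds
MIN_TTL = timedelta(minutes=1)            # lower limit from the text
MAX_TTL = timedelta(days=30)              # upper limit from the text


def cache_ttl(update_interval: timedelta,
              short_ttl: timedelta = SHORT_TTL,
              long_ttl: timedelta = LONG_TTL) -> timedelta:
    """Choose a cache validity period from the time since the last image update."""
    if update_interval < HIGH_FREQ_THRESHOLD:
        ttl = short_ttl      # frequently iterated image: expire quickly
    elif update_interval > LOW_FREQ_THRESHOLD:
        ttl = long_ttl       # stable image: keep it longer
    else:
        ttl = DEFAULT_TTL
    # Clamp to the configured limits to guard against extreme settings.
    return max(MIN_TTL, min(ttl, MAX_TTL))
```

With these values, an image updated 10 minutes ago is cached for 5 minutes, while one last updated 10 days ago is cached for 1 day.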

[0064] In one embodiment, the collaborative mechanism further includes: calculating a weight value based on the access frequency of the reference image within a preset time window, and removing the reference image from the local cache when the weight value is lower than a preset elimination threshold, even if the effective duration has not been reached.

[0065] Specifically, to further optimize the utilization of local storage space, the collaborative mechanism also introduces a weighted eviction strategy based on a time decay algorithm. The system counts the access frequency of each benchmark image within a preset time window and calculates the real-time popularity weight value of the image based on the time decay factor. As the inaccessible time increases, this weight value decreases non-linearly according to the algorithm's rules. When the system detects that the weight value of a benchmark image is lower than a preset eviction threshold, even if the image's current cache validity period has not expired, the system will forcibly perform a cleanup operation, immediately removing the benchmark image from the Redis local cache, thereby freeing up storage space for newly generated or frequently accessed images.
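One possible form of the time-decayed weight is sketched below. The text does not specify the decay function, so exponential decay with a half-life is an assumption, as are the window length, the half-life, the eviction threshold, and the function names.

```python
import math
from typing import Iterable


def popularity_weight(access_times: Iterable[float], now: float,
                      window: float = 86400.0, half_life: float = 3600.0) -> float:
    """Sum one exponentially decayed contribution per access inside the window.

    All times are in seconds. An access that just happened contributes 1.0,
    and the contribution halves every `half_life` seconds (assumed decay rule).
    """
    decay = math.log(2) / half_life
    return sum(
        math.exp(-decay * (now - t))
        for t in access_times
        if 0 <= now - t <= window
    )


def should_evict(access_times: Iterable[float], now: float,
                 threshold: float = 0.5) -> bool:
    """Evict when the weight drops below the threshold, regardless of remaining TTL."""
    return popularity_weight(access_times, now) < threshold
```

Under these assumptions, an image accessed once two hours ago has a weight of 0.25 and is evicted, while an image accessed twice in the last few minutes stays in the cache.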

[0066] In the above embodiments, by constructing a collaborative mechanism of "local caching + remote synchronization," the loading speed of image resources in automated testing is greatly improved, and the network bandwidth pressure on remote servers is reduced. In particular, the dynamic expiration strategy based on update frequency resolves the contradiction between cache data consistency and persistence: "fast in, fast out" is implemented for images with high-frequency iterations, ensuring the real-time accuracy of test data and avoiding false negatives caused by cache lag; "long-term residence" is implemented for stable images, reducing redundant downloads. Furthermore, the introduction of a time-decay-based weighted elimination mechanism enables self-purification and intelligent management of cache space, ensuring that limited local storage resources always serve high-demand core business data, thereby significantly improving the throughput and operating efficiency of the overall testing system.

[0067] Furthermore, in one embodiment, the method of acquiring image data using a collaborative mechanism of local caching and a remote server during the retrieval process further includes:

[0068] The system automatically monitors the version update status of the product under test. When a version update is detected, it automatically triggers the re-acquisition process of the remote transmission protocol and forces a refresh of the corresponding image data in the local cache.

[0069] Specifically, a version monitoring daemon runs in the background and reads the version metadata of the product under test (such as a mobile app installation package or a web application configuration) in real time or periodically. When the daemon detects a change in the version number of the product under test (e.g., from V1.0 to V1.1), the system determines that the UI has likely undergone substantial changes and that the existing cached data is at risk of becoming invalid. The system therefore immediately issues a forced synchronization command that bypasses the regular cache lookup logic and activates the re-acquisition process of the remote transmission protocol. It sends a full or incremental download request for the new version's image set to the remote server and writes the downloaded benchmark images directly back to the local Redis cache, overwriting the binary data of the old version and resetting the cache lifecycle of this batch of images. The whole process runs automatically during the preparation phase before automated test execution and requires no manual cache clearing.
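The version-triggered refresh can be sketched as follows. This is an illustration under stated assumptions rather than the embodiment's code: a dictionary stands in for the Redis cache, and `sync_on_version_change`, `fetch_image_set`, the `state` dictionary, and the `cached_at` field are hypothetical names.

```python
from typing import Callable, Dict


def sync_on_version_change(cache: Dict[str, dict],
                           state: Dict[str, str],
                           current_version: str,
                           fetch_image_set: Callable[[str], Dict[str, bytes]],
                           now: float) -> bool:
    """Force-refresh cached images when the product version changes.

    Returns True if a forced refresh was performed, False otherwise.
    """
    if state.get("version") == current_version:
        return False  # same version: keep the regular cache lookup path
    # Version changed: bypass the cache and download the new image set.
    for name, data in fetch_image_set(current_version).items():
        # Overwrite the old binary data and reset the cache lifecycle.
        cache[name] = {"data": data, "cached_at": now}
    state["version"] = current_version
    return True


def fake_fetch(version: str) -> Dict[str, bytes]:
    """Hypothetical remote fetch used only to exercise the sketch."""
    return {"login/submit_button": ("img-" + version).encode()}


cache = {"login/submit_button": {"data": b"img-V1.0", "cached_at": 0.0}}
state = {"version": "V1.0"}
refreshed = sync_on_version_change(cache, state, "V1.1", fake_fetch, now=100.0)
```

A second call with the same version returns False and leaves the cache untouched, so the forced path only runs when the version number actually changes.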

[0070] In this embodiment, the version linkage mechanism enables the updating of cached data, ensuring that the benchmark image used in each round of automated testing strictly corresponds to the current version of the product under test, guaranteeing the data consistency of the testing environment. It also eliminates the tedious operation of testers manually clearing the local cache after release, further improving the intelligence level and execution reliability of the automated testing process.

[0071] In one embodiment, retrieving the reference image to be processed from a remote server via a remote transmission protocol further includes:

[0072] Establish a persistent connection pool pointing to the remote server and configure a load balancing mechanism;

[0073] When a concurrent image retrieval request is received, an asynchronous concurrent processing mode is adopted, and the request is allocated to an idle connection in the persistent connection pool according to the number of requests currently being processed and the connection response time.

[0074] Specifically, at initial startup the system establishes a persistent connection pool pointing to the remote server. For the sake of stable image transmission, this embodiment adopts FTP as the underlying transport protocol and pre-creates and maintains a certain number (e.g., 10 to 20) of persistent connection handles to form a reusable pool. When the automated testing framework issues concurrent image retrieval requests in a multi-threaded environment, the system no longer runs the cumbersome "connect, transfer, disconnect" sequence for each request, but switches to an asynchronous concurrent processing mode. In this mode, the built-in load balancer monitors the health and load indicators of every connection handle in the pool in real time; the load is computed from the number of backlogged requests the connection is currently processing and the response time (RT) of its last data transfer. The scheduling algorithm assigns each newly arrived image download task to the idle connection with the lowest current load and the fastest response, achieving dynamic task distribution. If all connections are busy, the system places the request in an asynchronous waiting queue and handles it via callback as soon as a connection is released, achieving efficient data throughput without blocking the main test thread.
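The selection step of the load balancer can be sketched as below. The text names the two indicators (backlogged requests and last response time) but not how they are combined; ordering first by backlog and then by response time is an assumption, and `PooledConnection` and `pick_connection` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PooledConnection:
    name: str
    busy: bool         # True while a transfer is in progress on this handle
    backlog: int       # requests already assigned to this handle
    last_rt_ms: float  # response time of the last transfer, in milliseconds


def pick_connection(pool: Sequence[PooledConnection]) -> Optional[PooledConnection]:
    """Return the idle connection with the smallest backlog, then the lowest RT.

    Returns None when every connection is busy; the caller is then expected
    to place the request in a waiting queue.
    """
    idle = [conn for conn in pool if not conn.busy]
    if not idle:
        return None
    return min(idle, key=lambda conn: (conn.backlog, conn.last_rt_ms))


pool = [
    PooledConnection("a", busy=True, backlog=0, last_rt_ms=5.0),
    PooledConnection("b", busy=False, backlog=2, last_rt_ms=10.0),
    PooledConnection("c", busy=False, backlog=1, last_rt_ms=80.0),
    PooledConnection("d", busy=False, backlog=1, last_rt_ms=40.0),
]
chosen = pick_connection(pool)
```

Here connection "a" is skipped because it is busy, "b" loses on backlog, and "d" is preferred over "c" because its last transfer was faster. A weighted score of the two indicators would be an equally plausible design.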

[0075] In this embodiment, by reusing persistent connections, the time spent establishing connections is significantly reduced. Simultaneously, a two-dimensional load balancing strategy based on "request quantity + response time" effectively prevents a single link from becoming a performance bottleneck due to network fluctuations or large data packet transmissions, ensuring a uniform distribution of traffic within the connection pool. Combined with asynchronous concurrency mode, this allows UI automated testing to maintain smooth operation even when downloading large amounts of benchmark images, significantly improving the overall execution speed and stability of the testing process.

[0076] In one embodiment, prior to the step of performing image processing and matching between the reference image and the real-time captured screen image using image recognition technology, the method further includes:

[0077] The screen resolution characteristics of the device under test are obtained, and an adaptive scaling algorithm is run to dynamically adjust the size of the reference image according to the size ratio of the screen resolution characteristics to the reference image in order to adapt to the screen specifications of the current device under test.

[0078] Image preprocessing is performed on the adjusted reference image and the real-time captured screen image to unify image features, followed by image matching.

[0079] Specifically, before image matching the system runs an adaptation and preprocessing pass. First, it reads the hardware configuration of the device under test (DUT) via ADB (Android Debug Bridge) or the WebDriver interface to obtain the screen resolution characteristics (such as width, height, and pixel density in DPI). It then runs a built-in adaptive scaling algorithm that compares the screen specifications of the DUT with the original size of the reference image and calculates the horizontal and vertical scaling coefficients. Based on these coefficients and the characteristics of the image content, the system resizes the reference image using bilinear or bicubic interpolation so that its pixel dimensions match the screen specifications of the DUT. After size adaptation, to further eliminate rendering differences, the system applies the same preprocessing to the adjusted reference image and the real-time captured screen image in order to unify their image features. It calls the OpenCV image processing library to perform composite enhancement on both images: converting them to grayscale to remove interference from differing color modes, enhancing contrast with an adaptive algorithm to highlight features, performing edge detection to extract contours, and applying Gaussian blur to remove image noise. The result is image data with distinct and uniform features for subsequent matching.
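The scaling-coefficient calculation can be sketched in a few lines. The sketch assumes that the resolution of the screen on which the reference image was captured is recorded alongside the image, which the text does not state explicitly; the function names are hypothetical, and the sketch only computes the target size without resizing any pixels.

```python
from typing import Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def scale_factors(capture_screen: Size, device_screen: Size) -> Tuple[float, float]:
    """Horizontal and vertical scaling coefficients between two screens."""
    return (device_screen[0] / capture_screen[0],
            device_screen[1] / capture_screen[1])


def scaled_size(image_size: Size, capture_screen: Size, device_screen: Size) -> Size:
    """Target size of a reference image on the device under test."""
    sx, sy = scale_factors(capture_screen, device_screen)
    # Never let a dimension collapse to zero pixels.
    return (max(1, round(image_size[0] * sx)),
            max(1, round(image_size[1] * sy)))


# A 100x50 button captured on a 1080x1920 screen, tested on a 720x1280 device.
target = scaled_size((100, 50), (1080, 1920), (720, 1280))
```

The resulting size would then be passed to the actual resize call, for example OpenCV's `cv2.resize` with `cv2.INTER_LINEAR` or `cv2.INTER_CUBIC` for the bilinear and bicubic options mentioned above.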

[0080] In this embodiment, a preprocessing mechanism combining "multi-resolution adaptive processing + composite image enhancement" effectively solves the compatibility problem of automated testing in multi-device environments. The multi-resolution adaptive strategy dynamically adjusts the size based on screen features and image content, allowing the same set of benchmark images to adapt to terminals with different resolutions without repeated acquisition. Meanwhile, the image preprocessing scheme, which integrates contrast enhancement, edge detection, and Gaussian blur, specifically strengthens key image features and suppresses environmental noise, significantly improving the feature extraction capability and matching accuracy of image recognition algorithms in complex backgrounds and under different rendering engines.

[0081] In one embodiment, the assertion judgment based on the feedback interface specifically includes:

[0082] Connect to the visual resource management platform, obtain the storage path of the benchmark image required for the assertion, and load the benchmark image;

[0083] Based on the requirements of the test scenario, execute at least one of the following assertion types:

[0084] Full-image comparison assertion: The feedback interface is compared with the loaded full-screen reference image to determine whether the comparison matching degree between the two is within a preset threshold range.

[0085] Region element detection: Capture a local region image from the feedback interface and determine whether the local region image exists in the loaded full-screen reference image.

[0086] Specifically, when the automated testing process enters the final verification stage, the testing system first connects to the visual resource management platform through its API and, based on the assertion parameters pre-configured in the current test case (such as the benchmark image ID or name), looks up and downloads the corresponding "full-screen benchmark image" (i.e., the preset standard rendering) to local memory. The system then branches according to the assertion type defined in the test script. If a full-image comparison assertion is configured, the system compares the "feedback interface" acquired in real time (a full-screen screenshot) with the loaded "full-screen benchmark image" as a whole, using histogram comparison or the structural similarity (SSIM) algorithm to calculate the overall matching degree. The test passes only when the matching degree falls within a preset strict threshold range (e.g., above 95%), which verifies that the current page layout is consistent with the UI design. If region element detection is configured, the system extracts a local region image from the real-time "feedback interface" at preset coordinates and uses it as a search template to scan the loaded "full-screen benchmark image" for a match. If the local image is found in the full-screen benchmark image and the confidence meets the standard, the key element is considered to be displayed correctly.
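The two assertion types can be contrasted with a small sketch on grayscale images represented as nested lists. To stay self-contained it uses a simple pixel-agreement ratio as the matching degree, which is a stand-in for the histogram comparison or SSIM named above and not equivalent to either; all function names and the 0.95 threshold default are assumptions for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, row-major


def crop(img: Image, x: int, y: int, w: int, h: int) -> Image:
    """Cut a w-by-h region whose top-left corner is at (x, y)."""
    return [row[x:x + w] for row in img[y:y + h]]


def match_ratio(a: Image, b: Image) -> float:
    """Fraction of pixels that are identical in two equally sized images."""
    total = sum(len(row) for row in a)
    same = sum(p == q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return same / total


def full_image_assert(feedback: Image, benchmark: Image, threshold: float = 0.95) -> bool:
    """Full-image comparison: the whole screenshot must match the benchmark."""
    if len(feedback) != len(benchmark) or len(feedback[0]) != len(benchmark[0]):
        return False
    return match_ratio(feedback, benchmark) >= threshold


def region_exists(patch: Image, benchmark: Image, threshold: float = 0.95) -> bool:
    """Region element detection: slide the patch over the benchmark image."""
    ph, pw = len(patch), len(patch[0])
    for y in range(len(benchmark) - ph + 1):
        for x in range(len(benchmark[0]) - pw + 1):
            if match_ratio(patch, crop(benchmark, x, y, pw, ph)) >= threshold:
                return True
    return False


benchmark = [[r * 4 + c for c in range(4)] for r in range(4)]
feedback = [row[:] for row in benchmark]
feedback[0][0] = 99  # simulate a change in an unrelated corner of the page
```

In this example the full-image assertion fails because one of sixteen pixels differs, while region detection on the unchanged lower-right 2x2 area still succeeds, which mirrors how the region check tolerates changes elsewhere on the page.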

[0087] This embodiment provides a flexible and multi-dimensional UI automation verification mechanism, effectively solving the problem that a single assertion method cannot simultaneously address both global layout and local details. Full-image comparison assertions can quickly verify the integrity and layout correctness of the overall page rendering from a macro perspective, suitable for consistency checks on static pages. Meanwhile, region element detection, through a "reverse lookup" logic, verifies whether local elements on the current screen truly exist in the standard design drawing, achieving precise verification of key controls (such as icons and buttons) in dynamic pages. This eliminates interference from changes in other irrelevant areas of the page (such as dynamic advertisements and timestamps) on the test results, thereby significantly improving the accuracy and scenario coverage of the automated test assertion process.

[0088] The methods above are divided into steps only for clarity of description. In practice, several steps can be merged into one, or a single step can be split into several; as long as the same logical relationship is preserved, such variations fall within the scope of protection of this application. Adding insignificant modifications to an algorithm or process, or introducing insignificant designs, without changing its core design also falls within the scope of protection of this application.

[0089] Furthermore, some embodiments of this application also provide an electronic device. The electronic device can be various forms of digital computer, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, etc. The electronic device can also be various forms of mobile devices, such as cellular phones, smartphones, wearable devices, and other similar computing devices.

[0090] The electronic device includes: one or more processors; and a memory storing computer program instructions that, when executed, cause the processor to perform the steps of the methods provided in any one or more of the above embodiments. Figure 2 shows an exemplary structural diagram of the electronic device. The electronic device includes one or more processors 1101, a memory 1102, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected via different buses and can be mounted on a common motherboard or otherwise installed as needed. The processors can process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device (such as a display device coupled to the interface). In some other embodiments, multiple processors and/or multiple buses can be used with multiple memories and multiple memory modules, if desired. Similarly, multiple electronic devices can be connected, each providing some of the necessary operations. The components, their connections and relationships, and their functions shown herein are merely examples and are not intended to limit the implementation of the present application described and/or claimed herein.

[0091] The electronic device may further include an input device 1103 and an output device 1104. The processor 1101, memory 1102, input device 1103, and output device 1104 may be connected by a bus or by other means; Figure 2 takes connection by a bus as an example.

[0092] The input device 1103 can receive input numerical or character information and generate key signal inputs related to user settings and function control of the electronic device. Examples include a touch screen, keypad, mouse, trackpad, touchpad, one or more mouse buttons, trackball, and joystick. The output device 1104 may include a display device, an auxiliary lighting device (e.g., an LED), and a haptic feedback device (e.g., a vibration motor). The display device may include, but is not limited to, a liquid crystal display, a light-emitting diode display, and a plasma display. In some embodiments, the display device may be a touch screen.

[0093] To provide interaction with the user, the electronic device can be a computer. The computer has: a display device (e.g., a cathode ray tube or LCD monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback); and input from the user can be received in any form (e.g., voice input or tactile input).

[0094] In this embodiment, a computer-readable medium stores a computer program / instructions that, when executed by a processor, implement the steps of the methods provided in any one or more of the above embodiments. This computer-readable medium may be included in the electronic device described in the above embodiments; or it may exist independently and not assembled into that device. The aforementioned computer-readable medium carries one or more computer-readable instructions.

[0095] The memory 1102 can serve as a non-transitory computer-readable storage medium, used to store non-transitory software programs, non-transitory computer-executable programs, and modules. The processor 1101 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions, and modules stored in the memory 1102, thereby implementing the program instructions / modules corresponding to the methods provided in any one or more of the embodiments described above in this application.

[0096] The memory 1102 may include a program storage area and a data storage area. The program storage area may store the operating system and applications required for at least one function; the data storage area may store data created based on the use of the electronic device. Furthermore, the memory 1102 may include high-speed random access memory and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 1102 may optionally include memory remotely located relative to the processor 1101, and these remote memories can be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

[0097] It should be noted that the computer-readable medium described in this application can be a computer-readable signal medium or a computer-readable storage medium, or any combination thereof. Computer-readable media can be, for example, but not limited to, electrical, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections having one or more wires, portable computer disks, hard disks, random access memory, read-only memory, erasable programmable read-only memory, optical fibers, portable compact disk read-only memory, optical storage devices, magnetic storage devices, or any suitable combination thereof. In this application, a computer-readable medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.

[0098] Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. Information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory, static random access memory, dynamic random access memory, other types of random access memory, read-only memory, electrically erasable programmable read-only memory, flash memory or other memory technologies, read-only optical discs, digital versatile discs or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

[0099] Computer program code for performing the operations of this application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as C or similar languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0100] In the above embodiments, all or part of the implementation can be achieved through software, hardware, firmware, or any combination thereof. For example, it can be implemented using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In some embodiments, the software program of this application can be executed by a processor to implement the above steps or functions. Similarly, the software program of this application (including related data structures) can be stored in a computer-readable recording medium, such as RAM memory, magnetic or optical drives, floppy disks, and similar devices. In addition, some steps or functions of this application can be implemented in hardware, for example, as circuitry that cooperates with a processor to perform the various steps or functions.

[0101] The computer program product provided in this application includes one or more computer programs / instructions. When executed by a processor, these computer programs / instructions generate, in whole or in part, the processes or functions described in this application. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, digital subscriber line) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium may be any available medium that a computer can access or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state drive), etc.

[0102] The flowcharts or block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of devices, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than that indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0103] The scope of this application is defined by the appended claims rather than the foregoing description, and is therefore intended to encompass all variations falling within the meaning and scope of equivalents of the claims. No reference numerals in the claims should be construed as limiting the scope of the claims. Furthermore, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim may also be implemented by a single unit or device in software or hardware. Terms such as "first," "second," etc., are used only for distinguishing descriptions and do not indicate any particular order, nor should they be construed as indicating or implying relative importance.

[0104] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily made by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims, and the above embodiments should be regarded as exemplary and non-limiting.

Claims

1. A UI automated testing method based on test images, characterized in that the method includes: collecting screenshots of each test scenario in the product under test as reference images, configuring a unique image name and image description for each reference image in combination with the module hierarchy of the product under test, and generating an image set management list on the visual resource management platform; initiating the automated testing process, calling the image set management list through test code, and capturing the screen of the device under test in real time; using image recognition technology to perform image processing and matching between the reference image and the real-time captured screen image in order to locate the target element; when the target element is not successfully located using the image recognition technology, switching to an element location method based on the control tree or a location method based on character recognition to continue locating the target element, thereby obtaining the position coordinates of the target element on the screen; based on the position coordinates, performing a preset test interaction operation on the device under test and obtaining the feedback interface after the interaction; and performing assertion judgment based on the feedback interface, and generating a test report after the automated process is completed.

2. The method according to claim 1, characterized in that, after the step of using image recognition technology to perform image processing and matching between the reference image and the real-time captured screen image to locate the target element, the method further includes: collecting the historical matching confidence scores of the benchmark image over a preset number of test tasks, and calculating the confidence mean and the confidence variance from those scores; and dynamically adjusting, based on the confidence mean and the confidence variance, the matching threshold of the benchmark image in subsequent testing processes, so that the matching threshold adapts to the device.

3. The method according to claim 1, characterized in that the method further includes: calculating the success rate of positioning using image recognition technology within a preset period and, if the success rate is lower than a preset health threshold, marking the benchmark image as needing optimization and generating a benchmark image replacement prompt on the visual resource management platform; and/or, after successful positioning, recording the matching coordinate area of the reference image in the current screen, comparing the matching coordinate areas across multiple consecutive test versions and, if the calculated coordinate offset exceeds a preset drift threshold, generating a UI layout change warning log; and/or recording the last successful match time of each reference image and, if the unmatched duration of a reference image exceeds a preset lifecycle threshold, generating a resource cleanup suggestion.

4. The method according to claim 1, characterized in that, after the step of initiating the automated process and capturing the screen of the device under test in real time, the method further includes: retrieving the reference image to be processed from the remote server using a remote transmission protocol, wherein during the retrieval process image data is obtained through a collaborative mechanism between the local cache and the remote server; the collaborative mechanism includes: monitoring the update status of the image in real time, shortening the effective duration of the image in the local cache if the update time interval is less than a preset high-frequency threshold, and extending the effective duration of the image in the local cache if the update time interval is greater than a preset low-frequency threshold; and/or calculating a weight value based on the access frequency of the reference image within a preset time window, and removing the reference image from the local cache when the weight value is lower than a preset elimination threshold, even if the effective duration has not been reached.

5. The method according to claim 4, characterized in that the step of retrieving the benchmark image to be processed from the remote server using a remote transmission protocol further includes: establishing a persistent connection pool pointing to the remote server and configuring a load balancing mechanism; and, when concurrent image retrieval requests are received, adopting an asynchronous concurrent processing mode and allocating each request to an idle connection in the persistent connection pool according to the number of requests currently being processed and the connection response time.
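The allocation rule in claim 5 might look like the following `asyncio` sketch. `PooledConnection`, `pick_connection`, and `fetch_all` are hypothetical names, the network call is simulated with `asyncio.sleep`, and the selection rule (fewest in-flight requests first, then lowest smoothed response time) is one plausible reading of "according to the number of requests currently being processed and the connection response time", not a rule stated in the claim.

```python
import asyncio


class PooledConnection:
    """Stand-in for one persistent connection to the remote image server."""

    def __init__(self, name: str):
        self.name = name
        self.in_flight = 0         # requests currently being processed
        self.avg_response_s = 0.0  # exponentially smoothed response time

    async def fetch(self, image_id: str, simulated_latency_s: float) -> str:
        self.in_flight += 1
        loop = asyncio.get_running_loop()
        start = loop.time()
        try:
            await asyncio.sleep(simulated_latency_s)  # placeholder for real I/O
            return f"{image_id}@{self.name}"
        finally:
            elapsed = loop.time() - start
            self.avg_response_s = 0.8 * self.avg_response_s + 0.2 * elapsed
            self.in_flight -= 1


def pick_connection(pool):
    """Prefer the least busy connection; break ties by response time."""
    return min(pool, key=lambda c: (c.in_flight, c.avg_response_s))


async def fetch_all(pool, image_ids):
    """Serve concurrent retrieval requests asynchronously over the pool."""

    async def one(image_id):
        conn = pick_connection(pool)
        return await conn.fetch(image_id, 0.01)

    return await asyncio.gather(*(one(i) for i in image_ids))
```

A call such as `asyncio.run(fetch_all([PooledConnection("A"), PooledConnection("B")], ["a", "b", "c", "d"]))` spreads the four requests over both connections, because each request raises the in-flight count of the connection it picked before the next request makes its choice.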

6. The method according to claim 1, characterized in that, before the step of using image recognition technology to perform image processing and matching on the benchmark image and the real-time captured screen image, the method further includes: obtaining the screen resolution characteristics of the device under test and running an adaptive scaling algorithm that dynamically adjusts the size of the benchmark image according to the size ratio between the screen resolution characteristics and the benchmark image, so as to adapt to the screen specifications of the current device under test; and performing image preprocessing on the adjusted benchmark image and the real-time captured screen image to unify image features, followed by image matching.
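The scaling step of claim 6 reduces to a small calculation, sketched below in Python. The sketch assumes that each benchmark image is stored together with the resolution of the screen on which it was captured, and that a single uniform scale factor is used to preserve the aspect ratio; neither detail is stated in the claim. The function names are hypothetical. Grayscale conversion with the standard BT.601 luma weights is shown as one possible preprocessing step for unifying image features.

```python
def scaled_template_size(template_size, source_screen, target_screen):
    """Return the benchmark image size adapted to the target screen.

    template_size: (width, height) of the benchmark image in pixels.
    source_screen: (width, height) of the screen it was captured on (assumed known).
    target_screen: (width, height) of the current device under test.
    """
    tw, th = template_size
    sw, sh = source_screen
    dw, dh = target_screen
    # Use the smaller ratio so the scaled image keeps its aspect ratio.
    scale = min(dw / sw, dh / sh)
    return max(1, round(tw * scale)), max(1, round(th * scale))


def to_gray(rgb):
    """Convert one (r, g, b) pixel to a grayscale value using BT.601 weights."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)
```

For example, a 120x60 benchmark image captured on a 1080x1920 screen would be resized to 80x40 before matching on a 720x1280 device.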

7. The method according to claim 1, characterized in that the assertion judgment based on the feedback interface specifically includes: connecting to the visual resource management platform, obtaining the storage path of the benchmark image required for the assertion, and loading the benchmark image; and, based on the requirements of the test scenario, executing at least one of the following assertion types: full-image comparison assertion, in which the feedback interface is compared with the loaded full-screen benchmark image to determine whether the matching degree between the two is within a preset threshold range; and region element detection, in which a local region image is captured from the feedback interface and it is determined whether the local region image exists in the loaded full-screen benchmark image.
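The two assertion types in claim 7 can be sketched in plain Python on small grayscale grids (lists of rows of integers). This is only an illustration: `match_degree`, `assert_full_image`, and `region_exists` are hypothetical names, the matching degree is taken to be the fraction of equal pixels, the "preset threshold range" is reduced to a lower bound, and region detection uses exact pixel equality. A real implementation would more likely rely on an image library with tolerant template matching.

```python
def match_degree(actual, expected, tolerance=0):
    """Fraction of pixels whose values differ by at most `tolerance`."""
    if len(actual) != len(expected) or len(actual[0]) != len(expected[0]):
        return 0.0  # images of different size are treated as a mismatch
    total = len(actual) * len(actual[0])
    same = sum(
        1
        for row_a, row_e in zip(actual, expected)
        for a, e in zip(row_a, row_e)
        if abs(a - e) <= tolerance
    )
    return same / total


def assert_full_image(actual, expected, min_degree=0.95):
    """Full-image comparison assertion against a full-screen benchmark image."""
    return match_degree(actual, expected) >= min_degree


def region_exists(region, full):
    """Region element detection: is `region` found anywhere inside `full`?"""
    rh, rw = len(region), len(region[0])
    fh, fw = len(full), len(full[0])
    for y in range(fh - rh + 1):
        for x in range(fw - rw + 1):
            # Compare the region row by row against the window at (x, y).
            if all(full[y + dy][x:x + rw] == region[dy] for dy in range(rh)):
                return True
    return False
```

The full-image check suits screens that should be pixel-stable, while the region check tolerates changes elsewhere on the screen as long as the element of interest still appears in the benchmark image.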

8. An electronic device, characterized in that the electronic device includes: one or more processors; and a memory storing computer program instructions which, when executed, cause the one or more processors to perform the steps of the method according to any one of claims 1 to 7.

9. A computer-readable medium having a computer program/instructions stored thereon, characterized in that, when the computer program/instructions are executed by a processor, they implement the steps of the method according to any one of claims 1 to 7.

10. A computer program product comprising a computer program/instructions, characterized in that, when the computer program/instructions are executed by a processor, they implement the steps of the method according to any one of claims 1 to 7.