Control position recognition method and device, electronic equipment and readable storage medium

By combining optical character recognition and image processing, the positions of controls in the screenshot are filtered and clustered, solving the problems of low accuracy and poor applicability of control recognition, and realizing efficient and low-cost control position recognition.

CN118672695B · Active · Publication Date: 2026-06-26 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date: 2023-03-16
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing control recognition methods suffer from low recognition accuracy, high labor costs, high computing power requirements, and poor applicability, especially in applications where control iteration and updates are rapid, limiting the recognition range.

Method used

By using optical character recognition (OCR) and binarized image processing, combined with the clustering method of the recognition box, the positions of controls in the interface screenshot are filtered out, reducing manual annotation and computing power consumption, and adapting to control updates.

Benefits of technology

It improves the accuracy of control position recognition, reduces labor costs and computing power consumption, is applicable to various platforms and software types, and reduces the need to update recognition templates.

✦ Generated by Eureka AI based on patent content.

Figure CN118672695B_ABST

Abstract

Embodiments of the present application provide a control position recognition method and device, electronic equipment and readable storage medium, and relate to the technical field of testing. The method comprises: obtaining an interface screenshot of an application program; performing optical character recognition on the interface screenshot to obtain character regions of each character in the interface screenshot, and determining a character recognition box corresponding to each character region on the interface screenshot; converting the interface screenshot into a binary image, performing contour division on the binary image, obtaining at least one contour region, and determining a contour recognition box corresponding to each contour region; clustering each character recognition box and contour recognition box based on distance, for a cluster comprising multiple recognition boxes, merging all recognition boxes in the cluster into a target recognition box, and taking a region corresponding to the target recognition box in each cluster as a position of each control in the interface. The embodiments of the present application reduce the labor cost and computing power consumption of control position recognition, and improve the accuracy and applicability of recognition.

Description

Technical Field

[0001] This application relates to the field of testing technology, and more specifically, to a method, apparatus, electronic device, and readable storage medium for identifying the position of a control.

Background Technology

[0002] Current application testing solutions typically require first obtaining the clickable controls of the application, and then inputting the information of the clickable controls into the testing system to achieve application testing.

[0003] In related technologies, the clickability of controls is determined by comprehensively analyzing the attributes of each control. However, this approach requires injecting test code into the game and using the engine's interface to obtain control information, so it is only suitable for games developed with common engines. Alternatively, controls in a game can be identified by combining graphics techniques and machine learning with manual annotation. However, the accuracy of these methods relies heavily on large amounts of annotated data and computation, requiring significant upfront investment of manpower. Furthermore, recognition is limited to annotated controls, which restricts its scope and results in poor applicability, and games with rapid control updates require frequent updates to the control data. These technologies therefore suffer from low accuracy, high labor costs, high computational requirements, and poor applicability.

Summary of the Invention

[0004] This application provides a method, apparatus, electronic device, and readable storage medium for identifying the position of a control, solving the problems of high manual costs, high computational power consumption, poor applicability, and low accuracy in the control position identification process. The technical solution is as follows:

[0005] According to a first aspect of the embodiments of this application, a method for identifying the position of a control is provided, the method comprising:

[0006] Obtain screenshots of the application's interface;

[0007] Optical character recognition (OCR) is performed on the interface screenshot to obtain the character region of each character in the screenshot, and a character recognition box corresponding to each character region is determined on the screenshot;

[0008] The screenshot is converted into a binary image, the binary image is segmented into contours to obtain at least one contour region, and a contour recognition box corresponding to each contour region is determined on the screenshot.

[0009] Each character recognition box and contour recognition box is clustered based on the distance between the recognition boxes. For a cluster containing multiple recognition boxes, all recognition boxes in the cluster are merged into a single target recognition box. The area corresponding to the target recognition box in each cluster is used as the location of each control in the interface.
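The clustering and merging step can be illustrated with a short Python sketch. The distance between recognition boxes is not defined in this passage (Figure 7 is not reproduced), so the sketch assumes the shortest edge-to-edge distance between two boxes, single-linkage grouping via union-find, and a hypothetical `max_gap` parameter; boxes are assumed to be `(x1, y1, x2, y2)` tuples. Each cluster is replaced by the smallest box enclosing all of its members.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2): top-left and bottom-right corners


def box_gap(a: Box, b: Box) -> float:
    """Shortest edge-to-edge distance between two boxes (0 if they touch or overlap)."""
    dx = max(0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0, max(a[1], b[1]) - min(a[3], b[3]))
    return (dx * dx + dy * dy) ** 0.5


def cluster_and_merge(boxes: List[Box], max_gap: float) -> List[Box]:
    """Group boxes whose gap is at most max_gap (single linkage) and merge each group."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if box_gap(boxes[i], boxes[j]) <= max_gap:
                parent[find(i)] = find(j)  # union the two clusters

    clusters: Dict[int, List[Box]] = {}
    for i, box in enumerate(boxes):
        clusters.setdefault(find(i), []).append(box)

    # A single-box cluster is kept unchanged; larger clusters become their enclosing box.
    merged = [
        (
            min(b[0] for b in members),
            min(b[1] for b in members),
            max(b[2] for b in members),
            max(b[3] for b in members),
        )
        for members in clusters.values()
    ]
    return sorted(merged)


if __name__ == "__main__":
    char_box = (10, 10, 50, 30)      # from OCR
    outline_box = (8, 8, 60, 34)     # from contour segmentation
    far_box = (200, 200, 240, 220)
    print(cluster_and_merge([char_box, outline_box, far_box], max_gap=5))
    # [(8, 8, 60, 34), (200, 200, 240, 220)]
```

Here the OCR box for a button label lies inside the button's contour box, so the two collapse into one target box, while the distant box stays separate. Because single linkage chains neighbors together, a row of closely spaced boxes also merges into one; whether that matches the embodiment's distance rule is an assumption.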

[0010] According to a second aspect of the embodiments of this application, a control position identification device is provided, the device comprising: an acquisition module, configured to acquire a screenshot of the interface of an application;

[0011] The character recognition module is used to perform optical character recognition (OCR) on the interface screenshot, obtain the character region of each character in the screenshot, and determine on the screenshot a character recognition box in one-to-one correspondence with each character region.

[0012] The contour recognition module is used to convert the screenshot into a binary image, perform contour division on the binary image to obtain at least one contour region, and determine a contour recognition box corresponding to each contour region on the screenshot.

[0013] The merging module is used to cluster the character recognition boxes and contour recognition boxes based on the distance between the recognition boxes. For a cluster containing multiple recognition boxes, all recognition boxes in the cluster are merged into a single target recognition box. The area corresponding to the target recognition box in each cluster is used as the location of each control in the interface.

[0014] As an optional implementation, the device further includes a first contour recognition box filtering module;

[0015] The first contour recognition box filtering module is used to delete any contour recognition box whose size does not meet the preset size limit conditions.

[0016] As an optional implementation, the contour recognition module includes: a vertex extraction unit, used to extract vertex information of the contour from the contour region for each contour region, and determine a contour recognition box corresponding to the contour region based on the vertex information of the contour region.
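As an illustration of the vertex extraction unit, a minimal sketch: assuming each contour arrives as a list of `(x, y)` vertex coordinates (for example, from a contour-tracing routine such as OpenCV's `cv2.findContours`), the contour recognition box can be taken as the axis-aligned box spanning the extreme vertices. The function name `contour_box` is hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[int, int]          # (x, y) vertex coordinate
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def contour_box(vertices: Iterable[Point]) -> Box:
    """Return the axis-aligned box spanning the extreme vertices of one contour."""
    pts = list(vertices)
    if not pts:
        raise ValueError("a contour needs at least one vertex")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    # A skewed quadrilateral, e.g. a button outline traced from the binary image.
    print(contour_box([(5, 7), (20, 3), (18, 25), (4, 15)]))  # (4, 3, 20, 25)
```

OpenCV's `cv2.boundingRect` computes an equivalent upright rectangle for a point set, but returns it as `(x, y, width, height)` rather than as two corners.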

[0017] As an optional implementation, the size limit conditions include at least one of the following:

[0018] The area of the contour recognition box is between a first area threshold and a second area threshold;

[0019] The ratio of the width to the height of the contour recognition box is less than a ratio threshold.
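A hedged sketch of the size limit conditions, assuming `(x1, y1, x2, y2)` boxes. The thresholds `min_area`, `max_area`, and `max_aspect` are hypothetical parameters, the area bounds are treated as inclusive, and both conditions are applied together (the text only requires at least one of them).

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def meets_size_limits(box: Box, min_area: float, max_area: float, max_aspect: float) -> bool:
    """True if the box area lies in [min_area, max_area] and width / height < max_aspect."""
    width, height = box[2] - box[0], box[3] - box[1]
    if width <= 0 or height <= 0:
        return False  # degenerate boxes can never be controls
    area = width * height
    return min_area <= area <= max_area and width / height < max_aspect


def filter_by_size(boxes: List[Box], min_area: float, max_area: float, max_aspect: float) -> List[Box]:
    """Keep only the contour recognition boxes that satisfy the size limit conditions."""
    return [b for b in boxes if meets_size_limits(b, min_area, max_area, max_aspect)]


if __name__ == "__main__":
    candidates = [
        (0, 0, 10, 10),   # area 100, ratio 1   -> kept
        (0, 0, 200, 5),   # area 1000, ratio 40 -> too elongated (e.g. a divider line)
        (0, 0, 2, 2),     # area 4              -> too small (noise)
    ]
    print(filter_by_size(candidates, min_area=50, max_area=5000, max_aspect=8))
    # [(0, 0, 10, 10)]
```

As written in the text, only width divided by height is limited, so a tall, narrow box would still pass this check.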

[0020] As an optional implementation, the device further includes a second contour recognition box filtering module;

[0021] The second contour recognition box filtering module is used to sort the contour recognition boxes in ascending order of area;

[0022] Determine, in that order, whether each contour recognition box is completely covered by any subsequent contour recognition box, and mark each completely covered box as a covered contour recognition box;

[0023] After all contour recognition boxes have been checked, delete the covered contour recognition boxes.
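The covered-box filter can be sketched as follows, assuming `(x1, y1, x2, y2)` boxes and treating "completely covered" as full containment with shared edges allowed. The function names are illustrative, not taken from the patent.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def area(box: Box) -> int:
    return (box[2] - box[0]) * (box[3] - box[1])


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies completely within `outer` (shared edges allowed)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def remove_covered(boxes: List[Box]) -> List[Box]:
    """Sort by ascending area, mark boxes covered by any later box, then drop them."""
    ordered = sorted(boxes, key=area)
    # All marks are computed first, and deletion happens afterwards.
    covered = [
        any(contains(later, box) for later in ordered[i + 1:])
        for i, box in enumerate(ordered)
    ]
    return [box for box, is_covered in zip(ordered, covered) if not is_covered]


if __name__ == "__main__":
    print(remove_covered([(0, 0, 100, 100), (10, 10, 20, 20), (200, 200, 220, 220)]))
    # [(200, 200, 220, 220), (0, 0, 100, 100)]
```

Sorting by area first means a box only needs to be compared with the boxes after it, since a smaller box cannot contain a larger one. Of two identical boxes, the first is marked as covered by the second, so exactly one survives.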

[0024] As an optional implementation, the contour recognition module includes:

[0025] The conversion unit is used to convert the interface screenshot into a grayscale image;

[0026] The first setting unit is used to set pixel values in the grayscale image that are greater than a pixel threshold to a first value;

[0027] The second setting unit is used to set pixel values in the grayscale image that are less than the pixel threshold to a second value;

[0028] The acquisition unit is used to obtain the binarized image;

[0029] The pixel threshold is determined based on the pixel value of each character.

[0030] As an optional implementation, the pixel threshold is determined as follows:

[0031] Determine the pixel value of each character;

[0032] Determine the average pixel value of the characters from these pixel values;

[0033] Use the average pixel value of the characters as the pixel threshold.
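A NumPy sketch of how the pixel threshold might be computed and applied. The text does not say how "the pixel value of each character" is measured, so this sketch assumes it is the mean grayscale value inside the character's recognition box. It also assumes the first and second values are 255 and 0, and it sends pixels exactly equal to the threshold to the second value, a case the text leaves unspecified.

```python
from typing import List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), exclusive right/bottom edges


def character_pixel_threshold(gray: np.ndarray, char_boxes: List[Box]) -> float:
    """Average, over all characters, of each character box's mean gray level (assumed measure)."""
    values = []
    for x1, y1, x2, y2 in char_boxes:
        region = gray[y1:y2, x1:x2]
        if region.size:  # skip empty boxes
            values.append(float(region.mean()))
    if not values:
        raise ValueError("no non-empty character boxes")
    return sum(values) / len(values)


def binarize(gray: np.ndarray, threshold: float, first: int = 255, second: int = 0) -> np.ndarray:
    """Pixels above the threshold get `first`; all others (including ties) get `second`."""
    return np.where(gray > threshold, first, second).astype(np.uint8)


if __name__ == "__main__":
    gray = np.array([[10, 10, 200, 200],
                     [10, 10, 200, 200]], dtype=np.uint8)
    t = character_pixel_threshold(gray, [(0, 0, 2, 2), (2, 0, 4, 2)])
    print(t)  # 105.0
    print(binarize(gray, t))
```

Deriving the threshold from the characters ties the binarization to the interface's own text contrast instead of a fixed global value.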

[0034] As an optional implementation, the device further includes a character recognition box filtering module;

[0035] The character recognition box filtering module is used to:

[0036] Traverse all character recognition boxes; for any two character recognition boxes that have an overlapping area, merge them into one character recognition box;

[0037] For each character recognition box remaining after the traversal, if the number of characters in it exceeds the character count threshold, delete that character recognition box.
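The character recognition box filter can be sketched as follows. Assumptions: each OCR result is a `((x1, y1, x2, y2), text)` pair, "overlapping" means the two rectangles share a region of positive area, merged text is concatenated left to right, and `max_chars` stands in for the character count threshold.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)
OcrItem = Tuple[Box, str]        # (character recognition box, recognized text)


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share a region of positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_overlapping(items: List[OcrItem]) -> List[OcrItem]:
    """Repeatedly merge any pair of overlapping boxes until no pair overlaps."""
    items = list(items)
    changed = True
    while changed:
        changed = False
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                (a, text_a), (b, text_b) = items[i], items[j]
                if overlaps(a, b):
                    union = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
                    # Keep reading order: the box further left contributes its text first.
                    text = text_a + text_b if a[0] <= b[0] else text_b + text_a
                    items[i] = (union, text)
                    del items[j]
                    changed = True
                    break
            if changed:
                break
    return items


def filter_char_boxes(items: List[OcrItem], max_chars: int) -> List[OcrItem]:
    """Merge overlapping boxes, then drop boxes whose text exceeds max_chars characters."""
    return [(box, text) for box, text in merge_overlapping(items) if len(text) <= max_chars]


if __name__ == "__main__":
    ocr = [((0, 0, 10, 10), "Sa"), ((8, 0, 20, 10), "ve"),
           ((100, 0, 110, 10), "Bag"), ((0, 50, 200, 70), "Get the key to open the room")]
    print(filter_char_boxes(ocr, max_chars=4))
    # [((0, 0, 20, 10), 'Save'), ((100, 0, 110, 10), 'Bag')]
```

The long descriptive line is dropped because large blocks of text usually describe the current display rather than label a control.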

[0038] As an optional implementation, the device further includes an input module;

[0039] The input module is used to input the location of each control in the application interface into the automated test program, so that the automated test program can test the application.
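The interface of the automated test program is not described here. As a purely hypothetical illustration, recognized control positions could be turned into tap targets at the box centers and supplied to a monkey-style tester (see the definition in [0070]) as a seeded pseudo-random sequence, so that random taps land on controls rather than on empty screen areas. All names below are assumptions.

```python
import random
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)
Point = Tuple[int, int]


def tap_targets(boxes: List[Box]) -> List[Point]:
    """Center point of each recognized control, as integer screen coordinates."""
    return [((x1 + x2) // 2, (y1 + y2) // 2) for x1, y1, x2, y2 in boxes]


def random_tap_sequence(boxes: List[Box], n_events: int, seed: int = 0) -> List[Point]:
    """Pseudo-random taps restricted to control centers; a fixed seed makes runs reproducible."""
    targets = tap_targets(boxes)
    if not targets:
        return []
    rng = random.Random(seed)
    return [rng.choice(targets) for _ in range(n_events)]


if __name__ == "__main__":
    controls = [(0, 0, 10, 20), (100, 100, 110, 110)]
    print(tap_targets(controls))  # [(5, 10), (105, 105)]
    print(random_tap_sequence(controls, 5, seed=1))
```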

[0040] According to a third aspect of the present application, an electronic device is provided, the electronic device including a memory, a processor and a computer program stored in the memory, wherein the processor executes the program to implement the steps of the control position recognition method provided in the first aspect.

[0041] According to a fourth aspect of the present application, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the steps of the control position recognition method provided in the first aspect.

[0042] According to a fifth aspect of the present application, a computer program product is provided, the computer program product including computer instructions stored in a computer-readable storage medium, wherein when a processor of a computer device reads the computer instructions from the computer-readable storage medium, the processor executes the computer instructions, causing the computer device to perform steps implementing the control position recognition method as provided in the first aspect.

[0043] The beneficial effects of the technical solutions provided in this application are:

[0044] The embodiments of this application use optical character recognition to effectively capture character recognition boxes on a screenshot, narrowing the candidate regions for control position identification. By performing threshold recognition on the screenshot, that is, binarization followed by contour segmentation, the contour regions on the screenshot can be effectively distinguished, yielding contour recognition boxes. By filtering the character recognition boxes and contour recognition boxes obtained by the two methods, controls on the screenshot are screened from multiple dimensions. Furthermore, the character recognition boxes and contour recognition boxes are clustered based on the distance between recognition boxes, so that boxes that are too close or that overlap are recognized as a single recognition box, further improving identification accuracy. Combining optical character recognition with threshold recognition in this way greatly improves the accuracy of control position recognition. It also eliminates the need to manually annotate large amounts of control data in screenshots, so the labor cost and computing power required for control recognition are greatly reduced while accuracy is maintained. At the same time, the control identification method of this application does not require a fixed recognition template, so applications that frequently update their controls do not require frequent template updates. Moreover, the method is not limited by application type or operating platform, which greatly improves its applicability.

Attached Figure Description

[0045] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments of this application will be briefly introduced below.

[0046] Figure 1 is a schematic diagram of an implementation environment provided by an embodiment of this application;

[0047] Figure 2 is a schematic diagram of a system architecture provided by an embodiment of this application;

[0048] Figure 3 is a flowchart of a method for identifying the position of a control provided by an embodiment of this application;

[0049] Figure 4 is a schematic diagram of OCR recognition performed on a game interface screenshot, provided by an embodiment of this application;

[0050] Figure 5 is a schematic diagram of overlapping areas in character recognition boxes, provided by an embodiment of this application;

[0051] Figure 6 is a schematic diagram of a character recognition box filtering method provided by an embodiment of this application;

[0052] Figure 7 is a schematic diagram of the distance between recognition boxes, provided by an embodiment of this application;

[0053] Figure 8a is a schematic diagram of character recognition boxes and contour recognition boxes before clustering, provided by an embodiment of this application;

[0054] Figure 8b is a schematic diagram showing the clustering of character recognition boxes and contour recognition boxes, provided by an embodiment of this application;

[0055] Figure 9a is a schematic diagram of the process of filtering contour recognition boxes, provided by an embodiment of this application;

[0056] Figure 9b is a schematic diagram of the filtering of contour recognition boxes, provided by an embodiment of this application;

[0057] Figure 10 is a schematic diagram of the position of a contour recognition box in an interface, provided by an embodiment of this application;

[0058] Figure 11 is an interaction flowchart of the control position recognition method provided by an embodiment of this application;

[0059] Figure 12 is a schematic structural diagram of an apparatus provided by an embodiment of this application;

[0060] Figure 13 is a schematic structural diagram of an electronic device provided by an embodiment of this application.

Detailed Implementation

[0061] The embodiments of this application are described below with reference to the accompanying drawings. It should be understood that the embodiments described below with reference to the accompanying drawings are exemplary descriptions for explaining the technical solutions of the embodiments of this application, and do not constitute a limitation on the technical solutions of the embodiments of this application.

[0062] Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms “a,” “an,” and “the” used herein may also include the plural forms. It should be further understood that the terms “comprising” and “including” as used in embodiments of this application mean that the corresponding feature can be implemented as the presented feature, information, data, step, operation, element, and/or component, but do not exclude implementation as other features, information, data, steps, operations, elements, components, and/or combinations thereof supported by the art. It should be understood that when an element is said to be “connected” or “coupled” to another element, the one element can be directly connected or coupled to the other element, or the two elements can establish a connection relationship through an intermediate element. Furthermore, “connected” or “coupled” as used herein can include wireless connection or wireless coupling. The term “and/or” as used herein indicates at least one of the items defined by the term; for example, “A and/or B” can be implemented as “A,” as “B,” or as “A and B.”

[0063] To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application will be described in further detail below with reference to the accompanying drawings.

[0064] First, several terms used in this application are introduced and explained:

[0065] Optical Character Recognition (OCR) refers to the process by which electronic devices (such as scanners or digital cameras) examine printed characters on paper, determine their shapes by detecting dark and light patterns, and then translate those shapes into computer text using character recognition methods. In other words, for printed characters, it uses optical methods to convert the text in a paper document into a black-and-white dot-matrix image file, and then uses recognition software to convert the text in the image into a text format that word processing software can edit further. The most important issue in OCR is how to correct errors or use auxiliary information to improve recognition accuracy; techniques that address this are referred to as Intelligent Character Recognition (ICR). The main indicators for evaluating the performance of an OCR system include rejection rate, false recognition rate, recognition speed, user interface friendliness, product stability, ease of use, and feasibility.

[0066] Binarization is one of the simplest methods of image segmentation. It converts a grayscale image into a binary image: pixels with grayscale values greater than a certain threshold are set to the maximum grayscale value, and pixels with grayscale values less than that threshold are set to the minimum grayscale value. Depending on how the threshold is chosen, binarization algorithms are divided into fixed-threshold and adaptive-threshold methods. Commonly used binarization methods include the bimodal method, the P-parameter method, and iterative methods.
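The iterative method mentioned above can be sketched in NumPy. This is the classic intermeans scheme, shown as a general illustration rather than as the method of this application, which instead derives its threshold from character pixel values. The scheme starts from the mean gray level, splits the pixels into two groups, and moves the threshold to the midpoint of the two group means until it stabilizes. The tolerance `eps` and the `max_iter` limit are illustrative defaults.

```python
import numpy as np


def iterative_threshold(gray: np.ndarray, eps: float = 0.5, max_iter: int = 100) -> float:
    """Intermeans iteration: move the threshold to the midpoint of the two class means until stable."""
    pixels = gray.astype(np.float64).ravel()
    t = float(pixels.mean())  # initial guess: global mean gray level
    for _ in range(max_iter):
        low, high = pixels[pixels <= t], pixels[pixels > t]
        if low.size == 0 or high.size == 0:
            break  # uniform image: nothing to separate
        new_t = float((low.mean() + high.mean()) / 2)
        converged = abs(new_t - t) < eps
        t = new_t
        if converged:
            break
    return t


if __name__ == "__main__":
    img = np.array([[10, 20, 200, 210]], dtype=np.uint8)
    t = iterative_threshold(img)
    print(t)  # 110.0
    print(np.where(img > t, 255, 0).astype(np.uint8))
```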

[0067] Clustering: The process of dividing a collection of physical or abstract objects into multiple classes composed of similar objects is called clustering. A cluster generated by clustering is a set of data objects that are similar to other objects in the same cluster and dissimilar to objects in other clusters. As the saying goes, "birds of a feather flock together," and both the natural and social sciences contain numerous classification problems. Cluster analysis, also known as group analysis, is a statistical method for studying the classification of samples or indicators. Cluster analysis originated in taxonomy, but clustering is not the same as classification: in clustering, the classes to be formed are not known in advance. Cluster analysis encompasses a wide range of methods, including hierarchical clustering, ordered sample clustering, dynamic clustering, fuzzy clustering, graph-theoretic clustering, and cluster prediction.

[0068] Hierarchical tree: a complete list of the family hierarchy (the parent-child nesting relationships) of all contours in an image.

[0069] Grayscale: This refers to dividing the brightness variation between the brightest and darkest areas into several levels. This facilitates the corresponding screen brightness control based on the signal input. Each digital image is composed of many points, also called pixels. Each pixel can typically display many different colors, and it is composed of three sub-pixels: red, green, and blue. Each sub-pixel can display different brightness levels depending on the light source behind it. Grayscale represents the different brightness levels from the darkest to the brightest. The more levels there are, the more delicate the image effect can be. For example, an 8-bit panel can display 2 to the power of 8, which equals 256 brightness levels, and we call this 256 grayscale. Each pixel on an LCD screen is composed of different brightness levels of red, green, and blue, ultimately forming different color points. In other words, the color change of each point on the screen is actually caused by the grayscale changes of the three RGB sub-pixels that make up that point.

[0070] Monkey testing: The word "monkey" refers to a mischievous and playful animal. Monkey testing, as the name suggests, involves pressing keys on the software at random, much like a monkey that understands nothing but loves to cause trouble. In practice, it stress-tests the program on the device by sending pseudo-random streams of user events (such as key presses, touchscreen inputs, trackball swipes, and gestures) to the system, and measures how long the program runs before it malfunctions. Monkey testing is typically performed after functional testing is complete.

[0071] The control position recognition method provided in this application can be applied to the recognition of control positions in applications on various terminals, and can also be applied to the recognition of control positions in mini-programs.

[0072] The method, apparatus, electronic device, computer-readable storage medium, and computer program product for identifying the position of interactive information controls provided in this application are intended to solve the above-mentioned technical problems of the prior art.

[0073] The technical solutions of this application and their effects are described below through several exemplary embodiments. It should be noted that the following embodiments can be referenced, borrowed from, or combined with each other. Identical terms, similar features, and similar implementation steps in different embodiments will not be repeated.

[0074] Please refer to Figure 1, which illustrates an implementation environment provided by an exemplary embodiment of this application. The implementation environment includes a first terminal 120 and a server 140.

[0075] The first terminal 120 installs and runs the application to be identified and the application for control position identification. The application to be identified can be a social application, a game application, a lifestyle application, or a mini-program, etc. The first user uses the first terminal to identify the control positions of the application to be identified.

[0076] The first terminal 120 is connected to the server 140 via a wireless or wired network.

[0077] Server 140 includes at least one of a single server, multiple servers, and a cloud computing platform. Schematically, server 140 includes a processor 144 and a memory 142, with the memory 142 including a display module 1421, a control module 1422, and a receiving module 1423. Server 140 provides background services for applications supporting control position recognition. Optionally, server 140 undertakes the primary computing tasks and the first terminal 120 undertakes secondary computing tasks; or server 140 undertakes secondary computing tasks and the first terminal 120 undertakes the primary computing tasks; or server 140 and the first terminal 120 collaborate on computing using a distributed computing architecture.

[0078] The device type of the first terminal includes at least one of the following: smartphone, tablet computer, e-book reader, Moving Picture Experts Group Audio Layer III (MP3) player, Moving Picture Experts Group Audio Layer IV (MP4) player, laptop computer, and desktop computer. The following embodiments use a smartphone as an example.

[0079] Those skilled in the art will understand that the number of terminals described above can be more or less. For example, there may be only one terminal, or there may be dozens or hundreds of terminals, or even more. This application does not limit the number of terminals or the type of device.

[0080] Please refer to Figure 2, which shows a schematic diagram of a system architecture for the control position recognition method provided in an exemplary embodiment of this application. The system architecture includes an OCR recognition module 201, a threshold recognition module 202, and a clustering module 203.

[0081] The OCR recognition module 201 is used to obtain the character area and character content of each character on the interface, thereby obtaining the character recognition box corresponding to the character area.

[0082] The threshold recognition module 202 is used to perform color space conversion on the screenshot to obtain a grayscale image: it merges the RGB values of each pixel of the screenshot into a single value normalized to the range [0, 255]. The threshold recognition module 202 is also used to further simplify the grayscale image by setting each pixel value to 0 or 255, filtering out irrelevant image features and retaining only useful key features, to obtain a binarized image. The threshold recognition module 202 is also used to perform contour segmentation on the binarized image and to filter the segmented contours to obtain contour recognition boxes.
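A NumPy sketch of two image steps of the threshold recognition module. The text only says that the RGB values are merged into [0, 255]; this sketch assumes the common ITU-R BT.601 luma weights (0.299, 0.587, 0.114). For contour segmentation it substitutes 8-connected component labeling, which for solid foreground regions yields roughly the same boxes as their outer contours; a production version would more likely call OpenCV's `cv2.findContours`. Boxes use exclusive right and bottom edges, and the function names are hypothetical.

```python
from collections import deque
from typing import List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), exclusive right/bottom edges


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Merge RGB channels into one 0-255 gray level (assumed BT.601 weights)."""
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB array")
    gray = rgb.astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


def region_boxes(binary: np.ndarray, fg: int = 255) -> List[Box]:
    """Bounding box of every 8-connected region of `fg` pixels (stand-in for outer contours)."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if seen[y, x] or binary[y, x] != fg:
                continue
            seen[y, x] = True
            queue = deque([(y, x)])
            x1, y1, x2, y2 = x, y, x, y
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                x1, y1, x2, y2 = min(x1, cx), min(y1, cy), max(x2, cx), max(y2, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and not seen[ny, nx] and binary[ny, nx] == fg:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            boxes.append((x1, y1, x2 + 1, y2 + 1))
    return boxes


if __name__ == "__main__":
    print(to_grayscale(np.array([[[255, 255, 255], [255, 0, 0], [0, 0, 0]]], dtype=np.uint8)))
    binary = np.zeros((5, 6), dtype=np.uint8)
    binary[0:2, 0:2] = 255   # a 2x2 blob
    binary[3, 2:5] = 255     # a 1x3 bar
    print(region_boxes(binary))  # [(0, 0, 2, 2), (2, 3, 5, 4)]
```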

[0083] Clustering module 203 is used to summarize character recognition boxes and contour recognition boxes, and to cluster them based on the distance between the recognition boxes, thereby merging recognition boxes that have overlapping positions or are too close to each other to obtain a recognition box, thus determining the position of the control.

[0084] In related technologies, the following methods are mainly used to identify the position of controls for different platforms and application types: For Android platforms, the node tree provided by uiautomator is used to identify clickable controls in the application interface; for iOS platforms, the control tree acquisition capability provided by webdriveragent is used to obtain the controls in the application page, and then the clickable controls are determined by the control attributes; for game applications, the game's rendering tree needs to be obtained through the interface provided by the game engine or game code logic to obtain the clickable controls in the page.

[0085] In other words, in related technologies, the methods using uiautomator and webdriveragent to identify clickable controls can only be applied to mobile software on the corresponding system platform. When identifying controls in game applications and web pages, only large blocks of view elements are returned, and the interface control information cannot be accurately obtained. For control identification methods specifically applied to games, it is necessary to obtain the game source code to obtain the game's rendering tree in order to identify controls. Therefore, the control identification methods in related technologies are limited by platform systems and software types, resulting in poor applicability.

[0086] To overcome the aforementioned problems of the prior art, this application provides a method for identifying the position of a control, which can be applied to, for example, the terminal shown in Figure 1. As shown in Figure 3, the method includes:

[0087] S101. Obtain a screenshot of the application's interface.

[0088] In this embodiment, the application can be a standalone application or a mini-program, and its type can be, for example, a social application, a lifestyle service application, a game application, or a shopping application; neither is specifically limited in this embodiment.

[0089] In this embodiment, screenshots of the interface can be obtained by taking screenshots of the terminal or recording video streams. For example, by launching an application on the terminal and enabling the screen recording function, the tester can traverse each interface of the application through a visual interactive interface, thereby recording each interface of the application and obtaining screenshots of each interface of the application.

[0090] S102. Perform optical character recognition (OCR) on the screenshot to obtain the character region of each character in the screenshot, and determine on the screenshot a character recognition box in one-to-one correspondence with each character region.

[0091] In this embodiment of the application, characters on the screenshot are identified by OCR recognition to obtain the OCR recognition result of the screenshot. The OCR recognition result can be the specific location of each character area in the screenshot and the specific content of the character.

[0092] In one example, refer to Figure 4, which illustrates OCR recognition performed on a screenshot of a game interface. As shown in the figure, the screenshot contains text such as "game pause," "save," "backpack," "skills," and "get the key to open the room," as well as game imagery. OCR is performed on the screenshot to obtain the text and pixel coordinates of each character, and a character recognition box is determined for each character region from the pixel coordinates. Because controls that prompt users to click are usually accompanied by text, this embodiment uses OCR to locate the character areas on the interface, which narrows the range of control position recognition and improves its accuracy. Furthermore, control updates often change a control's shape, color, and font, but filtering controls by OCR is not affected by such changes. The method therefore adapts well to rapidly updated and iterated applications without frequent modification of the recognition method, improving the applicability of control position recognition while maintaining accuracy.

[0093] It should be noted that the controls mentioned in the embodiments of this application are all interactive controls, that is, controls used for users to interact with the application. The embodiments of this application do not limit the specific interaction method, such as single click, double click, swipe, long press, etc.

[0094] In some embodiments, a 540×1170 image is input into the OCR recognition framework, which returns the character regions and the content of each character in the image within one second.

[0095] This application provides a possible implementation: traverse all character recognition boxes; for any two character recognition boxes that have an overlapping area, merge them into one character recognition box; after the traversal, delete any character recognition box whose number of characters exceeds the character count threshold.

[0096] In this embodiment of the application, the overlapping area between two character recognition boxes can refer to the overlapping area in the horizontal direction, the overlapping area in the vertical direction, or the overlapping area in both the horizontal and vertical directions.

[0097] Figure 5 is a schematic diagram of overlapping areas between character recognition boxes in this application. As shown, character recognition boxes Z1 and Z2 overlap in the horizontal direction; Z3 and Z4 overlap in the vertical direction; and Z5 and Z6 overlap in both the horizontal and vertical directions.

[0098] Application screenshots often contain large blocks of text describing the current display, and such text is mostly rendered as many characters arranged regularly at equal intervals in the horizontal or vertical direction; "close arrangement" here refers to characters arranged at equal intervals horizontally or vertically. Therefore, when checking the character recognition boxes returned by OCR for overlapping areas, it can first be determined whether the boxes are closely arranged. If they are, it is further determined whether the boxes overlap, and if so, all character recognition boxes in the current area are merged into one character recognition box.

[0099] In this embodiment, it is determined whether the number of characters in each merged character recognition box exceeds a character count threshold. Controls generally display concise prompt text, that is, only a few characters. The characters in a box that exceeds the threshold are therefore considered not to be prompt text instructing the user to click a control, which means the area of that box is not where a control is located. Such boxes are deleted, filtering out non-control character recognition boxes. Prompt text that instructs the user to click a control, such as "Game Start," "Game End," "Game Pause," "Back," and "Save," is usually very short, so filtering by character count prevents content displayed in the interface screenshot, such as game introductions, game instructions, and game plot, from being treated as control prompt text.

[0100] In one example, Figure 6 is a schematic diagram of filtering character recognition boxes in this application. As shown, the screenshot contains twenty character recognition boxes, numbered 1 to 20. Boxes 1-8 are closely arranged horizontally, with partial overlap between adjacent boxes. Boxes 9-10 are closely arranged horizontally and partially overlap. Boxes 11-14 are closely arranged both horizontally and vertically, with partial overlap between adjacent boxes. Boxes 15-20 are closely arranged vertically, with partial overlap between adjacent boxes. Based on these overlaps, boxes 1-8 are merged into character recognition box A, boxes 9-10 into box B, boxes 11-14 into box C, and boxes 15-20 into box D. For each box, it is determined whether its number of characters exceeds 5, and all characters and character recognition boxes in boxes with more than 5 characters are deleted. The characters and boxes of A and D are therefore deleted, and only the content of original boxes 9-14 (that is, boxes B and C) is retained.
Since application interfaces usually use concise text to prompt users to click the corresponding controls, filtering by character count effectively eliminates text unrelated to controls and greatly improves the accuracy of control position recognition.
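The merging and filtering steps in [0095] to [0100] can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes axis-aligned boxes given as (x1, y1, x2, y2) with the recognized text attached, treats "overlap" as a shared two-dimensional area, and omits the close-arrangement pre-check of [0098]; the names `CharBox`, `merge_overlapping`, and `filter_by_char_count` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CharBox:
    # Axis-aligned box: (x1, y1) is the top-left corner, (x2, y2) the bottom-right.
    x1: float
    y1: float
    x2: float
    y2: float
    text: str  # characters recognized inside the box


def overlaps(a: CharBox, b: CharBox) -> bool:
    # Two boxes share an area only if both their x- and y-intervals intersect.
    return a.x1 < b.x2 and b.x1 < a.x2 and a.y1 < b.y2 and b.y1 < a.y2


def union(a: CharBox, b: CharBox) -> CharBox:
    # Smallest box enclosing both inputs; texts are concatenated
    # (not necessarily in reading order, only the character count matters here).
    return CharBox(min(a.x1, b.x1), min(a.y1, b.y1),
                   max(a.x2, b.x2), max(a.y2, b.y2), a.text + b.text)


def merge_overlapping(boxes: List[CharBox]) -> List[CharBox]:
    # Repeatedly merge any overlapping pair until no pair overlaps,
    # so a chain such as 1-2-3-...-8 collapses into a single box.
    boxes = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if overlaps(boxes[i], boxes[j]):
                    boxes[i] = union(boxes[i], boxes[j])
                    del boxes[j]
                    changed = True
                    break
            if changed:
                break
    return boxes


def filter_by_char_count(boxes: List[CharBox], max_chars: int) -> List[CharBox]:
    # Delete boxes whose character count exceeds the threshold (long descriptive text).
    return [b for b in boxes if len(b.text) <= max_chars]
```

In a synthetic layout arranged like Figure 6 (a row of 8 boxes, a pair, a 2×2 grid, and a column of 6), the merged boxes hold 8, 2, 4, and 6 characters, and a threshold of 5 keeps only the 2- and 4-character boxes.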

[0101] It should be noted that this application does not limit the character count threshold. The above content only serves to illustrate the implementation of this solution more clearly; the specific value of the threshold is selected according to the actual application scenario. For example, in a game application, the character count threshold may be 4.

[0102] S103. Convert the screenshot into a binary image, perform contour segmentation on the binary image to obtain at least one contour region, and determine the contour recognition box corresponding to each contour region on the screenshot.

[0103] In this embodiment, contours in the binary image can be recognized by building a hierarchical tree. For example, all contour regions are detected using the CV_RETR_TREE retrieval mode, which reconstructs the full contour structure in which outer contours contain inner contours. An outer contour containing an inner contour means that the entire region of the inner contour falls within the region of the outer contour.
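OpenCV's `cv2.findContours` with the `cv2.RETR_TREE` mode returns, for each contour, a hierarchy entry of the form [Next, Previous, First_Child, Parent]. The pure-Python sketch below only illustrates the containment relation such a tree encodes, using bounding boxes in place of real contours so that it runs without OpenCV. `build_parents` is a hypothetical helper, and it assumes no two boxes are identical (identical boxes would each be chosen as the other's parent).

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def contains(outer: Box, inner: Box) -> bool:
    # True when the whole inner box lies within the outer box.
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def area(b: Box) -> int:
    return (b[2] - b[0]) * (b[3] - b[1])


def build_parents(boxes: List[Box]) -> List[int]:
    # For each box, the parent is the smallest other box that fully contains it,
    # or -1 for an outermost box (the same meaning as the "Parent" hierarchy field).
    parents = []
    for i, inner in enumerate(boxes):
        best = -1
        for j, outer in enumerate(boxes):
            if i != j and contains(outer, inner):
                if best == -1 or area(outer) < area(boxes[best]):
                    best = j
        parents.append(best)
    return parents
```

For nested boxes such as a screen-wide panel containing a button that in turn contains an icon, the icon's parent is the button rather than the panel, because the smallest enclosing box is chosen.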

[0104] This application provides a possible implementation method: for each contour region, extract the vertex information of the contour from the contour region, and determine the contour recognition box corresponding to the contour region based on the vertex information of the contour region.

[0105] In one example, after the contour regions of the binary image are obtained, vertex information is extracted and saved for each contour region, and the corresponding contour recognition box is determined from this vertex information. Here, a vertex is a point at which two lines, a line and a curve, or two curves intersect; vertex information refers to the coordinates of the vertices on the screenshot. For example, the CV_CHAIN_APPROX_SIMPLE method, which compresses horizontal, vertical, and diagonal segments and keeps only their end points, saves only the vertex information of each contour, and the contour recognition box is obtained from the vertex information of all contours. Saving only the vertices, without the line or curve segments between them, significantly reduces the computational cost of image processing.
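The vertex-only idea behind CV_CHAIN_APPROX_SIMPLE can be illustrated without OpenCV: along a chain of boundary pixels, a point whose incoming and outgoing steps are identical sits in the middle of a straight horizontal, vertical, or diagonal run and can be dropped. The sketch assumes a closed contour given as a list of 8-connected (x, y) pixels. `compress_chain` and `bounding_box` are hypothetical helpers, and `bounding_box` returns corner coordinates (x1, y1, x2, y2) rather than the (x, y, w, h) tuple of OpenCV's `cv2.boundingRect`.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def compress_chain(points: List[Point]) -> List[Point]:
    # points: a closed contour as consecutive 8-connected boundary pixels.
    # A point is dropped when the step into it equals the step out of it,
    # i.e. it lies inside a horizontal, vertical, or diagonal run.
    n = len(points)
    if n < 3:
        return list(points)
    kept = []
    for i in range(n):
        px, py = points[i - 1]          # previous point (wraps around for i == 0)
        cx, cy = points[i]
        nx, ny = points[(i + 1) % n]    # next point (wraps around at the end)
        if (cx - px, cy - py) != (nx - cx, ny - cy):
            kept.append(points[i])
    return kept


def bounding_box(vertices: List[Point]) -> Tuple[int, int, int, int]:
    # Axis-aligned box (x1, y1, x2, y2) enclosing all retained vertices.
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)
```

For the 12-pixel boundary of a 4×4 square, only the four corners survive compression, and the bounding box is computed from those four points alone.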

[0106] This application provides a possible implementation: convert the interface screenshot into a grayscale image; set pixel values in the grayscale image that are greater than a pixel threshold to a first value, and set pixel values that are less than the pixel threshold to a second value, thereby obtaining a binary image, where the pixel threshold is determined based on the pixel values of the characters. In this embodiment, the first value can be 255 and the second value can be 0.

[0107] In some embodiments, the COLOR_BAYER_GR2GRAY conversion of cv.cvtColor in OpenCV is used to convert a three-channel RGB color image into a grayscale image. Specifically, the R, G, and B values of each pixel are first combined, and each pixel of the image is then normalized to the range [0, 255]. The formula for converting an RGB color image into a grayscale image is as follows:

[0108] Gray=0.2989×R+0.5870×G+0.1140×B

[0109] Where R, G, and B represent the values of the red, green, and blue channels of the color image, respectively. After grayscale conversion, the image becomes an array of values in the range [0, 255].

[0110] In some embodiments, pixels with values higher than the pixel threshold are set to 255 and pixels with values lower than the pixel threshold are set to 0, which completes the binarization of the image. Extensive sample experiments have shown that a pixel threshold of 127 covers most controls. Therefore, for each pixel in the grayscale image, the value is set to 255 if it is greater than 127 and to 0 if it is less than 127. Pixels with a value of 255, that is, white areas, are more easily identified as controls in subsequent steps. Binarizing the grayscale image in this way further simplifies it, retaining only the key features useful for control recognition and filtering out irrelevant image features, which significantly improves the accuracy of control recognition.
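A minimal pure-Python sketch of the grayscale conversion and fixed-threshold binarization in [0106] to [0110], using the weights from the Gray formula above and a threshold of 127. It assumes the image is a nested list of (R, G, B) tuples and rounds gray values to integers. Pixels exactly equal to the threshold are not covered by the text and are mapped to the second value here as an assumption. In practice a library routine such as OpenCV's `cv2.threshold` would normally be used instead; `to_gray` and `binarize` are hypothetical names.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def to_gray(image: List[List[RGB]]) -> List[List[int]]:
    # Weighted sum of the channels, rounded to an integer in [0, 255].
    return [[round(0.2989 * r + 0.5870 * g + 0.1140 * b) for (r, g, b) in row]
            for row in image]


def binarize(gray: List[List[int]], threshold: int = 127,
             first_value: int = 255, second_value: int = 0) -> List[List[int]]:
    # Pixels above the threshold become first_value (white); all others,
    # including pixels exactly equal to the threshold, become second_value.
    return [[first_value if v > threshold else second_value for v in row]
            for row in gray]
```

For example, a reddish pixel (200, 30, 30) maps to a gray value of 81 and becomes 0, while a greenish pixel (30, 200, 30) maps to 130 and becomes 255, since the green weight dominates the formula.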

[0111] This application provides a possible implementation for obtaining the pixel threshold, which includes: determining the pixel value of each character; determining the average pixel value of the characters based on the pixel value of each character; and using this average pixel value as the pixel threshold.

[0112] In one example, the pixel values of each character within each character recognition box are obtained, and the average pixel value of the characters on the current interface is calculated from the pixel values of all characters. The grayscale image is then binarized using this average as the threshold: pixels with values greater than the average are set to 255, and pixels with values less than the average are set to 0. Binarization retains only the key features useful for control recognition and filters out irrelevant image features, significantly improving the accuracy of control recognition.

[0113] In another example, the pixel threshold can also be obtained by acquiring the pixel values of all characters within the character recognition boxes and of the background within those boxes, and then calculating a weighted average of the character and background pixel values. Deriving the binarization threshold from the pixels inside the character recognition boxes filters the key features of the grayscale image in a way that better reflects the actual appearance of the controls in the current interface screenshot. This greatly improves the accuracy of key feature filtering, effectively removes irrelevant features, and thus improves the accuracy of control position recognition.
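Both ways of deriving the pixel threshold in [0111] to [0113] reduce to simple averages. This sketch assumes the grayscale values of character pixels and background pixels inside the character recognition boxes have already been separated into two lists (the text does not say how that separation is done). `char_weight` is a hypothetical parameter, because the weights of the weighted average are not given, and the function names are illustrative only.

```python
from typing import Sequence


def mean_threshold(char_pixels: Sequence[float]) -> float:
    # Average grayscale value of the pixels belonging to characters ([0111], [0112]).
    return sum(char_pixels) / len(char_pixels)


def weighted_threshold(char_pixels: Sequence[float],
                       background_pixels: Sequence[float],
                       char_weight: float = 0.5) -> float:
    # Weighted average of the character mean and the background mean ([0113]).
    # char_weight is an assumed parameter; the text does not specify the weights.
    char_mean = sum(char_pixels) / len(char_pixels)
    background_mean = sum(background_pixels) / len(background_pixels)
    return char_weight * char_mean + (1 - char_weight) * background_mean
```

With bright text on a dark background, the weighted variant places the threshold between the two intensity levels instead of at the text intensity itself, which keeps glyph pixels above the threshold.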

[0114] S104. Cluster each character recognition box and contour recognition box based on the distance between the recognition boxes. For a cluster that includes multiple recognition boxes, merge all the recognition boxes in the cluster into a single target recognition box. Use the area corresponding to the target recognition box in each cluster as the location of each control in the interface.

[0115] In this embodiment of the application, character recognition boxes and contour recognition boxes are collectively referred to as recognition boxes. The distance between recognition boxes can be calculated based on the center distance between recognition boxes or the nearest distance between the borders of recognition boxes.

[0116] Figure 7 is a schematic diagram of the distance between two recognition boxes in this application. As shown, the screenshot contains recognition boxes K1 and K2, which overlap. The distance d1 between their centers A1 and A2 is calculated; if d1 is less than the distance threshold, the two boxes are merged into one recognition box whose area covers the areas occupied by both boxes.

[0117] The screenshot also shows recognition boxes K3 and K4, which do not overlap. The nearest distance d2 between their borders is calculated; if d2 is less than the distance threshold, the two boxes are merged into one recognition box whose area covers both boxes as well as the area between them.
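The two distance measures of Figure 7 can be written as follows for boxes given as (x1, y1, x2, y2). Using the center distance for overlapping boxes and the nearest-border distance for separate boxes follows the K1/K2 and K3/K4 cases; combining them in a single `box_distance` function is an assumption of this sketch, as are all function names.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def boxes_overlap(a: Box, b: Box) -> bool:
    # True when the two boxes share a non-empty area.
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def center_distance(a: Box, b: Box) -> float:
    # Euclidean distance between the two box centers (d1 in Figure 7).
    return math.hypot((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2,
                      (a[1] + a[3]) / 2 - (b[1] + b[3]) / 2)


def border_distance(a: Box, b: Box) -> float:
    # Shortest distance between the borders of two boxes (d2 in Figure 7);
    # the gap along an axis is 0 when the projections on that axis intersect.
    dx = max(0.0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0.0, max(a[1], b[1]) - min(a[3], b[3]))
    return math.hypot(dx, dy)


def box_distance(a: Box, b: Box) -> float:
    # Overlapping boxes are compared by center distance, separate boxes by border gap.
    return center_distance(a, b) if boxes_overlap(a, b) else border_distance(a, b)
```

For two separate boxes offset diagonally, the border distance is measured corner to corner, so a gap of 3 pixels horizontally and 4 vertically gives a distance of 5.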

[0118] In the clustering process of this application embodiment, recognition boxes that are less than the distance threshold apart are merged into a single target recognition box. That is, two character recognition boxes less than the distance threshold apart are merged, a character recognition box and a contour recognition box less than the distance threshold apart are merged, and two contour recognition boxes less than the distance threshold apart are merged.

[0119] In this embodiment, the density-based clustering algorithm DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is used to cluster the recognition boxes, and recognition boxes whose distance is less than the distance threshold are clustered into a single target recognition box. For example, for a WeChat mini-program game interface, the distance threshold is set by the following formula:

[0120]

[0121] Where width is the width of the game interface and height is the height of the game interface.
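A minimal sketch of the clustering and merging in S104, assuming DBSCAN is run with `min_samples = 1`, so that every box is a core point and the clusters reduce to connected components of the "closer than the threshold" graph. This keeps isolated boxes (such as Z5 and Z6 in the example that follows) as single-box clusters rather than discarding them as noise. The threshold value, the strict `<` comparison, and all names are assumptions; a library implementation such as scikit-learn's `DBSCAN` with `metric="precomputed"` could be used instead.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_distance(a: Box, b: Box) -> float:
    # Center distance for overlapping boxes, nearest-border distance otherwise.
    if a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]:
        return math.hypot((a[0] + a[2] - b[0] - b[2]) / 2,
                          (a[1] + a[3] - b[1] - b[3]) / 2)
    dx = max(0.0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0.0, max(a[1], b[1]) - min(a[3], b[3]))
    return math.hypot(dx, dy)


def cluster_boxes(boxes: List[Box], threshold: float) -> List[int]:
    # DBSCAN with min_samples = 1: every box is a core point, so each cluster is a
    # connected component of the graph linking boxes closer than the threshold.
    labels = [-1] * len(boxes)
    next_label = 0
    for seed in range(len(boxes)):
        if labels[seed] != -1:
            continue
        labels[seed] = next_label
        stack = [seed]
        while stack:                       # expand the cluster from the seed
            i = stack.pop()
            for j in range(len(boxes)):
                if labels[j] == -1 and box_distance(boxes[i], boxes[j]) < threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def merge_clusters(boxes: List[Box], labels: List[int]) -> List[Box]:
    # One target recognition box per cluster: the union of its member boxes.
    groups: Dict[int, List[Box]] = {}
    for box, label in zip(boxes, labels):
        groups.setdefault(label, []).append(box)
    return [(min(b[0] for b in g), min(b[1] for b in g),
             max(b[2] for b in g), max(b[3] for b in g))
            for g in groups.values()]
```

Because clusters grow transitively, a chain of boxes each within the threshold of its neighbor ends up in one target recognition box even if its two ends are far apart.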

[0122] In one example, Figures 8a and 8b illustrate the positions of the character recognition boxes and contour recognition boxes of this application before and after clustering. As shown in Figure 8a, the interface contains character recognition boxes Z1-Z8 and contour recognition boxes L1-L8. Z1 and Z2 are completely covered by contour recognition box L1. Z3 and Z4 partially overlap contour recognition box L2. Z5 and Z6 do not overlap any other recognition box and are close to each other, but their distance is greater than the distance threshold. Z7 and Z8 do not overlap any other recognition box, and their distance is less than the distance threshold. Contour recognition boxes L3 and L4 partially overlap. L5 and L6 do not overlap any other recognition box and are close to each other, but their distance is greater than the distance threshold. L7 and L8 do not overlap any other recognition box, and their distance is less than the distance threshold. Clustering Z1-Z8 and L1-L8 based on the distance threshold yields the result shown in Figure 8b: Z1, Z2, and L1 are merged into target recognition box A; Z3, Z4, and L2 into target recognition box B; Z7 and Z8 into target recognition box C; L3 and L4 into target recognition box D; and L7 and L8 into target recognition box E.
From these clustering results, the current interface contains target recognition boxes A, B, C, D, and E, together with character recognition boxes Z5 and Z6 and contour recognition boxes L5 and L6. Each of these recognition boxes is considered a control on the interface, and the area of each box represents the position of that control on the interface.

[0123] This application provides a possible implementation method in which, before clustering each character recognition box and contour recognition box based on the distance between the recognition boxes, for each contour recognition box, if it is determined that the size of the contour recognition box does not meet the preset size limit condition, the contour recognition box is deleted.

[0124] In some embodiments, the contours recognized in the contour regions are of highly varied types, so directly treating the area of each contour recognition box as the area of a control would lead to poor control position recognition accuracy. Therefore, before a contour recognition box is recognized as a control, the contour recognition boxes are filtered one by one: each box is checked against the preset size limit conditions, and any box that does not meet them is deleted.

[0125] This application provides a possible implementation in which the size limit conditions include at least one of the following: the area of the contour recognition box is between a first area threshold and a second area threshold; the ratio of the width to the height of the contour recognition box is less than a ratio threshold.

[0126] In the embodiments of this application, the first area threshold, the second area threshold, and the ratio threshold can be determined statistically from a large number of interface control samples, or based on the proportion of the interface area that controls occupy.

[0127] In one embodiment, in terminal applications the area of a control generally does not occupy most of the application interface, so contour recognition boxes with excessively large areas need to be deleted. For example, on WeChat mini-game pages, the area of most controls usually does not exceed 14% of the total interface area, and to make controls easy to click, their area is not less than 0.03% of the total area. Therefore, contour recognition boxes whose area proportion is between 0.03% and 14% are retained, and those that do not meet this condition are deleted and do not participate in the clustering of contour recognition boxes and character recognition boxes.

[0128] In another embodiment, to keep controls easy to click, applications usually do not make a control too narrow; that is, the ratio of a control's width to its height, or of its height to its width, does not exceed a certain threshold, where width is the horizontal length of the control and height is its vertical length. For example, in WeChat mini-game interfaces, the width-to-height or height-to-width ratio of a control usually does not exceed 5. Therefore, contour recognition boxes whose width-to-height or height-to-width ratio is greater than 5 are removed from the contour recognition boxes and do not participate in the clustering of contour recognition boxes and character recognition boxes.

[0129] In one embodiment, a contour recognition box is retained only if it meets both the area condition and the ratio condition; that is, a box that fails either condition is deleted.

[0130] In one example, Figures 9a and 9b illustrate the contour recognition boxes before and after filtering according to the size limit conditions. As shown in Figure 9a, the interface contains contour recognition boxes L1-L5, with area proportions of 0.02%, 2%, 5%, 15%, and 16% and width-to-height ratios of 1, 2, 6, 2, and 1, respectively. The size limit conditions require the area proportion of a contour recognition box to be between 0.03% and 14% of the total area and its aspect ratio (width-to-height or height-to-width) to be less than 5. Existing contour recognition boxes a, b, and c have corresponding area proportions of 1%, 5%, and 15%, and corresponding aspect ratios of 6, 2, and 1, respectively. Based on the size data of the contour recognition boxes and the set size limit conditions, L1 falls below the minimum area proportion, L3 exceeds the aspect ratio limit, and L4 and L5 exceed the maximum area proportion, so L1, L3, L4, and L5 are deleted from the list of contour recognition boxes. As shown in Figure 9b, after filtering only contour recognition box L2 is retained to participate in the subsequent clustering of contour recognition boxes and character recognition boxes.
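The size limit conditions can be expressed as a single predicate. This sketch uses the example thresholds from [0127] and [0128] (0.03% to 14% of the interface area, aspect ratio below 5) as defaults and requires both conditions, as in [0129]. Treating the area bounds as inclusive is an assumption, and the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def meets_size_limits(box: Box, screen_w: float, screen_h: float,
                      min_area_ratio: float = 0.0003,   # 0.03% of the interface
                      max_area_ratio: float = 0.14,     # 14% of the interface
                      max_aspect: float = 5.0) -> bool:
    w, h = box[2] - box[0], box[3] - box[1]
    if w <= 0 or h <= 0:
        return False                      # degenerate box, cannot be a control
    area_ratio = (w * h) / (screen_w * screen_h)
    aspect = max(w / h, h / w)            # width-to-height or height-to-width
    return min_area_ratio <= area_ratio <= max_area_ratio and aspect < max_aspect


def filter_contour_boxes(boxes: List[Box], screen_w: float, screen_h: float) -> List[Box]:
    # Keep only boxes that satisfy both the area and the aspect-ratio conditions.
    return [b for b in boxes if meets_size_limits(b, screen_w, screen_h)]
```

On a 1000 × 1000 interface, a 200 × 100 box passes, while a 600 × 100 box fails the aspect-ratio condition and a 500 × 300 box fails the area condition.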

[0131] Filtering the contour recognition boxes according to the size limit conditions removes as many non-control boxes as possible, which prevents them from affecting the control recognition results and greatly improves recognition accuracy.

[0132] It should be noted that the embodiments of this application do not limit the first area threshold, the second area threshold, or the ratio threshold in the size limit conditions. The values used above only serve to describe more clearly how contour recognition boxes are filtered based on the size limit conditions; the specific values of these thresholds are determined according to the specific application of the embodiments.

[0133] In this embodiment of the application, before the character recognition boxes and contour recognition boxes are clustered based on the distance between recognition boxes, the contour recognition boxes are sorted in ascending order of area. Following this order, each contour recognition box is checked to see whether it is completely covered by any subsequent contour recognition box, and each completely covered box is recorded as a covered contour recognition box. After all contour recognition boxes have been checked, the covered contour recognition boxes are deleted.

[0134] In this embodiment, for each contour recognition box, based on the vertex information of each contour recognition box, it is determined whether the current contour recognition box is completely covered by other contour recognition boxes. All completely covered contour recognition boxes are marked. After judging all contour recognition boxes, the covered contour recognition boxes are deleted. Since in the application interface, the covered contour recognition box and the contour recognition box that covers it usually belong to the same control, the covered contour recognition box is deleted here. This ensures that the control is effectively recognized while avoiding the consumption of unnecessary computing and storage resources.

[0135] In one example, Figure 10 illustrates the positions of contour recognition boxes in an interface. As shown, the interface includes contour recognition boxes L1, L2, L3, L4, and L5. L1 and L2 partially overlap; L3 is completely covered by L4, that is, L3 is contained in L4; and L5 completely covers L4. Sorting the five boxes in ascending order of area gives L3, L1, L4, L2, L5. Following this order, each box is checked for coverage. Based on the vertex information of L3 and the other contour recognition boxes, L3 is found to be completely covered by L4 and is marked as a covered contour recognition box. The remaining boxes are checked in turn in the same way, and L4 is found to be covered by L5. As a result, L3 and L4 are marked as covered contour recognition boxes; they are removed from the contour recognition boxes and do not participate in the subsequent clustering of contour recognition boxes and character recognition boxes.
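The covered-box removal of [0133] to [0135] can be sketched as follows, assuming axis-aligned boxes (x1, y1, x2, y2) and using full containment of one box in another as the test for "completely covered" (the text derives this from vertex information). Boxes are only marked during the pass and deleted afterwards, so a box that is itself covered (like L4) can still mark a smaller box (like L3). The function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def area(b: Box) -> int:
    return (b[2] - b[0]) * (b[3] - b[1])


def contains(outer: Box, inner: Box) -> bool:
    # True when the whole inner box lies within the outer box.
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def remove_covered(boxes: List[Box]) -> List[Box]:
    # Sort indices by ascending area; a box can only be covered by a box that is
    # at least as large, so only boxes later in the order need to be checked.
    order = sorted(range(len(boxes)), key=lambda i: area(boxes[i]))
    covered = set()
    for pos, i in enumerate(order):
        if any(contains(boxes[j], boxes[i]) for j in order[pos + 1:]):
            covered.add(i)                 # mark now, delete after the full pass
    return [b for k, b in enumerate(boxes) if k not in covered]
```

With boxes laid out as in Figure 10, the sort order is L3, L1, L4, L2, L5, and the pass marks L3 and L4, leaving L1, L2, and L5 for clustering.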

[0136] It should be noted that the embodiments of this application do not limit the execution order of steps S102 and S103 above, which can be determined according to the actual application scenario. S102 and S103 can be executed simultaneously, S102 can be executed before S103, or S103 can be executed before S102.

[0137] In this embodiment, because controls in games are especially diverse, the control position recognition method provided here achieves higher accuracy when applied to game applications and game mini-programs. It requires neither injecting test code into the game to obtain control information through interfaces provided by the game engine nor access to the game's source code, so it also meets the needs of black-box testing. During program testing, it enables efficient and effective testing of game programs with diverse controls without additional staff for scene annotation and without requiring game developers to provide source code for customized adaptation.

[0138] As an optional embodiment, the location of each control in the application interface is input into an automated testing program for the automated testing program to test the application.

[0139] Figure 11 is an interaction diagram of the embodiments of this application applied to monkey testing. As shown, after the terminal takes screenshots of all interfaces of the application under test, it transmits them to the server. For each interface screenshot, the server performs OCR to obtain the character region of each character, determines the character recognition box corresponding to each character region, merges character recognition boxes whose positions overlap into new character recognition boxes, and filters the boxes by character count, deleting all characters and character recognition boxes whose character count exceeds the character count threshold. At the same time, threshold recognition is performed on each interface screenshot: grayscale and binarization processing convert the screenshot into a binary image. The binary image is then segmented to obtain the contour regions on the screenshot and the corresponding contour recognition boxes, which are filtered according to the size limit conditions and their positional relationships, removing boxes that do not meet the size limit conditions or are completely covered. After the filtered contour and character recognition boxes are obtained, they are clustered based on their distances, boxes that are too close are merged into a single target recognition box, and the area corresponding to each target recognition box is used as the location of a control in the interface.
The position information of each control on each interface of the application under test is transmitted to the monkey testing system, which then tests the functionality of each interface of the application under test based on this control position information.

[0140] The control position identification method provided in this application embodiment can be applied to monkey testing. The acquired control position information is transmitted to the monkey testing program, allowing the program to effectively traverse each interface of the application based on the control position information, thereby achieving application testing. Since the exploration effect and efficiency of monkey testing are directly related to the accuracy of control identification, and the control position identification method provided in this application embodiment greatly improves the accuracy of control position identification, applying this method to monkey testing significantly improves the testing efficiency and accuracy of monkey testing.

[0141] The control position recognition method provided in this application is not limited by system platform or application type and can play a role in various application scenarios. It obtains control positions through OCR recognition and threshold recognition, eliminating the need for manual data annotation and calculation on the controls in the interface screenshot. This greatly reduces the manual cost and computing power consumption of control position recognition. The control position recognition method provided in this application can also be well used in project teams with extremely tight staffing. In addition, the control position recognition method provided in this application does not require the use of a fixed recognition template. For applications that frequently update controls, there is no need to frequently update the recognition template. It can also be well applied to project teams with fast control iteration speed and tight staffing.

[0142] This application provides a device for recognizing the position of a control. As shown in Figure 12, the device may include an acquisition module 1201, a character recognition module 1202, a contour recognition module 1203, and a merging module 1204, wherein:

[0143] The acquisition module 1201 is used to obtain screenshots of the application's interface;

[0144] The character recognition module 1202 is used to perform optical character recognition (OCR) on the interface screenshot, obtain the character region of each character in the interface screenshot, and determine the character recognition box corresponding to each character region on the interface screenshot.

[0145] The contour recognition module 1203 is used to convert the screenshot of the interface into a binary image, divide the binary image into contours to obtain at least one contour region, and determine a contour recognition box corresponding to each contour region on the screenshot of the interface.

[0146] The merging module 1204 is used to cluster the character recognition boxes and contour recognition boxes based on the distance between the recognition boxes. For a cluster containing multiple recognition boxes, all recognition boxes in the cluster are merged into a single target recognition box. The area corresponding to the target recognition box in each cluster is used as the location of each control in the interface.

[0147] The control position recognition device of the embodiments of this application can execute the control position recognition method provided in the embodiments of this application, and the implementation principles are similar. The actions performed by each module in the control position recognition device of each embodiment correspond to the steps of the method in each embodiment of this application. For detailed functional descriptions of each module of the device, refer to the descriptions of the corresponding methods above; they are not repeated here.

[0148] As an optional implementation, the device further includes a first contour recognition box filtering module;

[0149] The first contour recognition box filtering module is used to delete, for each contour recognition box, that contour recognition box if it is determined that its size does not meet the preset size limit conditions.

[0150] As an optional implementation, the contour recognition module includes a vertex extraction unit, which is used, for each contour region, to extract vertex information of the contour from the contour region and to determine the contour recognition box corresponding to that contour region based on the vertex information.
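The vertex extraction unit derives each contour recognition box from the contour's vertex information. A minimal sketch, assuming the vertices arrive as `(x, y)` integer pairs (for example, from a polygon approximation of the contour) and that the box is the axis-aligned rectangle spanning their extreme coordinates; the name `box_from_vertices` is hypothetical. OpenCV's `cv2.boundingRect` serves a similar purpose but returns `(x, y, width, height)` instead.

```python
from typing import Iterable, Tuple

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]


def box_from_vertices(vertices: Iterable[Point]) -> Box:
    """Axis-aligned (x_min, y_min, x_max, y_max) box enclosing all contour vertices."""
    pts = list(vertices)
    if not pts:
        raise ValueError("contour has no vertices")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


# A slightly skewed quadrilateral contour.
print(box_from_vertices([(5, 3), (20, 4), (19, 15), (6, 14)]))  # (5, 3, 20, 15)
```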

[0151] As an optional implementation, the size limit conditions include at least one of the following:

[0152] The area of the contour recognition box is between the first area threshold and the second area threshold;

[0153] The ratio of the width to the height of the contour recognition box is less than the proportional threshold.
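The size limit conditions can be read as a simple predicate applied to each contour recognition box. The sketch below applies both listed conditions, although the text requires only at least one of them. The threshold defaults are hypothetical placeholders, not values from this application, and treating the area bounds as inclusive is an assumption.

```python
def meets_size_limits(box, min_area, max_area, max_ratio):
    """True if an (x1, y1, x2, y2) box satisfies both size limit conditions."""
    x1, y1, x2, y2 = box
    width, height = x2 - x1, y2 - y1
    if width <= 0 or height <= 0:
        return False  # degenerate box, cannot be a control
    area = width * height
    return min_area <= area <= max_area and width / height < max_ratio


def filter_contour_boxes(boxes, min_area=100, max_area=200_000, max_ratio=10.0):
    """Keep only boxes that meet the (hypothetical) size limits."""
    return [b for b in boxes if meets_size_limits(b, min_area, max_area, max_ratio)]


boxes = [
    (0, 0, 5, 5),        # area 25: below the first area threshold
    (0, 0, 100, 40),     # area 4000, width/height 2.5: kept
    (0, 0, 300, 10),     # width/height 30: too wide
    (0, 0, 1000, 1000),  # area 1,000,000: above the second area threshold
]
print(filter_contour_boxes(boxes))  # [(0, 0, 100, 40)]
```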

[0154] As an optional implementation, the device further includes a second contour recognition box filtering module;

[0155] The second contour recognition box filtering module is used to sort the contour recognition boxes in ascending order of area.

[0156] Determine in order whether each contour recognition box is completely covered by a subsequent contour recognition box, and mark each completely covered box as a covered contour recognition box.

[0157] After all contour recognition boxes have been checked, delete the covered contour recognition boxes.
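The coverage filter sorts boxes by area and removes those fully enclosed by a later box. A minimal sketch, assuming "completely covered" means geometric containment of `(x1, y1, x2, y2)` boxes and reading "the subsequent box" as any later box in the sorted order (our interpretation); all names are illustrative.

```python
def contains(outer, inner):
    """True if box `outer` completely covers box `inner`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def remove_covered(boxes):
    """Drop boxes covered by a later box in ascending-area order."""
    ordered = sorted(boxes, key=area)  # stable sort, smallest first
    covered = [False] * len(ordered)
    for i, small in enumerate(ordered):
        # Only later (equal or larger) boxes are checked as possible covers.
        if any(contains(big, small) for big in ordered[i + 1:]):
            covered[i] = True
    # Delete only after every box has been checked, as the text describes.
    return [b for b, c in zip(ordered, covered) if not c]


boxes = [(0, 0, 100, 50), (10, 10, 30, 20), (200, 0, 220, 20)]
print(remove_covered(boxes))  # [(200, 0, 220, 20), (0, 0, 100, 50)]
```

Because marking and deletion are separate passes, identical duplicate boxes collapse to one copy: the earlier copy is marked as covered by the later one, and the later one survives.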

[0158] As an optional implementation, the contour recognition module includes:

[0159] A conversion unit is used to convert the screenshot of the interface into a grayscale image;

[0160] The first setting unit is used to set the pixel values in the grayscale image that are greater than a pixel threshold to a first value;

[0161] The second setting unit is used to set the pixel values in the grayscale image that are less than the pixel threshold to a second value;

[0162] The acquisition unit is used to obtain the binarized image;

[0163] The pixel threshold is determined based on the pixel value of each character.
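The conversion and setting units amount to grayscale conversion followed by fixed-threshold binarization. In this sketch the grayscale weights are the common ITU-R BT.601 luma coefficients (0.299, 0.587, 0.114), the first and second values are assumed to be 255 and 0, and pixels exactly equal to the threshold, a case the text does not cover, are assigned the second value. All names are hypothetical.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of integer gray levels (BT.601 weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def binarize(gray, threshold, first_value=255, second_value=0):
    """Pixels above the threshold get first_value; all others get second_value."""
    # Pixels equal to the threshold are unspecified in the text; here they get second_value.
    return [[first_value if p > threshold else second_value for p in row] for row in gray]


rgb = [
    [(255, 255, 255), (0, 0, 0)],
    [(200, 200, 200), (10, 10, 10)],
]
print(binarize(to_gray(rgb), threshold=100))  # [[255, 0], [255, 0]]
```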

[0164] As an optional implementation, the pixel threshold is determined as follows:

[0165] Determine the pixel value of each character;

[0166] Determine the average pixel value based on the pixel value of each character;

[0167] Use the average pixel value as the pixel threshold.
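The text does not define "the pixel value of each character". One plausible reading, used in this sketch, is the mean grayscale value inside each character recognition box; the threshold is then the average of those per-character values. The `(x1, y1, x2, y2)` box format with exclusive right and bottom edges and the function names are assumptions.

```python
from statistics import mean


def character_pixel_value(gray, box):
    """Mean grayscale value inside one character box (x2 and y2 exclusive)."""
    x1, y1, x2, y2 = box
    return mean(p for row in gray[y1:y2] for p in row[x1:x2])


def pixel_threshold(gray, char_boxes):
    """Average of the per-character values, used as the binarization threshold."""
    return mean(character_pixel_value(gray, box) for box in char_boxes)


gray = [
    [10, 20, 200, 200],
    [30, 40, 200, 200],
]
print(pixel_threshold(gray, [(0, 0, 2, 2), (2, 0, 4, 2)]))  # 112.5
```

Averaging over a whole box includes background pixels around each glyph, so this per-box mean is only a rough proxy for the character's own pixel value.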

[0168] As an optional implementation, the device further includes a character recognition box filtering module;

[0169] The character recognition box filtering module is used to traverse all character recognition boxes. For any two character recognition boxes, if the two character recognition boxes have overlapping areas, the two character recognition boxes are merged into one character recognition box.

[0170] For any character recognition box after traversal, if it is determined that the number of characters in the character recognition box exceeds the character count threshold, then the character recognition box is deleted.
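The character recognition box filter merges overlapping boxes and then discards boxes holding too many characters, presumably because long runs of text are more likely body copy than controls (our interpretation). A minimal sketch, assuming each OCR result is a `(box, text)` pair, that merged text is concatenated, and that the character count is `len(text)`; `max_chars` stands in for the unspecified character count threshold.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]


def overlaps(a: Box, b: Box) -> bool:
    """True if two (x1, y1, x2, y2) boxes share a region of positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_char_boxes(items: List[Tuple[Box, str]], max_chars: int) -> List[Tuple[Box, str]]:
    """Merge overlapping character boxes, then drop boxes with too many characters."""
    items = list(items)
    merged = True
    while merged:  # repeat until a full pass finds no overlapping pair
        merged = False
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                (a, text_a), (b, text_b) = items[i], items[j]
                if overlaps(a, b):
                    union = (min(a[0], b[0]), min(a[1], b[1]),
                             max(a[2], b[2]), max(a[3], b[3]))
                    items[i] = (union, text_a + text_b)
                    del items[j]
                    merged = True
                    break
            if merged:
                break
    return [(box, text) for box, text in items if len(text) <= max_chars]


items = [
    ((0, 0, 20, 10), "Sta"),
    ((15, 0, 40, 10), "rt"),
    ((100, 0, 400, 10), "This is a long paragraph"),
]
print(merge_char_boxes(items, max_chars=10))  # [((0, 0, 40, 10), 'Start')]
```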

[0171] As an optional implementation, the device further includes an input module;

[0172] The input module is used to input the position of each control in the application interface into an automated test program, so that the automated test program can test the application.
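How the recognized positions are handed to the automated test program is not specified. As an illustration only, a test driver that taps or clicks at a single point could use the centre of each control box. The function name `tap_points`, the `(x1, y1, x2, y2)` box format, and the index-to-point mapping are all hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]


def tap_points(control_boxes: List[Box]) -> Dict[int, Tuple[int, int]]:
    """Map each control index to the integer centre of its box (hypothetical driver input)."""
    return {
        i: ((x1 + x2) // 2, (y1 + y2) // 2)
        for i, (x1, y1, x2, y2) in enumerate(control_boxes)
    }


print(tap_points([(10, 8, 60, 22), (200, 200, 220, 210)]))  # {0: (35, 15), 1: (210, 205)}
```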

[0173] This application provides an electronic device, including a memory, a processor, and a computer program stored in the memory, where the processor executes the computer program to implement the steps of the control position recognition method. Compared with related technologies, the control position recognition method of this application achieves the following. Character recognition boxes obtained by optical character recognition and contour recognition boxes obtained by threshold recognition filter the controls in the interface screenshot from multiple dimensions. The character recognition boxes and contour recognition boxes are then clustered based on the distance between recognition boxes, so that boxes that are too close or overlap are recognized as one box, which further filters the control positions and yields the position of each control on the interface. Combining optical character recognition with threshold recognition greatly improves the accuracy of control position recognition and removes the need to manually annotate large amounts of data for the controls in interface screenshots. Thus, while the accuracy of control position recognition is maintained, the labor cost and computing power required for control recognition are greatly reduced. In addition, the control position recognition method of this application does not require a fixed recognition template, so applications that frequently update their controls do not need frequent template updates, and the method is not limited by application type or software operating platform, which greatly improves its applicability.

[0174] In an optional embodiment, an electronic device is provided. As shown in Figure 13, the electronic device 4000 includes a processor 4001 and a memory 4003. The processor 4001 and the memory 4003 are connected, for example, via a bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004, which can be used for data interaction between the electronic device and other electronic devices, such as sending and/or receiving data. It should be noted that in practical applications, the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of this application.

[0175] Processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various exemplary logic blocks, modules, and circuits described in conjunction with the disclosure of this application. Processor 4001 may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

[0176] Bus 4002 may include a path for transmitting information between the aforementioned components. Bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like, and can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is used to represent it in Figure 13, but this does not mean that there is only one bus or one type of bus.

[0177] The memory 4003 may be a ROM (Read Only Memory) or another type of static storage device capable of storing static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device capable of storing information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium capable of carrying or storing a computer program and capable of being read by a computer, without limitation herein.

[0178] The memory 4003 is used to store computer programs that execute the embodiments of this application, and the execution is controlled by the processor 4001. The processor 4001 is used to execute the computer programs stored in the memory 4003 to implement the steps shown in the foregoing method embodiments.

[0179] The electronic device may include, but is not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (such as in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in Figure 13 is merely an example and should not be construed as limiting the functionality or scope of use of the embodiments disclosed herein.

[0180] This application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps and corresponding content of the foregoing method embodiments. Compared with the prior art, the control position recognition method of this application achieves the following. Character recognition boxes obtained by optical character recognition and contour recognition boxes obtained by threshold recognition filter the controls in the interface screenshot from multiple dimensions. The character recognition boxes and contour recognition boxes are then clustered based on the distance between recognition boxes, so that boxes that are too close or overlap are identified as one box, which further filters the control positions and yields the position of each control on the interface. Combining optical character recognition with threshold recognition greatly improves the accuracy of control position recognition and removes the need to manually annotate large amounts of data for the controls in interface screenshots. Thus, while the accuracy of control position recognition is maintained, the labor cost and computing power required for control recognition are greatly reduced. In addition, the control position recognition method of this application does not require a fixed recognition template, so applications that frequently update their controls do not need frequent template updates, and the method is not limited by application type or software operating platform, which greatly improves its applicability.

[0181] It should be noted that the computer-readable medium described in this disclosure can be a computer-readable signal medium, a computer-readable storage medium, or any combination thereof. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof. In this disclosure, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In this disclosure, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium can be any computer-readable medium other than a computer-readable storage medium, and it can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to wires, optical fibers, RF (radio frequency), or any suitable combination thereof.

[0182] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps and corresponding content of the foregoing method embodiments. Compared with the prior art, the control position recognition method of this application achieves the following. Character recognition boxes obtained by optical character recognition and contour recognition boxes obtained by threshold recognition filter the controls in the interface screenshot from multiple dimensions. The character recognition boxes and contour recognition boxes are then clustered based on the distance between recognition boxes, so that boxes that are too close or overlap are identified as one box, which further filters the control positions and yields the position of each control on the interface. Combining optical character recognition with threshold recognition greatly improves the accuracy of control position recognition and removes the need to manually annotate large amounts of data for the controls in interface screenshots. Thus, while the accuracy of control position recognition is maintained, the labor cost and computing power required for control recognition are greatly reduced. In addition, the control position recognition method of this application does not require a fixed recognition template, so applications that frequently update their controls do not need frequent template updates, and the method is not limited by application type or software operating platform, which greatly improves its applicability.

[0183] The terms "first," "second," "third," "fourth," "1," "2," etc. (if present) in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in a sequence other than that shown in the illustrations or text descriptions.

[0184] It should be understood that although arrows indicate various operation steps in the flowcharts of this application's embodiments, the order in which these steps are implemented is not limited to the order indicated by the arrows. Unless explicitly stated herein, in some implementation scenarios of this application's embodiments, the implementation steps in each flowchart can be executed in other orders as required. Furthermore, some or all steps in each flowchart, based on the actual implementation scenario, may include multiple sub-steps or multiple stages. Some or all of these sub-steps or stages can be executed at the same time, and each sub-step or stage can also be executed at different times. In scenarios where execution times differ, the execution order of these sub-steps or stages can be flexibly configured according to requirements, and this application's embodiments do not limit this.

[0185] The above description is only an optional implementation method for some implementation scenarios of this application. It should be noted that for those skilled in the art, other similar implementation methods based on the technical concept of this application without departing from the technical concept of this application also fall within the protection scope of the embodiments of this application.

Claims

1. A method for identifying the position of a control, characterized by comprising: obtaining a screenshot of an interface of an application program; performing optical character recognition (OCR) on the interface screenshot to obtain the character region of each character in the interface screenshot, and determining a character recognition box corresponding to each character region on the interface screenshot; converting the interface screenshot into a binary image, performing contour division on the binary image to obtain at least one contour region, and determining a contour recognition box corresponding to each contour region on the interface screenshot; and clustering the character recognition boxes and contour recognition boxes based on the distance between recognition boxes, merging, for a cluster containing multiple recognition boxes, all recognition boxes in the cluster into a single target recognition box, and taking the region corresponding to the target recognition box in each cluster as the position of each control in the interface.

2. The method for identifying the position of a control according to claim 1, characterized in that, before the clustering of the character recognition boxes and contour recognition boxes based on the distance between recognition boxes, the method further comprises: for each contour recognition box, deleting the contour recognition box if it is determined that its size does not meet the preset size limit conditions.

3. The method for identifying the position of a control according to claim 1, characterized in that determining, on the interface screenshot, the contour recognition box in one-to-one correspondence with each contour region comprises: for each contour region, extracting vertex information of the contour from the contour region, and determining the contour recognition box corresponding to the contour region based on that vertex information.

4. The method for identifying the position of a control according to claim 2, characterized in that the size limit conditions include at least one of the following: the area of the contour recognition box is between a first area threshold and a second area threshold; the ratio of the width to the height of the contour recognition box is less than a proportional threshold.

5. The method for identifying the position of a control according to any one of claims 1-4, characterized in that, before the clustering of the character recognition boxes and contour recognition boxes based on the distance between recognition boxes, the method further comprises: sorting the contour recognition boxes in ascending order of area; determining in order whether each contour recognition box is completely covered by a subsequent contour recognition box, and marking each completely covered box as a covered contour recognition box; and after all contour recognition boxes have been checked, deleting the covered contour recognition boxes.

6. The method for identifying the position of a control according to claim 1, characterized in that converting the interface screenshot into a binary image comprises: converting the interface screenshot into a grayscale image; setting the pixel values in the grayscale image that are greater than a pixel threshold to a first value and the pixel values in the grayscale image that are less than the pixel threshold to a second value, thereby obtaining the binarized image; wherein the pixel threshold is determined based on the pixel value of each character.

7. The method for identifying the position of a control according to claim 6, characterized in that the pixel threshold is determined by: determining the pixel value of each character; determining an average pixel value based on the pixel value of each character; and using the average pixel value as the pixel threshold.

8. The method for identifying the position of a control according to claim 1, characterized in that, before the clustering of the character recognition boxes and contour recognition boxes based on the distance between recognition boxes, the method further comprises: traversing all character recognition boxes and, for any two character recognition boxes that have an overlapping area, merging the two into one character recognition box; and for any character recognition box after the traversal, deleting the character recognition box if it is determined that the number of characters in it exceeds a character count threshold.

9. The method for identifying the position of a control according to claim 1, characterized in that the method further comprises: inputting the position of each control in the application interface into an automated test program, so that the automated test program tests the application.

10. A device for recognizing the position of a control, characterized by comprising: an acquisition module, used to obtain a screenshot of an interface of an application program; a character recognition module, used to perform optical character recognition (OCR) on the interface screenshot, obtain the character region of each character in the interface screenshot, and determine, on the interface screenshot, a character recognition box in one-to-one correspondence with each character region; a contour recognition module, used to convert the interface screenshot into a binary image, perform contour division on the binary image to obtain at least one contour region, and determine a contour recognition box corresponding to each contour region on the interface screenshot; and a merging module, used to cluster the character recognition boxes and contour recognition boxes based on the distance between recognition boxes, merge, for a cluster containing multiple recognition boxes, all recognition boxes in the cluster into a single target recognition box, and take the region corresponding to the target recognition box in each cluster as the position of each control in the interface.

11. An electronic device comprising a memory, a processor, and a computer program stored in the memory, characterized in that the processor executes the computer program to implement the steps of the control position recognition method according to any one of claims 1-9.

12. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the control position recognition method according to any one of claims 1-9.

13. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the control position recognition method according to any one of claims 1-9.