Image target auxiliary labeling method, device and equipment

By acquiring user click points and generating virtual click points, and combining them with a semantic segmentation model, the problem of insufficient manual click volume was solved, thereby improving the accuracy of image target annotation and contour fitting.

CN115272279B · Active · Publication Date: 2026-06-26 · BEIJING SHENDU SOUSUO TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: BEIJING SHENDU SOUSUO TECH CO LTD
Filing Date: 2022-08-16
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

In existing technologies, insufficient manual clicks lead to inaccurate image target annotation data, affecting the annotation performance of AI models.

Method used

By acquiring user click points, an initial region is marked, and a preset number of virtual click points are generated through uniform sampling. Combining the positional relationship between user click points and virtual click points, a semantic segmentation model is used to mark the final region.

Benefits of technology

It improves the accuracy of image target annotation and the fit between the contour and the real contour, thus enhancing the quality of the annotation data.

Abstract

An image target auxiliary labeling method, device and equipment are provided in the disclosure, and the method comprises: acquiring a to-be-labeled image and a user click point; labeling an initial region of a to-be-recognized target according to the user click point; sampling the to-be-labeled image to obtain a preset number of virtual click points; and labeling a final region of the to-be-recognized target in the to-be-labeled image according to the position relationship between the user click point and each virtual click point and the initial region. The disclosure realizes automatic expansion of the click point by automatically generating a preset number of virtual click points, so that the region where the to-be-recognized target is located in the image can be accurately labeled based on the expanded click point.

Description

Technical Field

[0001] This disclosure relates to the field of computer vision technology, and in particular to an auxiliary annotation method, apparatus and device for image targets.

Background Technology

[0002] When performing image target recognition and instance segmentation tasks, a large amount of labeled data is often required to provide the outline of the target to be recognized in the image or the area of the image it occupies (hereinafter referred to as mask).

[0003] In instance segmentation tasks, when generating labeled data, an AI model is typically used first to predict the mask of the target to be identified in the image. Then, the mask is adjusted based on multiple rounds of auxiliary annotation operations performed by the annotator on the original image to obtain the final labeled data. These auxiliary annotation operations can be one of the following: contour adjustment (adjusting the contour by dragging individual contour key points), superpixel adjustment (selecting the target region at the superpixel level), or positive / negative clicking (selecting / excluding certain regions of the target by clicking).

[0004] Compared to contour adjustment and superpixel adjustment, positive and negative clicks are the most "arbitrary." Annotators can perform positive or negative clicks at any location where regions need to be added or deleted to guide the AI model in adjusting the mask. However, when the number of manual clicks is relatively small, it is insufficient to guide the AI model to annotate the entire target mask, thus affecting the accuracy of the annotation data.

Summary of the Invention

[0005] In view of this, this disclosure proposes an auxiliary annotation method, apparatus and device for image targets, which can accurately annotate the area where the target to be identified is located in the image based on automatically expanded click points.

[0006] According to a first aspect of this disclosure, an auxiliary annotation method for image targets is provided, comprising:

[0007] Obtain the image to be labeled and the user's click points;

[0008] Based on the user's click points, mark the initial area of the target to be identified;

[0009] The image to be labeled is sampled to obtain a preset number of virtual click points;

[0010] Based on the positional relationship between the user click points and each of the virtual click points and the initial region, the final region of the target to be identified in the image to be labeled is marked.

[0011] In one possible implementation, when sampling the image to be labeled to obtain a preset number of virtual click points, uniform sampling is used.

[0012] In one possible implementation, when uniformly sampling the image to be labeled to obtain virtual click points, the process includes:

[0013] The sampling region is determined in the image to be labeled, and the width and height of the sampling region are obtained;

[0014] Within the width range, a first integer is randomly generated through uniform sampling as the width value of the virtual click point;

[0015] Within the height range, a second integer is randomly generated through uniform sampling as the height value of the virtual click point;

[0016] The virtual click point is obtained based on the width and height values.

[0017] In one possible implementation, the sampling region is a pre-selected area of the image to be labeled in which the target to be identified is located.

[0018] In one possible implementation, after obtaining the virtual click point, the method further includes:

[0019] Determine whether the virtual click point is the same as the previously generated virtual click point;

[0020] When it is determined that the virtual click point is different from the already generated virtual click point, the virtual click point is stored.

[0021] In one possible implementation, the positional relationship between the virtual click point and the initial region is characterized by labels on the virtual click points. These labels include positive and negative labels; a positive label indicates that the virtual click point is located within the initial region, and a negative label indicates that the virtual click point is outside the initial region. In another possible implementation, the number of virtual click points with positive labels is balanced with the number of virtual click points with negative labels among the preset number of virtual click points.

[0022] In one possible implementation, both the user click point and the initial region are obtained based on a compressed image of the image to be labeled.

[0023] According to a second aspect of this disclosure, an auxiliary annotation apparatus for image targets is provided, comprising:

[0024] The image acquisition module is used to acquire the image to be labeled and the user's click points;

[0025] The first annotation module is used to annotate the initial region of the target to be identified in the image to be annotated based on the user's click point;

[0026] The virtual click point generation module is used to sample the image to be labeled to obtain a preset number of virtual click points;

[0027] The second annotation module is used to annotate the final region of the target to be identified in the image to be annotated based on the positional relationship between the user click point and each of the virtual click points and the initial region.

[0028] According to a third aspect of this disclosure, an auxiliary annotation device for image targets is provided, comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the method described in the first aspect of this disclosure when executing the executable instructions.

[0029] This disclosure provides an auxiliary labeling method for image targets, including: acquiring an image to be labeled and user click points; labeling an initial region of the target to be identified based on the user click points; sampling the image to be labeled to obtain a preset number of virtual click points; and labeling the final region of the target to be identified in the image to be labeled based on the positional relationship between the user click points, each virtual click point, and the initial region. In this disclosure, the automatic expansion of click points is achieved by automatically generating a preset number of virtual click points, thereby enabling accurate labeling of the region where the target to be identified is located in the image based on the expanded click points.

[0030] Other features and aspects of this disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Description of the Drawings

[0031] The accompanying drawings, which are included in and form part of this specification, illustrate exemplary embodiments, features, and aspects of this disclosure together with the specification and serve to explain the principles of this disclosure.

[0032] Figure 1 shows a flowchart of an auxiliary annotation method for image targets according to an embodiment of the present disclosure;

[0033] Figure 2 shows a schematic diagram of a user click point according to an embodiment of the present disclosure;

[0034] Figure 3 shows the annotation result of the initial region according to an embodiment of the present disclosure;

[0035] Figure 4 shows a schematic diagram of a sampling area according to an embodiment of the present disclosure;

[0036] Figure 5 shows a schematic diagram of an image to be labeled according to an embodiment of the present disclosure;

[0037] Figure 6 shows a schematic diagram of the marked initial region according to an embodiment of the present disclosure;

[0038] Figure 7 shows a schematic diagram of the marked final region according to an embodiment of the present disclosure;

[0039] Figure 8 shows a schematic block diagram of an auxiliary annotation apparatus for image targets according to an embodiment of the present disclosure;

[0040] Figure 9 shows a schematic block diagram of an auxiliary annotation device for image targets according to an embodiment of the present disclosure.

Detailed Implementation

[0041] Various exemplary embodiments, features, and aspects of this disclosure will now be described in detail with reference to the accompanying drawings. The same reference numerals in the drawings denote elements that have the same or similar functions. Although various aspects of the embodiments are shown in the drawings, they are not necessarily drawn to scale unless specifically indicated otherwise.

[0042] The term “exemplary” as used herein means “serving as an example, embodiment, or illustration.” Any embodiment illustrated herein as “exemplary” is not necessarily to be construed as superior to or better than other embodiments.

[0043] Furthermore, to better illustrate this disclosure, numerous specific details are set forth in the following detailed description. Those skilled in the art will understand that this disclosure can be practiced without certain specific details. In some instances, methods, means, components, and circuits well known to those skilled in the art have not been described in detail in order to highlight the main points of this disclosure.

[0044] <Method Implementation>

[0045] Figure 1 shows a flowchart of an auxiliary annotation method for image targets according to an embodiment of the present disclosure. As shown in Figure 1, the method includes steps S1100-S1400.

[0046] S1100: Obtain the image to be labeled and the user's click points.

[0047] The image to be labeled is the original image containing the target to be identified. The target to be identified can be any object of interest, such as a person, vehicle, animal, or plant, without any specific limitation.

[0048] A user click point is a click point generated by the user's click operation on the image to be labeled. Specifically, it can be a positive click point generated by the user's positive click operation within the area where the target to be identified is located. In the image to be labeled shown in Figure 2, the target to be identified is the figure skater. The user can generate a positive click point in the area where the figure skater is located (such as the abdomen) by performing a positive click operation; this positive click point is the user click point. The user click points may also include negative click points generated by the user's negative click operations in the area where the target to be identified is located, which is not specifically limited here.

[0049] It should be noted that after a user click point is generated, its information will be automatically recorded. This information includes at least the location of the click point and whether it is a positive or negative click, so that the initial area can be marked later based on the user click point information.
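
The recorded click information can be pictured as a small record type. The sketch below is illustrative only: the `ClickPoint` name and its fields are hypothetical, chosen to hold the two items the text requires (location and positive/negative type), plus a flag that separates user clicks from the virtual click points introduced later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickPoint:
    """One click on the image (hypothetical record, not defined by the source)."""
    x: int                 # horizontal pixel position of the click
    y: int                 # vertical pixel position of the click
    positive: bool         # True for a positive click, False for a negative one
    virtual: bool = False  # False for user clicks, True for sampled points


# A single positive user click, e.g. on the skater's abdomen (coordinates invented).
user_click = ClickPoint(x=120, y=245, positive=True)
```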

[0050] The number of clicks by the user can be one or more, without specific limitations. If the number of clicks is multiple, all clicks can be positive, or they can include both positive and negative clicks, without specific limitations.

[0051] S1200: Mark the initial area of the target to be identified based on the user's click point.

[0052] In one possible implementation, the initial region of the target to be identified can be marked using a semantic segmentation model (e.g., the RTM model) based on user clicks. Specifically, the image to be labeled and the information of the user clicks are input into the semantic segmentation model. The semantic segmentation model combines the information of the user clicks to determine whether the current pixel belongs to the target pixel by pixel, and marks the region where the target to be identified is located in the image to be labeled based on the determination result. This region is the initial region where the target to be identified is located.

[0053] The initial area marked directly from the user click points may not completely cover the target to be identified. For example, as shown in Figure 3, the target to be identified is the figure skater, and the initial area marked in step S1200 does not cover the skater's back, arms, and right leg. The subsequent steps are therefore performed to further improve the accuracy of the target annotation.

[0054] S1300: Sample the image to be labeled to obtain a preset number of virtual click points. Here, virtual click points are additional click points selected from the image through automatic sampling, beyond the click points added manually by the user. These additional click points expand the number of click points available for image annotation.

[0055] In one possible implementation, when sampling the image to be labeled to obtain a preset number of virtual click points, it can be based on uniform sampling.

[0056] The following section uses a virtual click point as an example to illustrate the process of obtaining a virtual click point by uniformly sampling the image to be labeled. The specific steps include S1310-S1340.

[0057] S1310, Determine the sampling area in the image to be labeled, and obtain the width and height of the sampling area.

[0058] In one possible implementation, the sampling region can be the whole area of the image to be labeled, that is, uniform sampling is performed throughout the entire image to be labeled.

[0059] In this implementation, the width W of the sampling region is the width of the image to be labeled, and the height H of the sampling region is the height of the image to be labeled. For example, if the width and height of the image to be labeled are W1 and H1 respectively, then the width W of the sampling region is W1, and the height H of the sampling region is H1.

[0060] In another possible implementation, the sampling region can also be a pre-selected area within the image to be labeled, where the target to be identified is located. For example, as shown in Figure 4, the sampling area can be the approximate area where the target to be identified is located, selected by the user using a bounding box. The bounding box used to select the target to be identified is the sampling area. It should be noted that the bounding box can be any box that includes the target to be identified, and it is not required that the bounding box be perfectly aligned with the boundary of the target to be identified.

[0061] In this implementation, the width W of the sampling region is the width of the target bounding box, and the height H of the sampling region is the height of the target bounding box. For example, if the width and height of the target bounding box are W2 and H2 respectively, then the width W of the sampling region is W2, and the height H of the sampling region is H2.

[0062] In this feasible approach, with a target bounding box, the generation of virtual click points can be carried out within the target bounding box, making the generated virtual click points more targeted and thus improving the accuracy of the target labeling.
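
Step S1310 can be sketched as a small helper that returns the sampling region for either case: the whole image (W1, H1) or a user-drawn bounding box (W2, H2). The function name, the `(left, top, width, height)` box layout, and the clipping of the box to the image bounds are assumptions made for illustration; the source only states where W and H come from.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height), a hypothetical layout


def sampling_region(image_size: Tuple[int, int], box: Optional[Box] = None) -> Box:
    """Return the sampling region as (left, top, W, H)."""
    img_w, img_h = image_size
    if box is None:
        # No bounding box: sample over the entire image, so W = W1 and H = H1.
        return (0, 0, img_w, img_h)
    left, top, w, h = box
    # Clip the box to the image so sampled points cannot fall outside it (assumption).
    right, bottom = min(img_w, left + w), min(img_h, top + h)
    left, top = max(0, left), max(0, top)
    if right <= left or bottom <= top:
        raise ValueError("bounding box does not overlap the image")
    # With a bounding box: W = W2 and H = H2 (after clipping).
    return (left, top, right - left, bottom - top)
```

For example, `sampling_region((640, 480))` covers the whole image, while `sampling_region((640, 480), (100, 50, 200, 300))` restricts sampling to the box.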

[0063] S1320, within the width range, a first integer is randomly generated through uniform sampling as the width value of the virtual click point. Specifically, after obtaining the width W of the sampling area, the width range can be determined as (0, W]. Then, within the width range (0, W], a first integer w is randomly generated through uniform sampling, and this w is the width value of the generated virtual click point.

[0064] S1330, within the height range, a second integer is randomly generated through uniform sampling as the height value of the virtual click point. Specifically, after obtaining the height H of the sampling area, the height range can be determined as (0, H]. Then, within the height range (0, H], a second integer h is randomly generated through uniform sampling, and this h is the height value of the generated virtual click point.

[0065] S1340, based on the width and height values, obtain the virtual click point. Specifically, the point P(w, h) determined by the width value w and the height value h is the obtained virtual click point.
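
Steps S1320-S1340 amount to drawing two independent uniform integers. A minimal sketch follows, assuming the ranges (0, W] and (0, H] are read as the integers 1..W and 1..H; the function name and the optional `rng` parameter are illustrative and not part of the source.

```python
import random
from typing import Tuple


def sample_virtual_click(width: int, height: int, rng=random) -> Tuple[int, int]:
    """Draw one virtual click point P(w, h) by uniform sampling."""
    w = rng.randint(1, width)   # S1320: uniform integer in (0, W], both ends of randint are inclusive
    h = rng.randint(1, height)  # S1330: uniform integer in (0, H]
    return (w, h)               # S1340: the point P(w, h)
```

If the sampling region is a bounding box rather than the whole image, the box's top-left offset would be added to `w` and `h` to obtain image coordinates.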

[0066] Repeating the steps above to generate virtual click points will yield a preset number of virtual click points. The preset number can be configured according to the specific application scenario. For example, it can be configured to 50. Furthermore, for targets with complex outlines (such as vegetation), the preset number can be increased appropriately; for targets with simple shapes and uniform colors, the preset number can be reduced appropriately.

[0067] It should be noted that after generating a virtual click point, its information will be stored in `vp_list`, including at least its location information. In one possible implementation, to avoid duplicate virtual click points in `vp_list`, after obtaining the virtual click point, step S1350 is included: determining whether the virtual click point is the same as an already generated virtual click point based on its information. If the virtual click point is different from the already generated virtual click point, it is then stored in `vp_list`; if the virtual click point is the same as the already generated virtual click point (i.e., a duplicate), it is deleted, and the process returns to steps S1320-S1350.

[0068] In this implementation method, step S1350 can effectively avoid generating duplicate virtual click points, thereby ensuring that the preset number of virtual click points generated can provide rich information about the area where the target to be identified is located.
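
The repeat-until-full loop with the duplicate check of step S1350 can be sketched as follows. The `seen` set is an implementation convenience assumed here for fast lookup, and the guard against an impossible request is also an addition; `vp_list` follows the name used in the text.

```python
import random
from typing import List, Optional, Tuple


def generate_virtual_clicks(width: int, height: int, preset_number: int = 50,
                            seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Generate `preset_number` distinct virtual click points."""
    if preset_number > width * height:
        raise ValueError("region has fewer pixels than the requested number of points")
    rng = random.Random(seed)
    vp_list: List[Tuple[int, int]] = []
    seen = set()
    while len(vp_list) < preset_number:
        point = (rng.randint(1, width), rng.randint(1, height))  # S1320-S1340
        if point in seen:
            continue              # S1350: duplicate, discard it and sample again
        seen.add(point)
        vp_list.append(point)     # S1350: new point, store it
    return vp_list
```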

[0069] In one possible implementation, after generating a preset number of virtual click points, labels can be assigned to each virtual click point based on its positional relationship with the initial area, thus representing the positional relationship between the virtual click point and the initial area. Specifically, the labels can include positive and negative labels. For any virtual click point, if it is located within the initial area (i.e., the virtual click point is on the target to be identified), the label of the virtual click point is set to a positive label to represent that the virtual click point is within the initial area; if it is located outside the initial area (i.e., the virtual click point is outside the target to be identified), the label of the virtual click point is set to a negative label to represent that the virtual click point is outside the initial area. In this possible implementation, after generating the labels of the virtual click points, the labels of the virtual click points can be stored as information about the virtual click points in vp_list.
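
Label assignment reduces to a mask lookup per point. The sketch below assumes the initial area is available as a row-major grid of booleans (`initial_mask[row][column]`) and that points use 1-based coordinates, hence the `- 1` when indexing; both conventions, and the dictionary layout of each entry, are assumptions rather than details given in the source.

```python
from typing import Dict, List, Sequence, Tuple


def label_virtual_clicks(points: Sequence[Tuple[int, int]],
                         initial_mask: Sequence[Sequence[bool]]) -> List[Dict]:
    """Attach a positive/negative label to each virtual click point."""
    vp_list = []
    for w, h in points:
        inside = bool(initial_mask[h - 1][w - 1])  # is the point inside the initial area?
        vp_list.append({
            "position": (w, h),
            "label": "positive" if inside else "negative",
        })
    return vp_list
```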

[0070] Considering the difference between the size of the target to be identified and the size of the bounding box or the image to be labeled, the number of virtual click points with positive labels and those with negative labels may be unbalanced, thus affecting the accuracy of the auxiliary labeling. To avoid this problem, one possible implementation is to use oversampling to ensure that the number of generated virtual click points with positive labels and those with negative labels both reach or exceed a specified number, and then select a balanced number of positive and negative virtual click points from the generated virtual click points.

[0071] For example, in a feasible implementation with a preset quantity of 50, the number of virtual click points with positive labels and the number of virtual click points with negative labels are both ideally 25, at which point the positive and negative virtual click points are in an optimal balance. Therefore, oversampling can be used to ensure that the number of generated virtual click points with positive labels and virtual click points with negative labels both reach or exceed 25. Then, 25 virtual click points with positive labels and 25 virtual click points with negative labels are selected from the generated virtual click points respectively, thus ensuring the balance of the number of positive and negative virtual click points.

[0072] In this feasible approach, oversampling can be used to ensure a balance in the number of positive and negative virtual click points, thereby achieving balanced coverage of the target to be identified and improving the accuracy of auxiliary marking.
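
One way to realise the oversample-then-select idea is to keep sampling until both classes hold at least half the preset number, then keep that many of each. This is a sketch under several assumptions not stated in the source: the initial area is a row-major boolean grid, points are 1-based, the first points drawn in each class are the ones kept, and `max_attempts` is a safety limit for masks where one class is too rare or absent.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[int, int]


def balanced_virtual_clicks(initial_mask: Sequence[Sequence[bool]],
                            preset_number: int = 50,
                            seed: Optional[int] = None,
                            max_attempts: int = 100_000) -> Tuple[List[Point], List[Point]]:
    """Return (positives, negatives), each holding preset_number // 2 points."""
    rng = random.Random(seed)
    height, width = len(initial_mask), len(initial_mask[0])
    per_label = preset_number // 2
    positives: List[Point] = []
    negatives: List[Point] = []
    seen = set()
    attempts = 0
    # Oversample: keep drawing until both classes reach the required count.
    while len(positives) < per_label or len(negatives) < per_label:
        if attempts >= max_attempts:
            raise RuntimeError("could not collect enough points of both labels")
        attempts += 1
        point = (rng.randint(1, width), rng.randint(1, height))
        if point in seen:
            continue  # skip duplicates
        seen.add(point)
        if initial_mask[point[1] - 1][point[0] - 1]:
            positives.append(point)
        else:
            negatives.append(point)
    # Select a balanced subset from the oversampled points.
    return positives[:per_label], negatives[:per_label]
```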

[0073] S1400: Based on the user's click points and the positional relationship between each virtual click point and the initial area, mark the final area of the target to be identified in the image to be labeled.

[0074] In one possible implementation, the final region of the target to be identified in the image to be labeled is also marked using a semantic segmentation model, according to the user click points and the positional relationship between each virtual click point and the initial region. Specifically, the image to be labeled, the information of the user click points, the position of each virtual click point, and each virtual click point's positional relationship with the initial region are input into the semantic segmentation model, which then labels the region where the target to be identified is located in the image. This region is the final region where the target to be identified is located.

[0075] In the embodiment where the positional relationship between each virtual click point and the initial region is represented by the label of the virtual click point, when executing step S1400, the image to be labeled, the information of the user click point, and the position and label of each virtual click point are input into the semantic segmentation model. The semantic segmentation model then labels the region where the target to be identified is located in the image to be labeled, and this region is the final region where the target to be identified is located.

[0076] It should be noted that when using semantic segmentation models to assist in labeling small-scale images, the labeled area of the target in the image is relatively accurate, but the fit between the labeled contour and the actual contour of the target is poor. When using semantic segmentation models to assist in labeling large-scale images, although the fit between the labeled contour and the actual contour of the target is high, the accuracy of the labeled area of the target in the image is relatively poor.

[0077] In order to simultaneously improve the accuracy of the labeled target in the image region and the fit between the labeled contour and the real contour of the target, in one possible implementation, the user click point and the initial region can both be obtained based on the compressed image of the image to be labeled.

[0078] Specifically, when the image to be labeled is acquired, it is first compressed to a preset size. Then, user click points are obtained from the compressed image. Next, the compressed image and user click point information are input into the semantic segmentation model, which then labels the initial region where the target to be identified is located in the compressed image. After the initial region where the target to be identified is labeled in the compressed image, steps S1300-S1400 are executed.

[0079] In this feasible method, the image to be labeled is first compressed to a preset size. Then, an initial region is labeled in the compressed image using a semantic segmentation model, which improves the accuracy of the initial region labeling. Furthermore, a preset number of virtual click points are generated on the original image to be labeled. By combining the user's click points and the positions of each virtual click point with their positional relationship to the initial region, the region where the target to be identified is located is labeled in the image, making the labeled contours fit the real contours of the target to be identified more closely. In other words, this feasible method can simultaneously improve the accuracy of the labeled target's location in the image and the fit between the labeled contours and the real contours of the target to be identified.
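
Because the initial region lives in the compressed image while the virtual click points are generated on the original image, a label lookup needs some coordinate mapping between the two. The source does not say how this is done; the sketch below assumes simple proportional scaling with 1-based coordinates and a row-major boolean mask, and both function names are hypothetical.

```python
from typing import Sequence, Tuple


def to_compressed(point: Tuple[int, int],
                  original_size: Tuple[int, int],
                  small_size: Tuple[int, int]) -> Tuple[int, int]:
    """Map a 1-based point on the original image to the compressed image."""
    x, y = point
    ow, oh = original_size
    cw, ch = small_size
    # Ceiling division keeps results in 1..cw and 1..ch for inputs in 1..ow and 1..oh.
    return ((x * cw + ow - 1) // ow, (y * ch + oh - 1) // oh)


def is_inside_initial_region(point: Tuple[int, int],
                             original_size: Tuple[int, int],
                             compressed_mask: Sequence[Sequence[bool]]) -> bool:
    """Look up a point from the original image in the compressed-image mask."""
    ch, cw = len(compressed_mask), len(compressed_mask[0])
    cx, cy = to_compressed(point, original_size, (cw, ch))
    return bool(compressed_mask[cy - 1][cx - 1])
```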

[0080] It should be noted that semantic segmentation models all use backbones. These backbones are mostly built on early open-source image datasets, such as MS COCO. The images in these datasets are not large, so the backbone's feature extraction and the model's performance peak at around 480p. Therefore, the preset size can be 480p: when the image to be labeled is acquired, it is first compressed to 480p, and the user click point is then generated in the 480p compressed image.
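
As a small worked example of the preset size, the helper below computes target dimensions for a 480p compression. It assumes "480p" means a height of 480 pixels with the aspect ratio preserved, and that images already at or below that height are left unchanged; the source does not define either point, and the function name is illustrative.

```python
from typing import Tuple


def compressed_size(width: int, height: int, target_height: int = 480) -> Tuple[int, int]:
    """Return the (width, height) of the image after compression to the preset size."""
    if height <= target_height:
        return (width, height)  # assumption: never upscale small images
    scale = target_height / height
    return (max(1, round(width * scale)), target_height)
```

Under these assumptions a 1920x1080 image would be compressed to 853x480.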

[0081] This disclosure provides an auxiliary labeling method for image targets, including: acquiring an image to be labeled and user click points; using the user click points to label the initial region of the target to be identified; sampling the image to be labeled to obtain a preset number of virtual click points; and labeling the final region of the target to be identified in the image to be labeled based on the positional relationship between the user click points, each virtual click point, and the initial region. In this disclosure, the automatic expansion of click points is achieved by automatically generating a preset number of virtual click points, thereby enabling accurate labeling of the region where the target to be identified is located in the image based on the expanded click points.

[0082] The following section provides a specific example to further explain the auxiliary annotation method for image targets of this disclosure.

[0083] In this embodiment, the auxiliary annotation method for image targets includes the following steps:

[0084] First, obtain the image to be labeled (i.e., the original image), as shown in Figure 5.

[0085] Second, compress the image to be labeled to 480p to obtain a compressed image.

[0086] Third, obtain a positive user click point generated by the user in the area where the target to be identified is located in the compressed image through the click operation, and save the information of the user click point, including the location of the user click point and whether the user click point is a positive or negative click point.

[0087] Fourth, the compressed image and user click information are input into the RTM model. The RTM model then annotates and stores the initial region of the target to be identified from the compressed image. The annotation result of the initial region is shown in Figure 6.

[0088] Fifth, the compressed image is restored to the image to be labeled, and the image to be labeled is sampled to obtain 50 virtual click points. Each virtual click point is labeled according to its positional relationship with the initial region, and the information of each virtual click point is recorded. The information of the virtual click points includes their position and label. When a virtual click point is within the initial region, its label is set to a positive label; when a virtual click point is outside the initial region, its label is set to a negative label.

[0089] Sixth, the image to be labeled, the information of the user's click points, and the information of each virtual click point are input into the RTM model. The RTM model then labels the final region of the target to be identified in the image to be labeled. The labeling result of the final region is shown in Figure 7.

[0090] As can be seen from Figure 7, the image target auxiliary labeling method of this embodiment can simultaneously improve the accuracy of the labeled target in the region of the image and the fit between the labeled contour and the real contour of the target.
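
The overall flow of steps S1200-S1400 can be sketched end to end with a stand-in for the segmentation model. Everything about `toy_model` is hypothetical: it ignores image content and simply marks a pixel as foreground when its nearest click is positive, so it only illustrates how user clicks and labeled virtual clicks are passed to a model twice. The compression step is omitted, and coordinates are assumed to be 1-based.

```python
import random
from typing import Callable, List, Optional, Tuple

Click = Tuple[int, int, bool]  # (x, y, is_positive), a hypothetical layout
Mask = List[List[bool]]
Model = Callable[[int, int, List[Click]], Mask]


def toy_model(width: int, height: int, clicks: List[Click]) -> Mask:
    """Stand-in for a segmentation model: nearest-click labeling, no image input."""
    mask = []
    for y in range(1, height + 1):
        row = []
        for x in range(1, width + 1):
            nearest = min(clicks, key=lambda c: (c[0] - x) ** 2 + (c[1] - y) ** 2)
            row.append(nearest[2])
        mask.append(row)
    return mask


def assisted_annotation(width: int, height: int, user_clicks: List[Click],
                        model: Model, preset_number: int = 50,
                        seed: Optional[int] = None) -> Mask:
    initial = model(width, height, user_clicks)       # S1200: initial region
    rng = random.Random(seed)
    virtual: List[Click] = []
    seen = set()
    while len(virtual) < preset_number:               # S1300: virtual click points
        x, y = rng.randint(1, width), rng.randint(1, height)
        if (x, y) in seen:
            continue                                  # skip duplicates
        seen.add((x, y))
        virtual.append((x, y, initial[y - 1][x - 1]))  # label from the initial region
    return model(width, height, user_clicks + virtual)  # S1400: final region


# One positive and one negative user click on a 40x30 image (values invented).
user_clicks = [(10, 15, True), (35, 15, False)]
final_mask = assisted_annotation(40, 30, user_clicks, toy_model, seed=0)
```

In a real system the model call would take the image itself and return a learned mask; the stand-in is there only to make the control flow executable.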

[0091] <Device Embodiment>

[0092] Figure 8 shows a schematic block diagram of an auxiliary annotation apparatus for image targets according to an embodiment of the present disclosure. As shown in Figure 8, the auxiliary annotation apparatus 100 includes:

[0093] Image acquisition module 110 is used to acquire the image to be labeled and the user's click points;

[0094] The first annotation module 120 is used to annotate the initial region of the target to be identified in the image based on the user's click point;

[0095] The virtual click point generation module 130 is used to sample the image to be labeled and obtain a preset number of virtual click points;

[0096] The second annotation module 140 is used to annotate the final region of the target to be identified in the image based on the positional relationship between the user's click point and each virtual click point and the initial region.

[0097] In one possible implementation, uniform sampling is used when sampling the image to be labeled to obtain a preset number of virtual click points.

[0098] In one possible implementation, the virtual click point generation module 130 includes:

[0099] The sampling region determination module is used to determine the sampling region in the image to be labeled and to obtain the width and height of the sampling region;

[0100] The width calculation module is used to randomly generate a first integer as the width value of the virtual click point within the width range by uniform sampling;

[0101] The height calculation module is used to randomly generate a second integer as the height value of the virtual click point within the height range by uniform sampling;

[0102] The virtual point generation module is used to obtain the virtual click point based on the width value and the height value.
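The width and height sampling performed by these sub-modules might look like the minimal sketch below. The `Region` type, the function name, and the choice to offset the sampled values by the region's top-left corner are assumptions made for illustration; the disclosure only specifies that one integer is drawn uniformly within the width range and one within the height range.

```python
import random
from typing import NamedTuple, Optional, Tuple


class Region(NamedTuple):
    """Axis-aligned sampling region in pixels (hypothetical helper type)."""
    left: int
    top: int
    width: int
    height: int


def sample_virtual_click_point(region: Region,
                               rng: Optional[random.Random] = None) -> Tuple[int, int]:
    """Draw one virtual click point uniformly from the sampling region."""
    rng = rng or random.Random()
    # First integer: width value, uniform over [0, width).
    w = rng.randrange(region.width)
    # Second integer: height value, uniform over [0, height).
    h = rng.randrange(region.height)
    # Assumption: the region is positioned by its top-left corner in the image.
    return region.left + w, region.top + h
```

Calling `sample_virtual_click_point(Region(10, 20, 5, 3))` returns an `(x, y)` pair with `10 <= x < 15` and `20 <= y < 23`.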

[0103] In one possible implementation, the sampling region is a pre-selected area of the image to be labeled in which the target to be identified is located.

[0104] In one possible implementation, the virtual click point generation module 130 further includes a deduplication module which, after a virtual click point is obtained, determines whether it is the same as any previously generated virtual click point, and stores the virtual click point only when it is determined to be different from all previously generated virtual click points.
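A minimal sketch of this deduplication step, assuming points are `(x, y)` integer tuples and that a set is used to remember the points generated so far (the function name and the guard against impossible requests are illustrative additions):

```python
import random
from typing import List, Optional, Set, Tuple

Point = Tuple[int, int]


def generate_unique_points(width: int, height: int, count: int,
                           rng: Optional[random.Random] = None) -> List[Point]:
    """Generate `count` distinct virtual click points inside a width x height region."""
    if count > width * height:
        raise ValueError("region has fewer pixels than the requested number of points")
    rng = rng or random.Random()
    seen: Set[Point] = set()
    points: List[Point] = []
    while len(points) < count:
        candidate = (rng.randrange(width), rng.randrange(height))
        if candidate in seen:
            continue  # same as a previously generated point: do not store it
        seen.add(candidate)
        points.append(candidate)
    return points
```

The set lookup keeps the duplicate check at constant expected cost per candidate, which matters only when the preset number of points is large relative to the region.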

[0105] In one possible implementation, the positional relationship between the virtual click point and the initial region is represented by the label of the virtual click point, which includes positive and negative labels. A positive label indicates that the virtual click point is located within the initial region, and a negative label indicates that the virtual click point is outside the initial region.
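Assigning these labels can be sketched as below, assuming the initial region is available as a boolean mask indexed as `mask[y][x]` and that the positive and negative labels are encoded as `1` and `0`. Both the mask representation and the encoding are assumptions; the disclosure does not fix them.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[int, int]
Mask = Sequence[Sequence[bool]]  # mask[y][x] is True inside the initial region (assumed)

POSITIVE = 1  # hypothetical encoding: point lies within the initial region
NEGATIVE = 0  # hypothetical encoding: point lies outside the initial region


def label_virtual_points(points: Iterable[Point],
                         initial_region: Mask) -> List[Tuple[Point, int]]:
    """Attach a positive or negative label to each virtual click point."""
    labeled = []
    for x, y in points:
        label = POSITIVE if initial_region[y][x] else NEGATIVE
        labeled.append(((x, y), label))
    return labeled
```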

[0106] In one possible implementation, among the preset number of virtual click points, the number of virtual click points with positive labels is balanced with the number of virtual click points with negative labels.
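The disclosure does not state how the balance is achieved. One possible approach, shown here purely as an assumption, is to keep drawing uniformly sampled points and accept each one only while its class still has quota left, with "balanced" interpreted as the two counts differing by at most one. The function name, the quota rule, and the `max_tries` guard are all hypothetical.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple

Point = Tuple[int, int]


def sample_balanced_points(mask: Sequence[Sequence[bool]], count: int,
                           rng: Optional[random.Random] = None,
                           max_tries: int = 100_000) -> Tuple[List[Point], List[Point]]:
    """Return (positives, negatives) whose sizes differ by at most one."""
    height, width = len(mask), len(mask[0])
    rng = rng or random.Random()
    pos_quota, neg_quota = (count + 1) // 2, count // 2
    positives: List[Point] = []
    negatives: List[Point] = []
    seen: Set[Point] = set()
    tries = 0
    while len(positives) < pos_quota or len(negatives) < neg_quota:
        if tries >= max_tries:
            raise RuntimeError("could not reach the requested balance")
        tries += 1
        x, y = rng.randrange(width), rng.randrange(height)
        if (x, y) in seen:
            continue  # skip points that were already drawn
        seen.add((x, y))
        if mask[y][x] and len(positives) < pos_quota:
            positives.append((x, y))   # inside the initial region
        elif not mask[y][x] and len(negatives) < neg_quota:
            negatives.append((x, y))   # outside the initial region
    return positives, negatives
```

Rejection sampling of this kind is simple but can be slow when the initial region is very small or very large relative to the sampling region, which is why the sketch bounds the number of attempts.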

[0107] In one possible implementation, the image acquisition module 110 and the first annotation module 120 obtain the user click point and the initial region, respectively, based on the compressed image of the image to be labeled.
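Because the initial region is labeled on the compressed image, it has to be brought back to the size of the image to be labeled before virtual click points are sampled, as recited in the claims. A minimal sketch of that restoration follows, assuming the region is a boolean mask and using nearest-neighbour scaling; the interpolation method is an assumption, since the disclosure does not name one.

```python
from typing import List, Sequence


def restore_mask(mask: Sequence[Sequence[bool]],
                 target_width: int, target_height: int) -> List[List[bool]]:
    """Scale a mask from the compressed size back to the original image size."""
    src_height, src_width = len(mask), len(mask[0])
    restored = []
    for y in range(target_height):
        # Nearest-neighbour mapping from a restored row to a compressed row.
        src_y = y * src_height // target_height
        row = [bool(mask[src_y][x * src_width // target_width])
               for x in range(target_width)]
        restored.append(row)
    return restored
```

For example, restoring the 2 x 2 mask `[[True, False], [False, False]]` to 4 x 4 marks the top-left 2 x 2 block as inside the region and everything else as outside.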

[0108] <Equipment Embodiment>

[0109] Figure 9 shows a schematic block diagram of an image target auxiliary annotation device according to an embodiment of the present disclosure. As shown in Figure 9, the image target auxiliary annotation device 200 includes a processor 210 and a memory 220 for storing executable instructions of the processor 210. The processor 210 is configured to implement any of the aforementioned image target auxiliary annotation methods when executing the executable instructions.

[0110] It should be noted here that the number of processors 210 can be one or more. Furthermore, the image target auxiliary annotation device 200 of this embodiment may also include an input device 230 and an output device 240. The processor 210, the memory 220, the input device 230, and the output device 240 can be connected via a bus or by other means, which is not specifically limited here.

[0111] The memory 220, as a computer-readable storage medium, can be used to store software programs, computer-executable programs, and various modules, such as the programs or modules corresponding to the image target auxiliary annotation method of the embodiments of this disclosure. The processor 210 executes the various functional applications and data processing of the image target auxiliary annotation device 200 by running the software programs or modules stored in the memory 220.

[0112] The input device 230 can be used to receive input numbers or signals. These signals may include key signals related to user settings and function control of the device / terminal / server. The output device 240 may include a display device such as a screen.

[0113] The various embodiments of this disclosure have been described above. These descriptions are exemplary and not exhaustive, and the disclosure is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles of the embodiments, their practical applications, or their technical improvements over technologies in the market, or to enable others skilled in the art to understand the embodiments disclosed herein.

Claims

1. An auxiliary annotation method for image targets, characterized in that it comprises: obtaining the image to be labeled and the user click points; marking, based on the user click points, the initial region of the target to be identified; sampling the image to be labeled to obtain a preset number of virtual click points; and marking, based on the positional relationship between the user click points and each of the virtual click points and the initial region, the final region of the target to be identified in the image to be labeled; wherein both the user click points and the initial region are obtained based on the compressed image of the image to be labeled. Specifically, when the image to be labeled is obtained, it is first compressed to a preset size. Then, the user click points are obtained in the compressed image. Next, the compressed image and the user click point information are input into the semantic segmentation model, and the semantic segmentation model labels the initial region where the target to be identified is located in the compressed image. After the initial region where the target to be identified is located has been marked in the compressed image, the compressed image including the initial region is restored to the size of the image to be labeled to obtain the restored image to be labeled. Then, sampling is performed on the restored image to obtain a preset number of virtual click points.

2. The method according to claim 1, characterized in that uniform sampling is used when sampling the image to be labeled to obtain the preset number of virtual click points.

3. The method according to claim 2, characterized in that uniformly sampling the image to be labeled to obtain a virtual click point comprises: determining the sampling region in the image to be labeled, and obtaining the width and height of the sampling region; randomly generating, within the width range and through uniform sampling, a first integer as the width value of the virtual click point; randomly generating, within the height range and through uniform sampling, a second integer as the height value of the virtual click point; and obtaining the virtual click point based on the width value and the height value.

4. The method according to claim 3, characterized in that the sampling region is a pre-selected region of the image to be labeled in which the target to be identified is located.

5. The method according to claim 3, characterized in that, after the virtual click point is obtained, the method further comprises: determining whether the virtual click point is the same as any previously generated virtual click point; and storing the virtual click point when it is determined that the virtual click point is different from the previously generated virtual click points.

6. The method according to claim 1, characterized in that the positional relationship between the virtual click point and the initial region is characterized by the label of the virtual click point, wherein the label includes a positive label and a negative label, the positive label indicating that the virtual click point is located within the initial region, and the negative label indicating that the virtual click point is outside the initial region.

7. The method according to claim 6, characterized in that, among the preset number of virtual click points, the number of virtual click points with positive labels is balanced with the number of virtual click points with negative labels.

8. An auxiliary annotation device for image targets, characterized in that it comprises: an image acquisition module used to acquire the image to be labeled and the user click points; a first annotation module used to annotate the initial region of the target to be identified in the image to be labeled based on the user click points; a virtual click point generation module used to sample the image to be labeled to obtain a preset number of virtual click points; and a second annotation module used to annotate the final region of the target to be identified in the image to be labeled based on the positional relationship between the user click points and each of the virtual click points and the initial region; wherein both the user click points and the initial region are obtained based on the compressed image of the image to be labeled. Specifically, when the image to be labeled is obtained, it is first compressed to a preset size. Then, the user click points are obtained in the compressed image. Next, the compressed image and the user click point information are input into the semantic segmentation model, and the semantic segmentation model labels the initial region where the target to be identified is located in the compressed image. After the initial region where the target to be identified is located has been marked in the compressed image, the compressed image including the initial region is restored to the size of the image to be labeled to obtain the restored image to be labeled. Then, sampling is performed on the restored image to obtain a preset number of virtual click points.

9. An auxiliary annotation device for image targets, characterized in that it comprises: a processor; and a memory used to store processor-executable instructions; wherein the processor is configured to implement the method of any one of claims 1 to 7 when executing the executable instructions.