Video labeling method and device, computer device, and storage medium

By obtaining the playback time from the video layer and generating target annotation boxes in the annotation layer, the problem of low efficiency in traditional video annotation is solved, and fast and efficient video annotation is achieved.

CN115713711B · Active · Publication Date: 2026-06-23 · ZHAOLIAN CONSUMER FINANCE CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: ZHAOLIAN CONSUMER FINANCE CO LTD
Filing Date: 2022-11-08
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

Traditional video annotation techniques are inefficient and make it difficult to quickly and effectively add annotation boxes and labels to videos.

Method used

The playback time of the image frame to be annotated is obtained in the video layer, and the annotation is performed in the annotation layer to generate the target annotation box information and save the annotated video.

Benefits of technology

It enables fast and efficient video annotation without modifying the image frames to be annotated, improving annotation efficiency and accuracy.

✦ Generated by Eureka AI based on patent content.

Figure CN115713711B_ABST

Abstract

The application relates to a video annotation method and apparatus, a computer device, a storage medium, and a computer program product. The method comprises: acquiring the playback time corresponding to an image frame to be annotated in a playing video, the image frame to be annotated being located in a video layer; annotating the image frame to be annotated in an annotation layer to obtain annotation information corresponding to the playback time, the annotation information being information about a target annotation box corresponding to the image frame to be annotated, and the target annotation box being located in the annotation layer; and saving the playing video together with the annotation information corresponding to the target annotation box at each playback time in the annotation layer to obtain an annotated target video. The method can improve the efficiency of video annotation.

Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to a video annotation method, apparatus, computer equipment, storage medium, and computer program product.

Background Technology

[0002] With the development of computer technology, video annotation technology has emerged. Video annotation refers to the process of adding annotation boxes, annotation content, or labels to videos.

[0003] Traditional video annotation techniques modify the image frames within the video, which makes annotation inefficient.

Summary of the Invention

[0004] In view of the above technical problem, it is therefore necessary to provide a video annotation method, apparatus, computer equipment, computer-readable storage medium, and computer program product that can improve annotation efficiency.

[0005] In a first aspect, this application provides a video annotation method. The method includes:

[0006] Obtain the playback time corresponding to the image frame to be annotated in the playing video; the image frame to be annotated is located in the video layer;

[0007] The image frame to be annotated is annotated in the annotation layer to obtain the annotation information corresponding to the playback time; the annotation information is the information of the target annotation box corresponding to the image frame to be annotated, and the target annotation box is located in the annotation layer.

[0008] Save the playback video and the annotation information corresponding to the target annotation box in the annotation layer at each playback time to obtain the annotated target video.

[0009] In one embodiment, obtaining the playback time corresponding to the image frame to be annotated in the playing video includes:

[0010] The playback interface of the video is displayed, and the playback interface includes a playback area and an annotation area;

[0011] In response to a trigger operation on the annotation control in the annotation area, the playback video of the video layer in the playback area is paused, and the image frame to be annotated in the video layer in the playback area, as well as the playback time corresponding to the image frame to be annotated, are obtained.

[0012] In one embodiment, the step of annotating the image frame to be annotated in the annotation layer to obtain the annotation information corresponding to the playback time includes:

[0013] In response to a trigger operation on the annotation control in the annotation area, a new annotation entry is added to the annotation area; the annotation entry includes an edit control and a save control;

[0014] In response to a trigger operation on the editing control, an initial annotation box is displayed in the annotation layer of the playback area;

[0015] The initial annotation box is adjusted to obtain the target annotation box corresponding to the image frame to be annotated;

[0016] In response to a trigger operation on the save control, the relative position, length, and height of the target annotation box relative to the image frame to be annotated are obtained, and the relative position, length, and height are used as the annotation information corresponding to the playback time.

[0017] In one embodiment, the annotation entry further includes adjustment controls; obtaining the playback time corresponding to the image frame to be annotated in the playing video further includes:

[0018] If the image frame to be annotated is not the target image frame to be annotated, then in response to a trigger operation on the adjustment control, a preset number of image frames to be played are obtained; the preset number of image frames to be played are obtained based on the trigger instruction of the adjustment control and the image frame to be annotated.

[0019] The video layer in the playback area plays the image frames to be played at a preset speed.

[0020] In response to a trigger operation on the editing control, the currently playing image frame in the video layer is used as the updated image frame to be annotated, and the playback time corresponding to the updated image frame to be annotated is obtained.

[0021] In one embodiment, obtaining the relative position, length, and height of the target annotation box relative to the image frame to be annotated, in response to a trigger operation on the save control, includes:

[0022] In response to a trigger operation on the save control, the position coordinates corresponding to the four vertices of the image frame to be annotated are obtained;

[0023] The position coordinates with the minimum coordinate value are used as the reference origin coordinates. The position coordinates with the same horizontal axis coordinate value as the reference origin coordinates are used as the reference horizontal axis coordinates. The position coordinates with the same vertical axis coordinate value as the reference origin coordinates are used as the reference vertical axis coordinates.

[0024] Based on the reference origin coordinates, reference horizontal axis coordinates, and reference vertical axis coordinates, establish an annotation coordinate system on the plane where the annotation layer is located;

[0025] Based on the annotation coordinate system, the relative position of the target annotation box with respect to the image frame to be annotated, as well as the length and height of the target annotation box, are obtained.

[0026] In one embodiment, obtaining the relative position of the target annotation box with respect to the image frame to be annotated, and the length and height of the target annotation box, based on the annotation coordinate system, includes:

[0027] Obtain the vertex coordinates of the four vertices of the target annotation box in the annotation coordinate system;

[0028] The vertex coordinates with the minimum coordinate value are determined as the relative position of the target annotation box with respect to the image frame to be annotated;

[0029] Based on the vertex coordinates, the length and height of the target annotation box are obtained.
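The coordinate steps above can be illustrated with a minimal sketch. It assumes that both the image frame and the target annotation box are axis-aligned rectangles given in screen pixel coordinates, so only the reference origin is needed to define the annotation coordinate system. The names `min_corner` and `box_annotation` and the sample coordinates are hypothetical and do not come from the application.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def min_corner(vertices: List[Point]) -> Point:
    """Return the vertex with the smallest x and y (axis-aligned rectangle assumed)."""
    return (min(x for x, _ in vertices), min(y for _, y in vertices))


def box_annotation(frame_vertices: List[Point],
                   box_vertices: List[Point]) -> Dict[str, float]:
    """Express the box relative to the frame and derive its length and height."""
    # Reference origin: the frame vertex with the minimum coordinate values.
    origin_x, origin_y = min_corner(frame_vertices)
    # Box vertices in the annotation coordinate system anchored at that origin.
    local = [(x - origin_x, y - origin_y) for x, y in box_vertices]
    # Relative position: the box vertex with the minimum coordinate values.
    min_x, min_y = min_corner(local)
    max_x = max(x for x, _ in local)
    max_y = max(y for _, y in local)
    return {"x": min_x, "y": min_y,
            "length": max_x - min_x, "height": max_y - min_y}


# Hypothetical sample data: a 640 x 360 frame drawn at screen offset (100, 50).
frame = [(100.0, 50.0), (740.0, 50.0), (100.0, 410.0), (740.0, 410.0)]
box = [(200.0, 120.0), (360.0, 120.0), (200.0, 210.0), (360.0, 210.0)]
info = box_annotation(frame, box)
print(info)
```

Because the result is expressed relative to the frame rather than to the screen, the same values remain usable if the playback area is moved within the interface.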

[0030] In one embodiment, the video annotation method further includes:

[0031] In response to a trigger operation on the editing control, during the annotation of the image frame to be annotated, the playback control in the playback area and the progress bar corresponding to the playing video are locked.
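The locking behavior can be sketched as follows. This is an illustration only: the class `PlaybackControls` and its method names are hypothetical, and the point at which the lock is released (here, when editing ends) is an assumption, since the paragraph above only states that the controls are locked during annotation.

```python
class PlaybackControls:
    """Hypothetical model of the playback control and progress bar."""

    def __init__(self) -> None:
        self.locked = False
        self.position = 0.0  # current playback time in seconds

    def begin_edit(self) -> None:
        # Triggering the editing control locks playback and seeking.
        self.locked = True

    def end_edit(self) -> None:
        # Assumption: the lock is released once annotation is finished.
        self.locked = False

    def seek(self, playback_time: float) -> bool:
        """Move the progress bar unless it is locked; report whether it moved."""
        if self.locked:
            return False
        self.position = playback_time
        return True


controls = PlaybackControls()
controls.seek(12.0)
controls.begin_edit()
moved_while_editing = controls.seek(30.0)
print(moved_while_editing, controls.position)
```

Locking keeps the image frame under the annotation layer fixed, so the saved annotation information stays tied to the playback time that was captured.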

[0032] In a second aspect, this application also provides a video annotation device. The device includes:

[0033] The acquisition module is used to acquire the playback time corresponding to the image frame to be annotated in the playing video; the image frame to be annotated is located in the video layer;

[0034] The annotation module is used to annotate the image frame to be annotated in the annotation layer to obtain the annotation information corresponding to the playback time; the annotation information is the information of the target annotation box corresponding to the image frame to be annotated, and the target annotation box is located in the annotation layer.

[0035] The saving module is used to save the playback video and the annotation information corresponding to the target annotation box in the annotation layer at each playback moment, so as to obtain the annotated target video.

[0036] In a third aspect, this application also provides a computer device. The computer device includes a memory and a processor, the memory storing a computer program, and the processor executing the computer program to perform the following steps:

[0037] Obtain the playback time corresponding to the image frame to be annotated in the playing video; the image frame to be annotated is located in the video layer;

[0038] The image frame to be annotated is annotated in the annotation layer to obtain the annotation information corresponding to the playback time; the annotation information is the information of the target annotation box corresponding to the image frame to be annotated, and the target annotation box is located in the annotation layer.

[0039] Save the playback video and the annotation information corresponding to the target annotation box in the annotation layer at each playback time to obtain the annotated target video.

[0040] In a fourth aspect, this application also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program thereon, which, when executed by a processor, performs the following steps:

[0041] Obtain the playback time corresponding to the image frame to be annotated in the playing video; the image frame to be annotated is located in the video layer;

[0042] The image frame to be annotated is annotated in the annotation layer to obtain the annotation information corresponding to the playback time; the annotation information is the information of the target annotation box corresponding to the image frame to be annotated, and the target annotation box is located in the annotation layer.

[0043] Save the playback video and the annotation information corresponding to the target annotation box in the annotation layer at each playback time to obtain the annotated target video.

[0044] The aforementioned video annotation method, apparatus, computer equipment, storage medium, and computer program product acquire the playback time corresponding to the image frame to be annotated in a video playing in a video layer. The image frame to be annotated is then annotated in an annotation layer, obtaining the target annotation box corresponding to the image frame in the annotation layer, along with the annotation information corresponding to the target annotation box. This annotation information is used as the annotation information corresponding to the playback time of the image frame to be annotated. Finally, the playing video and the annotation information corresponding to the target annotation boxes at each playback time in the annotation layer are saved, resulting in the annotated target video. Because the image frame is annotated in the annotation layer without modifying the image frame itself, the image frame to be annotated is not destroyed whether an annotation is added, modified, or revoked. This allows for rapid annotation of image frames according to actual needs, improving the efficiency of video annotation.

Attached Figure Description

[0045] Figure 1 is a diagram illustrating the application environment of a video annotation method in one embodiment;

[0046] Figure 2 is a flowchart illustrating a video annotation method in one embodiment;

[0047] Figure 3 is a flowchart illustrating the steps for obtaining annotation information in one embodiment;

[0048] Figure 4 is a flowchart illustrating the relative position determination steps in one embodiment;

[0049] Figure 5 is a schematic diagram of the playback interface in one embodiment;

[0050] Figure 6 is a schematic diagram of the playback interface in another embodiment;

[0051] Figure 7 is a structural block diagram of a video annotation device in one embodiment;

[0052] Figure 8 is an internal structural diagram of a computer device in one embodiment.

Detailed Implementation

[0053] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0054] The video annotation method provided in this application embodiment can be applied to the application environment shown in Figure 1, in which terminal 102 communicates with server 104 via a network. A data storage system can store the data that server 104 needs to process. The data storage system can be integrated on server 104 or placed on a cloud or other network server. The terminal and the server can each execute the video annotation method provided in this embodiment independently, or they can work together to execute it. For example, a computer device obtains the playback time corresponding to the image frame to be annotated in a video played in a video layer, annotates the image frame to be annotated in the annotation layer, obtains the target annotation box corresponding to the image frame to be annotated in the annotation layer and the annotation information corresponding to the target annotation box, uses the annotation information as the annotation information corresponding to the playback time of the image frame to be annotated, and then saves the playing video and the annotation information corresponding to the target annotation box in the annotation layer for each playback time to obtain the annotated target video. Terminal 102 can be, but is not limited to, various personal computers, laptops, smartphones, tablets, IoT devices, and portable wearable devices. IoT devices can be smart speakers, smart TVs, smart air conditioners, smart vehicle devices, etc. Portable wearable devices can include smartwatches, smart bracelets, and head-mounted devices. Server 104 can be implemented using a standalone server or a server cluster consisting of multiple servers.

[0055] In one embodiment, as shown in Figure 2, a video annotation method is provided. This method can be applied to computer devices, which can be terminals or servers. The method can be executed independently by the terminal or server, or it can be implemented through interaction between the terminal and the server. This embodiment illustrates the application of this method to a computer device, including steps 202 to 206.

[0056] Step 202: Obtain the playback time corresponding to the image frame to be annotated in the playing video; the image frame to be annotated is located in the video layer.

[0057] In this context, "playing video" refers to a moving image: a sequence of still images played at a certain speed so that they appear visually continuous. Examples include movies and short videos on platforms like TikTok. Playing videos can be displayed on web pages or played in media players. Storage formats for playing videos include, but are not limited to, AVI (Audio Video Interleaved), RMVB (RealMedia Variable Bitrate), and MPEG (Moving Picture Experts Group). "Image frame to be annotated" refers to the image frame, among the multiple frames of the playing video, that will be annotated. "Playback time" refers to the specific moment within the whole video at which the image frame to be annotated is played. For example, if the video duration is one hour, an image frame to be annotated might play at the 16th or the 20th minute of that hour. "Video layer" refers to the layer used to play the playing video. This can be understood as follows: the video displayed on the terminal is a two-dimensional plane. Assuming the x-axis and y-axis represent the position of the video in this plane, a three-dimensional space can be constructed by introducing a z-axis so that elements can be overlaid above or below the video. The z-axis represents the stacking level or order of the elements. Alternatively, the video can be understood as being on one layer and the overlay elements on another layer, with different stacking orders; the picture displayed on the terminal is formed by superimposing these two layers. For example, a web page player uses the z-index property to represent the stacking order of elements. Assuming z-index:100 represents the stacking order of the video and z-index:300 represents the stacking order of the annotation boxes, the video is on the video layer, the annotation boxes are on the annotation layer, and the annotation boxes are stacked on top of the video.
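The stacking-order idea can be illustrated with a minimal sketch that orders layers the way z-index does, with larger values drawn later and therefore on top. The `Layer` class and `paint_order` function are hypothetical names introduced for illustration; only the values 100 and 300 come from the example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    """A hypothetical display layer; a larger z_index is drawn on top."""
    name: str
    z_index: int


def paint_order(layers: List[Layer]) -> List[str]:
    """Return layer names from bottom to top, mimicking z-index stacking."""
    return [layer.name for layer in sorted(layers, key=lambda item: item.z_index)]


# Values from the example in the text: video at 100, annotation boxes at 300.
layers = [Layer("annotation", 300), Layer("video", 100)]
order = paint_order(layers)
print(order)
```

Since the annotation layer is painted after the video layer, annotation boxes appear over the image frame without any pixel of the frame being changed.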

[0058] For example, the computer device obtains the image frame to be annotated from the video layer, and then obtains the playback time corresponding to the image frame to be annotated.

[0059] In one embodiment, the computer device obtains a trigger operation for the annotation control, obtains the current image frame of the video playing in the video layer, uses the current image frame as the image frame to be annotated, and then obtains the playback time corresponding to the image frame to be annotated.

[0060] Step 204: Annotate the image frame to be annotated in the annotation layer to obtain the annotation information corresponding to the playback time; the annotation information is the information of the target annotation box corresponding to the image frame to be annotated, and the target annotation box is located in the annotation layer.

[0061] The annotation layer refers to the layer used for annotating the playing video. It can be understood as the layer where annotation boxes and annotation content are placed. Annotation layers are stacked on top of the video layer. Annotation information refers to relevant information about the annotation content. Annotation information includes, but is not limited to, the position, length, and width of the annotation box, annotation text, annotation icons, etc. For example, annotation information might be: at coordinates (x, y), 5 cm long, and 3 cm wide.

[0062] For example, the computer device annotates the image frame to be annotated on the annotation layer to obtain the target annotation box corresponding to the image frame to be annotated, and then obtains the annotation information of the target annotation box and uses the annotation information as the annotation information corresponding to the playback time.

[0063] In one embodiment, the computer device acquires a trigger operation for the editing control, establishes an annotation layer that is stacked above the video layer, annotates the image frame to be annotated based on the annotation layer, obtains the target annotation box corresponding to the image frame to be annotated, and then acquires the annotation information of the target annotation box and uses it as the annotation information corresponding to the playback time.

[0064] In one embodiment, the computer device obtains a trigger operation for the editing control, displays an initial annotation box in the annotation layer corresponding to the playback time, annotates the image frame to be annotated based on the initial annotation box, obtains the target annotation box corresponding to the image frame to be annotated, then obtains the annotation information of the target annotation box, and uses the annotation information as the annotation information corresponding to the playback time.

[0065] Step 206: Save the playback video and the annotation information corresponding to the target annotation box in the annotation layer at each playback moment to obtain the annotated target video.

[0066] The target video refers to the video that has been annotated.

[0067] For example, the computer device saves the video being played and the annotation information corresponding to each playback moment to obtain the annotated target video.

[0068] In one embodiment, the computer device saves the playing video in a first folder and saves the annotation information corresponding to the target annotation boxes in the annotation layer at each playback moment in a second folder. The first folder corresponds to a first stacking order, which is used to characterize the stacking order of the playing video. The second folder corresponds to a second stacking order, which is used to characterize the stacking order of the target annotation boxes corresponding to each annotation information in the second folder.
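One possible way to picture the two-folder arrangement is sketched below. Everything about the on-disk layout is an assumption made for illustration: the folder names, the JSON file names, the `stacking_order` key, and the function `save_annotation_layer` are hypothetical, and copying the unmodified video file into the first folder is omitted.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def save_annotation_layer(root: Path, by_time: Dict[str, dict],
                          video_order: int = 100,
                          annotation_order: int = 300) -> Path:
    """Write one folder per layer, each recording its own stacking order."""
    video_dir = root / "video_layer"            # first folder (hypothetical name)
    annotation_dir = root / "annotation_layer"  # second folder (hypothetical name)
    video_dir.mkdir(parents=True, exist_ok=True)
    annotation_dir.mkdir(parents=True, exist_ok=True)
    # The video file itself would be stored in video_dir unchanged.
    (video_dir / "layer.json").write_text(
        json.dumps({"stacking_order": video_order}))
    out = annotation_dir / "annotations.json"
    out.write_text(json.dumps(
        {"stacking_order": annotation_order, "annotations": by_time}, indent=2))
    return out


with tempfile.TemporaryDirectory() as tmp:
    path = save_annotation_layer(
        Path(tmp), {"980.0": {"x": 100, "y": 70, "length": 160, "height": 90}})
    saved = json.loads(path.read_text())
    print(saved["stacking_order"], sorted(saved["annotations"]))
```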

[0069] In the above video annotation method, the playback time corresponding to the image frame to be annotated in the video being played in the video layer is obtained. The image frame to be annotated is then annotated in the annotation layer, obtaining the target annotation box corresponding to the image frame in the annotation layer, along with the annotation information corresponding to the target annotation box. This annotation information is used as the annotation information corresponding to the playback time of the image frame to be annotated. Then, the playing video and the annotation information corresponding to the target annotation boxes at each playback time in the annotation layer are saved, resulting in the annotated target video. By annotating the image frame in the annotation layer without modifying the image frame itself, whether annotating, modifying, or revoking the annotation of the image frame, the image frame to be annotated will not be destroyed. This allows for rapid annotation of image frames according to actual needs, improving the efficiency of video annotation.
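The overall idea of steps 202 to 206 can be sketched as a small data model in which annotation information is stored per playback time, separately from the video. This is a sketch under stated assumptions, not the implementation described in the application: the classes `BoxInfo` and `AnnotatedVideo`, the use of seconds as the time unit, and the file name `demo.mp4` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class BoxInfo:
    """Annotation information for one target annotation box."""
    x: float
    y: float
    length: float
    height: float


@dataclass
class AnnotatedVideo:
    """The untouched video plus annotation information keyed by playback time."""
    video_path: str
    annotations: Dict[float, List[BoxInfo]] = field(default_factory=dict)

    def annotate(self, playback_time: float, box: BoxInfo) -> None:
        # Only the annotation layer data changes; the video is never rewritten.
        self.annotations.setdefault(playback_time, []).append(box)

    def revoke(self, playback_time: float) -> None:
        # Revoking an annotation is a dictionary removal, not a frame edit.
        self.annotations.pop(playback_time, None)

    def boxes_at(self, playback_time: float) -> List[BoxInfo]:
        return list(self.annotations.get(playback_time, []))


video = AnnotatedVideo("demo.mp4")
video.annotate(980.0, BoxInfo(100.0, 70.0, 160.0, 90.0))
print(video.boxes_at(980.0))
```

Adding, modifying, or revoking an annotation touches only the dictionary, which mirrors the claim that the image frames themselves are left intact.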

[0070] In one embodiment, obtaining the playback time corresponding to the image frame to be annotated in the playing video includes:

[0071] The playback interface for the video is displayed, which includes a playback area and an annotation area. In response to a trigger operation on the annotation control in the annotation area, the playback of the video layer in the playback area is paused, and the image frame to be annotated in the video layer in the playback area, as well as the playback time corresponding to the image frame to be annotated, are obtained.

[0072] The playback interface refers to the main interface for displaying the video. The playback area is the area where the video is played. The size of the playback area can be adjusted as needed. The annotation area is the area where annotation-related controls are placed and the annotation content is displayed. The annotation area can be located below the playback area, to the right of the playback area, etc., and its position and size can be adjusted as needed.

[0073] For example, a computer device displays a video playback interface, which includes a playback area and an annotation area. In response to a trigger operation on an annotation control in the annotation area, the video being played in the video layer in the playback area is paused, the current video frame of the video layer in the playback area is obtained, the current video frame is used as the image frame to be annotated, and then the playback time corresponding to the image frame to be annotated is obtained.

[0074] In this embodiment, the video is played through the playback area in the playback interface, and the video in the playback area is paused by the annotation control in the annotation area of the playback interface. Then, the image frame to be annotated and the corresponding playback time of the image frame are obtained. The playback area and the annotation area are located on the same interface, which is convenient to operate and can improve the annotation efficiency of the video.
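The pause-and-capture behavior of this embodiment can be sketched as follows. The sketch assumes a constant frame rate so that a frame index can be derived from the playback time; the class `VideoLayer`, the method `on_annotate_clicked`, and the 25 frames-per-second figure are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class VideoLayer:
    """Hypothetical model of the video layer in the playback area."""
    fps: float              # assumed constant frame rate
    position: float = 0.0   # current playback time in seconds
    paused: bool = False

    def on_annotate_clicked(self) -> Tuple[int, float]:
        """Pause playback and return (frame index, playback time)."""
        self.paused = True
        frame_index = int(self.position * self.fps)
        return frame_index, self.position


# 980 seconds corresponds to 16 minutes 20 seconds into the video.
layer = VideoLayer(fps=25.0, position=980.0)
frame_index, playback_time = layer.on_annotate_clicked()
print(frame_index, playback_time, layer.paused)
```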

[0075] In one embodiment, as shown in Figure 3, annotating the image frame to be annotated in the annotation layer to obtain the annotation information corresponding to the playback time includes:

[0076] Step 302: In response to a trigger operation on the annotation controls in the annotation area, add a new annotation entry in the annotation area; the annotation entry includes an edit control and a save control.

[0077] In this context, an annotation entry refers to the group of controls and edit boxes associated with the same annotation. It can be understood as a small block within the annotation area in which all controls and display blocks relate to the same annotation. This block may contain the controls required for the annotation, a text box for entering the annotation text, and a display block showing the playback time and annotation number. Edit controls are buttons used to annotate the image frame to be annotated. Save controls are buttons used to save the target annotation box.

[0078] For example, in response to a triggered operation on the annotation controls in the annotation area, the computer device adds a new annotation entry in the annotation area that includes an edit control and a save control.

[0079] Step 304: In response to a triggering operation on the editing control, display the initial annotation box in the annotation layer of the playback area.

[0080] The initial annotation box refers to the annotation box that is first displayed. The position and size of the initial annotation box are determined by the initial settings.

[0081] For example, in response to a triggering operation on the editing control, the computer device displays an initial annotation box in the annotation layer of the playback area.

[0082] Step 306: Adjust the initial annotation box to obtain the target annotation box corresponding to the image frame to be annotated.

[0083] The target annotation box refers to the final annotation box of the image to be annotated. The position and size of the target annotation box can be adjusted according to actual needs.

[0084] For example, the computer device obtains an adjustment instruction to adjust the initial annotation box, and adjusts the position and size of the initial annotation box according to the adjustment instruction to obtain the target annotation box corresponding to the image frame to be annotated.

[0085] In one embodiment, the annotator adjusts the initial annotation box using a mouse or terminal display screen. The computer device receives the adjustment instruction to adjust the initial annotation box and adjusts the position and size of the initial annotation box according to the adjustment instruction to obtain the target annotation box corresponding to the image frame to be annotated.

[0086] Step 308: In response to the trigger operation for the save control, obtain the relative position, length and height of the target annotation box relative to the image frame to be annotated, and use the relative position, length and height as the annotation information corresponding to the playback time.

[0087] In this context, relative position refers to the position of the target annotation box relative to the image frame to be annotated. Relative position can be represented using two-dimensional coordinates. Length refers to the side length of the target annotation box. The unit of length can be centimeters, meters, etc. Height refers to the height of the target annotation box. The unit of height can also be centimeters, meters, etc. The units for length and height can be the same or different.

[0088] For example, in response to a trigger operation on the save control, the computer device obtains the relative position, length, and height of the target annotation box relative to the image frame to be annotated, and uses the relative position, length, and height as the annotation information corresponding to the playback time.

[0089] In one embodiment, the computer device obtains an adjustment instruction to adjust the first initial annotation box, adjusts the first initial annotation box to the position and size corresponding to the adjustment instruction, obtains the first target annotation box, and generates a second initial annotation box in the lower left corner. If the second initial annotation box is needed, it is adjusted; if it is not needed, it is not adjusted. In response to a trigger operation on the save control, only the adjusted target annotation box is saved.

[0090] In this embodiment, in response to a trigger operation on the annotation control in the annotation area, a new annotation entry containing an edit control and a save control is added to the annotation area. First, in response to a trigger operation on the edit control, an initial annotation box is displayed in the annotation layer of the playback area. The initial annotation box is adjusted to obtain the target annotation box. Then, in response to a trigger operation on the save control, the relative position, length, and height of the target annotation box relative to the image frame to be annotated are obtained. The relative position, length, and height are used as the annotation information corresponding to the playback time, completing the annotation of the image frame to be annotated. The annotation of the image frame to be annotated can be completed using only annotation controls, edit controls, and save controls, making the annotation process convenient and improving the efficiency of video annotation.
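The edit, adjust, and save sequence of steps 302 to 308 can be sketched as a small class. This is a minimal illustration under assumptions: the class `AnnotationEntry`, the default values in `INITIAL_BOX`, and the dictionary returned by `save` are hypothetical and are not specified by the application.

```python
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Box:
    x: float
    y: float
    length: float
    height: float


# Hypothetical initial settings for the initial annotation box.
INITIAL_BOX = Box(x=0.0, y=0.0, length=50.0, height=30.0)


class AnnotationEntry:
    """One annotation entry with an edit control and a save control."""

    def __init__(self, playback_time: float) -> None:
        self.playback_time = playback_time
        self.box: Optional[Box] = None
        self.saved: Optional[dict] = None

    def edit(self) -> None:
        # Edit control: display the initial annotation box in the annotation layer.
        self.box = INITIAL_BOX

    def adjust(self, **changes: float) -> None:
        # Adjust position and size to obtain the target annotation box.
        if self.box is None:
            raise RuntimeError("trigger the edit control before adjusting")
        self.box = replace(self.box, **changes)

    def save(self) -> dict:
        # Save control: record relative position, length, and height.
        if self.box is None:
            raise RuntimeError("nothing to save")
        self.saved = {"playback_time": self.playback_time, **asdict(self.box)}
        return self.saved


entry = AnnotationEntry(playback_time=980.0)
entry.edit()
entry.adjust(x=100.0, y=70.0, length=160.0, height=90.0)
result = entry.save()
print(result)
```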

[0091] In one embodiment, the annotation entry further includes adjustment controls; obtaining the playback time corresponding to the image frame to be annotated in the playing video further includes:

[0092] If the image frame to be labeled is not the target image frame, then in response to the trigger operation of the adjustment control, a preset number of image frames to be played are obtained; the preset number of image frames to be played are obtained based on the trigger command of the adjustment control and the image frame to be labeled; the image frames to be played are played in the video layer of the playback area at a preset speed; in response to the trigger operation of the editing control, the currently playing image frame in the video layer is used as the updated image frame to be labeled, and the playback time corresponding to the updated image frame to be labeled is obtained.

[0093] In this context, "target labeled image frame" refers to the actual image frame that needs to be labeled. "Adjustment controls" are buttons for adjusting the image frame to be labeled. Adjustment controls can include forward and backward controls. "Image frames to be played" refers to the multiple image frames that will be played in the video layer of the playback area in response to the trigger operation of the adjustment controls. "Preset speed" refers to the playback speed of the image frames to be played. The preset speed can be a multiple of the video playback speed. The preset speed can be set according to actual needs or adjusted using the speed selection controls. For example, the preset speed can be 0.5 times the video playback speed.

[0094] For example, in response to a trigger operation on an adjustment control, the computer device acquires a preset number of image frames to be played based on the trigger command of the adjustment control and the image frames to be labeled, and then plays the image frames to be played in the video layer of the playback area at a preset speed; in response to a trigger operation on an editing control, the computer device acquires the currently playing image frame in the video layer, uses the currently playing image frame as the updated image frame to be labeled, and then acquires the playback time corresponding to the updated image frame to be labeled.

[0095] In one embodiment, the adjustment controls include a forward control and a backward control. In response to a trigger operation on the forward control, the computer device acquires a preset number of image frames before the image frame to be labeled and uses them as the image frames to be played. In response to a trigger operation on the backward control, the computer device acquires a preset number of image frames after the image frame to be labeled and uses them as the image frames to be played.
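The frame selection in [0095] can be sketched as follows. This is a minimal illustration, assuming that frames are addressed by integer index and that the selection is clamped at the start and end of the video; neither assumption is stated in this application, and the name frames_to_play and its parameters are hypothetical.

```python
def frames_to_play(current_index: int, preset_number: int,
                   total_frames: int, control: str) -> list[int]:
    """Return the indices of the image frames to be played.

    Mirrors the text: the forward control selects the preset number of
    frames before the frame to be labeled, and the backward control
    selects the preset number of frames after it.
    Clamping to [0, total_frames) is an assumption of this sketch.
    """
    if control == "forward":
        start = max(0, current_index - preset_number)
        return list(range(start, current_index))
    if control == "backward":
        end = min(total_frames, current_index + 1 + preset_number)
        return list(range(current_index + 1, end))
    raise ValueError(f"unknown adjustment control: {control!r}")
```

The returned indices would then be played in the video layer at the preset speed, for example 0.5 times the normal playback speed.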

[0096] In this embodiment, triggering the adjustment control causes the video layer in the playback area to play the image frames to be played at a preset speed, which facilitates selection of the target labeled image frame. Triggering the editing control then determines the target labeled image frame, which is used as the updated image frame to be labeled. The image frame to be labeled and the labeling time can thus be updated using only the adjustment control and the editing control, which speeds up the update of both.

[0097] In one embodiment, as shown in Figure 4, obtaining the relative position, length, and height of the target annotation box relative to the image frame to be annotated in response to a trigger operation on the save control includes:

[0098] Step 402: In response to the trigger operation on the save control, obtain the position coordinates of the four vertices of the image frame to be annotated.

[0099] Here, position coordinates refer to the coordinates used to characterize the position of the image frame to be labeled. Position coordinates can be two-dimensional coordinates.

[0100] For example, in response to a trigger operation on the save control, the computer device obtains the position coordinates of the four vertices of the image frame to be annotated.

[0101] In one embodiment, the computer device obtains the reference coordinates of the four vertices of the playback area and uses the reference coordinates as the position coordinates of the image frame to be labeled.

[0102] Step 404: Determine the position coordinates of the minimum coordinate value as the reference origin coordinates, take the position coordinates with the same horizontal axis coordinate value as the reference origin coordinates as the reference horizontal axis coordinates, and take the position coordinates with the same vertical axis coordinate value as the reference origin coordinates as the reference vertical axis coordinates.

[0103] For example, the computer device compares the position coordinates of the four vertices and determines the position coordinates whose horizontal and vertical coordinates are both the smallest among the four as the reference origin coordinates. It then compares the horizontal coordinates of the other three position coordinates with the horizontal coordinate of the reference origin coordinates, and determines the position coordinates with the same horizontal coordinate value as the reference origin coordinates as the reference horizontal coordinates. Finally, it compares the vertical coordinates of the other three position coordinates with the vertical coordinate of the reference origin coordinates, and determines the position coordinates with the same vertical coordinate value as the reference origin coordinates as the reference vertical coordinates.
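The comparison in Step 404 can be sketched as below. This is an illustration only: it assumes the image frame is an axis-aligned rectangle with exactly comparable coordinates, and it reads "same horizontal coordinate value" literally as "same x value". The text could also be read the other way round, in which case the second and third return values swap. The function name reference_points is hypothetical.

```python
Point = tuple[float, float]


def reference_points(vertices: list[Point]) -> tuple[Point, Point, Point]:
    """Pick the reference origin, horizontal and vertical coordinates.

    Assumes an axis-aligned rectangle given by its four vertices.
    """
    min_x = min(x for x, _ in vertices)
    min_y = min(y for _, y in vertices)
    origin = (min_x, min_y)
    if origin not in vertices:
        raise ValueError("vertices do not form an axis-aligned rectangle")
    others = [v for v in vertices if v != origin]
    # Literal reading of the text: same x value as the reference origin.
    ref_horizontal = next(v for v in others if v[0] == origin[0])
    # Literal reading of the text: same y value as the reference origin.
    ref_vertical = next(v for v in others if v[1] == origin[1])
    return origin, ref_horizontal, ref_vertical
```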

[0104] Step 406: Establish an annotation coordinate system on the plane containing the annotation layer based on the reference origin coordinates, reference horizontal axis coordinates, and reference vertical axis coordinates.

[0105] The annotation coordinate system refers to the Cartesian coordinate system used to determine the position of the target annotation box.

[0106] For example, the computer device establishes an annotation coordinate system on the plane where the annotation layer is located, based on the reference origin coordinates, the reference horizontal axis coordinates, and the reference vertical axis coordinates.

[0107] In one embodiment, the computer device establishes an annotation coordinate system in the plane where the annotation layer is located, with the reference origin coordinates as the origin, the horizontal axis of the annotation coordinate system passing through the reference horizontal axis coordinates, and the vertical axis of the annotation coordinate system passing through the reference vertical axis coordinates.
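As a minimal sketch of what the coordinate system provides, assume the image frame is axis-aligned and that the annotation layer and the playback area share the same pixel scale (both are assumptions of this sketch, not statements of the application). Expressing a point in the annotation coordinate system then reduces to a translation by the reference origin. The function name to_annotation_coords is hypothetical.

```python
Point = tuple[float, float]


def to_annotation_coords(point: Point, reference_origin: Point) -> Point:
    """Express a point relative to the reference origin.

    Valid only under the stated assumptions: axes parallel to the
    original axes and an identical scale in both systems.
    """
    return (point[0] - reference_origin[0], point[1] - reference_origin[1])
```

For example, with a reference origin at (100, 50), the point (150, 90) has annotation coordinates (50, 40).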

[0108] Step 408: Based on the annotation coordinate system, obtain the relative position of the target annotation box with respect to the image frame to be annotated, as well as the length and height of the target annotation box.

[0109] For example, the computer device obtains the relative position of the target annotation box with respect to the image frame to be annotated, as well as the length and height of the target annotation box, based on the annotation coordinate system.

[0110] In one embodiment, the target annotation box is circular. The center coordinates of the target annotation box are obtained based on the annotation coordinate system. The center coordinates are used as the relative position of the target annotation box with respect to the image frame to be annotated. Then, the point coordinates of any point on the target annotation box are obtained, and the distance between the center coordinates and the point coordinates is calculated to obtain the size of the target annotation box.
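For the circular case, the computation can be sketched as follows. The dictionary keys and the function name circular_box_info are hypothetical; the size is taken to be the radius, which is how the distance from the center to a point on the circle is normally interpreted.

```python
import math

Point = tuple[float, float]


def circular_box_info(center: Point, point_on_box: Point) -> dict:
    """Return the relative position and size of a circular annotation box.

    Both points are expected in annotation coordinates.
    """
    radius = math.dist(center, point_on_box)  # Euclidean distance
    return {"position": center, "size": radius}
```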

[0111] In this embodiment, a labeling coordinate system is established on the plane where the labeling layer is located based on the position coordinates of the image frame to be labeled. The relative position of the target label box with respect to the image frame to be labeled, as well as the length and height of the target label box, are obtained based on the labeling coordinate system. The relative position, length and height of the target label box accurately represent the position of the target label box with respect to the image frame to be labeled, thereby improving the accuracy of video labeling.

[0112] In one embodiment, based on the annotation coordinate system, the relative position of the target annotation box with respect to the image frame to be annotated, as well as the length and height of the target annotation box, are obtained, including:

[0113] Obtain the vertex coordinates of the four vertices of the target bounding box in the annotation coordinate system; determine the vertex coordinates with the smallest coordinate value as the relative position of the target bounding box with respect to the image frame to be annotated; based on the vertex coordinates, obtain the length and height of the target bounding box.

[0114] For example, the computer device obtains the vertex coordinates of the four vertices of the target annotation box in the annotation coordinate system, compares the four vertex coordinates, and determines the vertex coordinate with the smallest horizontal and vertical coordinates among the four vertex coordinates as the relative position. Then, it compares the horizontal coordinates of the other three vertices with the horizontal coordinates of the relative position, determines the position coordinate with the same horizontal coordinate value as the relative position as the first horizontal coordinate, and calculates the distance between the relative position and the first horizontal coordinate to obtain the length of the target annotation box. Then, it compares the vertical coordinates of the other three vertices with the vertical coordinates of the relative position, determines the vertex coordinate with the same vertical coordinate value as the relative position as the first vertical coordinate, and calculates the distance between the relative position and the first vertical coordinate to obtain the height of the target annotation box.
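The following sketch derives the same three values for an axis-aligned rectangular box. Note that it swaps in a simpler technique: instead of comparing vertices pairwise as described above, it takes the coordinate extents, which yields the same side lengths for an axis-aligned rectangle. Treating the horizontal extent as the length and the vertical extent as the height is an assumption of this sketch, and the function name box_info is hypothetical.

```python
Point = tuple[float, float]


def box_info(vertices: list[Point]) -> tuple[Point, float, float]:
    """Return (relative position, length, height) of a rectangular box.

    The vertices are expected in annotation coordinates. The relative
    position is the vertex with the smallest x and y values.
    """
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    position = (min(xs), min(ys))
    length = max(xs) - min(xs)   # assumed: horizontal extent
    height = max(ys) - min(ys)   # assumed: vertical extent
    return position, length, height
```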

[0115] In one embodiment, the coordinates corresponding to the centroid of the target annotation box are determined based on the coordinates of the four vertices. The coordinates corresponding to the centroid are used as the relative position of the target annotation box with respect to the image frame to be annotated. Based on the vertex coordinates, the length and height of the target annotation box are obtained.
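For a rectangular box, the centroid is the mean of the four vertex coordinates, which can be sketched as below. The function name centroid is hypothetical.

```python
Point = tuple[float, float]


def centroid(vertices: list[Point]) -> Point:
    """Return the mean of the vertex coordinates.

    For a rectangle this is its geometric center.
    """
    n = len(vertices)
    return (sum(x for x, _ in vertices) / n, sum(y for _, y in vertices) / n)
```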

[0116] In this embodiment, the vertex coordinates of the four vertices of the target annotation box in the annotation coordinate system are obtained. Based on the four vertex coordinates, the relative position of the target annotation box with respect to the image frame to be annotated, as well as the length and height of the target annotation box, are determined. The relative position, length and height of the target annotation box accurately represent the position of the target annotation box with respect to the image frame to be annotated, thereby improving the accuracy of video annotation.

[0117] In one embodiment, the video annotation method further includes:

[0118] In response to a trigger operation on the editing control, during the annotation of the image frame to be annotated, the playback control in the playback area and the progress bar corresponding to the video playback are locked.

[0119] The playback controls are the buttons used to control video playback and pause. The progress bar is a long, rectangular icon used to display the playback progress of the video. The progress bar can also display information such as the time the video has been playing and the total playback duration.

[0120] For example, in response to a trigger operation on the editing control, the computer device locks the playback control in the playback area and the progress bar corresponding to the video playback during the annotation of the image frame to be annotated.

[0121] In this embodiment, by locking the playback controls in the playback area and the progress bar corresponding to the video playback, video playback is prevented from being interrupted due to accidental operation.
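The locking behavior can be illustrated with a small state model. This is a sketch under assumptions: the class PlaybackControls and its method names are hypothetical, and releasing the lock when annotation finishes is an assumption, since the application only describes when the lock is applied.

```python
from dataclasses import dataclass


@dataclass
class PlaybackControls:
    playing: bool = False
    position_s: float = 0.0
    locked: bool = False

    def on_edit_triggered(self) -> None:
        """Lock the playback control and the progress bar."""
        self.locked = True

    def on_annotation_finished(self) -> None:
        """Release the lock (assumed, not described in the text)."""
        self.locked = False

    def toggle_play(self) -> bool:
        """Toggle play/pause; ignored while locked."""
        if self.locked:
            return False
        self.playing = not self.playing
        return True

    def seek(self, position_s: float) -> bool:
        """Move the progress bar; ignored while locked."""
        if self.locked:
            return False
        self.position_s = position_s
        return True
```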

[0122] In one exemplary embodiment, the web page player sets two layers using z-index: a video layer and an annotation layer. The annotation layer is stacked above the video layer. The video layer is used to play the video, and the annotation layer is used to annotate the video. The annotation layer encapsulates a <drag /> drag-and-drop component, which acts as a draggable and resizable initial annotation box. Its properties can be set through the component's API (Application Programming Interface), such as the initial annotation box's ID (identifier), initial coordinates, initial size, whether dragging is disabled, mouse drag start, mouse drag end, and so on. As shown in Figure 5, the playback interface in the web player includes a playback area 502 and an annotation area 504.
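The stacking relationship between the two layers can be modeled as follows. With z-index, an element with a higher value is drawn above one with a lower value, so the annotation layer needs the higher value. The concrete values 1 and 2 and the names Layer and paint_order are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    z_index: int


def paint_order(layers: list[Layer]) -> list[str]:
    """Return layer names from bottom to top.

    A higher z-index is drawn later and therefore appears on top.
    """
    return [layer.name for layer in sorted(layers, key=lambda l: l.z_index)]


# Hypothetical values: the annotation layer sits above the video layer.
player_layers = [Layer("annotation", 2), Layer("video", 1)]
```

Because the annotation box lives only in the upper layer, drawing, moving, or removing it never touches the video frames underneath.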

[0123] In response to a trigger operation on the annotation controls in the annotation area, the current image frame of the playing video in the video layer is obtained, and this current image frame is used as the image frame to be annotated. Then, the playback time corresponding to the image frame to be annotated is obtained, and a new annotation entry is added to the annotation area, containing edit controls, save controls, forward controls, backward controls, etc., as shown at 602 in Figure 6. If the image frame to be labeled is the target image frame, then the image frame to be labeled is labeled. If the image frame to be labeled is not the target image frame, then in response to the trigger operation of the forward or backward control, based on the trigger instructions of the forward and backward controls and the image frame to be labeled, a preset number of image frames to be played are obtained. The image frames to be played are played in the video layer of the playback area at a preset speed. Then, in response to the trigger operation of the editing control, the playback control in the playback area and the progress bar corresponding to the video playback are locked. The currently playing image frame in the video layer is obtained and used as the updated image frame to be labeled. Then, the playback time corresponding to the updated image frame to be labeled is obtained. The initial label box is displayed in the label layer of the playback area. The adjustment instruction for adjusting the initial label box is obtained. The position and size of the initial label box are adjusted according to the adjustment instruction to obtain the target label box 604 corresponding to the image frame to be labeled.

[0124] In response to a trigger operation on the save control, the system obtains the position coordinates corresponding to the four vertices of the image frame to be annotated. These coordinates are compared, and the position coordinate with the smallest x-axis and y-axis coordinates is selected as the reference origin coordinate. Then, the x-axis coordinates of the other three positions are compared with the x-axis coordinate of the reference origin, and the position coordinate with the same x-axis value is selected as the reference x-axis coordinate. Similarly, the y-axis coordinates of the other three positions are compared with the y-axis coordinate of the reference origin, and the position coordinate with the same y-axis value is selected as the reference y-axis coordinate. An annotation coordinate system is then established in the plane containing the annotation layer, with the reference origin coordinate as the origin, the x-axis passing through the reference x-axis coordinate, and the y-axis passing through the reference y-axis coordinate. Based on this system, the relative position of the target annotation box with respect to the image frame to be annotated, as well as the length and height of the target annotation box, are obtained. Finally, the playing video and the annotation information corresponding to each playback time are saved, resulting in the annotated target video.

[0125] In this embodiment, the playback time corresponding to the image frame to be labeled in the video being played in the video layer is obtained. The image frame to be labeled is then labeled in the labeling layer, obtaining the target label box corresponding to the image frame in the labeling layer, and the label information corresponding to the target label box. This label information is used as the label information corresponding to the playback time of the image frame to be labeled. Then, the playing video and the label information corresponding to the target label boxes at each playback time in the labeling layer are saved, resulting in the labeled target video. By labeling the image frame to be labeled in the labeling layer without modifying the image frame itself, whether labeling, modifying, or revoking the label, the image frame to be labeled will not be destroyed. This allows for rapid labeling of image frames according to actual needs, improving the efficiency of video labeling.
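One way to keep the annotation information separate from the video, so that the frames themselves are never modified, is to store it keyed by playback time. The sketch below is an assumption-laden illustration: the JSON layout, the millisecond-precision time keys, and the name AnnotationStore are all hypothetical and are not specified in this application.

```python
import json
from collections import defaultdict


class AnnotationStore:
    """Holds annotation information per playback time, apart from the video."""

    def __init__(self, video_path: str) -> None:
        self.video_path = video_path
        self._by_time: dict[float, list[dict]] = defaultdict(list)

    def add(self, playback_time_s: float, position: tuple[float, float],
            length: float, height: float) -> None:
        """Record one target annotation box for the given playback time."""
        self._by_time[playback_time_s].append(
            {"position": list(position), "length": length, "height": height}
        )

    def to_json(self) -> str:
        """Serialize the video reference and all annotation information."""
        annotations = {
            f"{t:.3f}": boxes for t, boxes in sorted(self._by_time.items())
        }
        return json.dumps(
            {"video": self.video_path, "annotations": annotations}, indent=2
        )
```

Revoking or modifying an annotation then only changes an entry in this structure; the stored video is left untouched.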

[0126] It should be understood that although the steps in the flowcharts of the embodiments described above are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the embodiments described above may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.

[0127] Based on the same inventive concept, this application also provides a video annotation device for implementing the video annotation method described above. The solution provided by this device is similar to the solution described in the above method; therefore, the specific limitations in one or more video annotation device embodiments provided below can be found in the limitations of the video annotation method described above, and will not be repeated here.

[0128] In one embodiment, as shown in Figure 7, a video annotation device is provided, including: an acquisition module 702, an annotation module 704, and a saving module 706, wherein:

[0129] The acquisition module 702 is used to acquire the playback time corresponding to the image frame to be labeled in the playing video; the image frame to be labeled is located in the video layer;

[0130] The annotation module 704 is used to annotate the image frame to be annotated on the annotation layer to obtain the annotation information corresponding to the playback time; the annotation information is the information of the target annotation box corresponding to the image frame to be annotated, and the target annotation box is located on the annotation layer.

[0131] The saving module 706 is used to save the playback video and the annotation information corresponding to the target annotation box in the annotation layer at each playback moment, so as to obtain the annotated target video.

[0132] In one embodiment, the acquisition module 702 is further configured to: display a playback interface for playing a video, the playback interface including a playback area and an annotation area; in response to a trigger operation on an annotation control in the annotation area, pause the playback of the video layer in the playback area, acquire the image frame to be annotated in the video layer in the playback area, and the playback time corresponding to the image frame to be annotated.

[0133] In one embodiment, the annotation module 704 is further configured to: in response to a trigger operation on the annotation control in the annotation area, add a new annotation entry in the annotation area; the annotation entry includes an edit control and a save control; in response to a trigger operation on the edit control, display an initial annotation box in the annotation layer of the playback area; adjust the initial annotation box to obtain a target annotation box corresponding to the image frame to be annotated; in response to a trigger operation on the save control, obtain the relative position, length, and height of the target annotation box relative to the image frame to be annotated, and use the relative position, length, and height as the annotation information corresponding to the playback time.

[0134] In one embodiment, the acquisition module 702 is further configured to: if the image frame to be labeled is not the target labeled image frame, in response to a trigger operation on the adjustment control, acquire a preset number of image frames to be played; the preset number of image frames to be played is acquired based on the trigger command of the adjustment control and the image frame to be labeled; play the image frame to be played in the video layer of the playback area at a preset speed; in response to a trigger operation on the editing control, take the currently playing image frame in the video layer as the updated image frame to be labeled, and acquire the playback time corresponding to the updated image frame to be labeled.

[0135] In one embodiment, the annotation module 704 is further configured to: in response to a trigger operation on the save control, obtain the position coordinates corresponding to the four vertices of the image frame to be annotated; determine the position coordinates with the minimum coordinate value as the reference origin coordinates, use the position coordinates with the same horizontal axis coordinate value as the reference horizontal axis coordinates, and use the position coordinates with the same vertical axis coordinate value as the reference origin coordinates as the reference vertical axis coordinates; establish an annotation coordinate system on the plane where the annotation layer is located based on the reference origin coordinates, the reference horizontal axis coordinates, and the reference vertical axis coordinates; and obtain the relative position of the target annotation box with respect to the image frame to be annotated, as well as the length and height of the target annotation box, based on the annotation coordinate system.

[0136] In one embodiment, the annotation module 704 is further configured to: obtain the vertex coordinates of the four vertices of the target annotation box in the annotation coordinate system; determine the vertex coordinates with the minimum coordinate value as the relative position of the target annotation box relative to the image frame to be annotated; and obtain the length and height of the target annotation box based on the vertex coordinates.

[0137] In one embodiment, the video annotation device further includes a locking module, which is configured to: lock the playback control in the playback area and the progress bar corresponding to the video playback in response to a trigger operation on the editing control during the annotation of the image frame to be annotated.

[0138] Each module in the aforementioned video annotation device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device, or stored in the memory of a computer device as software, so that the processor can call and execute the corresponding operations of each module.

[0139] In one embodiment, a computer device is provided, which may be a terminal, and its internal structure may be as shown in Figure 8. The computer device includes a processor, memory, input/output interfaces, a communication interface, a display unit, and an input device. The processor, memory, and input/output interfaces are connected via a system bus, and the communication interface, display unit, and input device are also connected to the system bus via the input/output interfaces. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The input/output interfaces are used for exchanging information between the processor and external devices. The communication interface is used for wired or wireless communication with external terminals; wireless communication can be achieved through Wi-Fi, mobile cellular networks, NFC (Near Field Communication), or other technologies. When the computer program is executed by the processor, it implements a video annotation method. The display unit is used to form a visible image and can be a display screen, a projection device, or a virtual reality imaging device. The display screen can be an LCD screen or an e-ink screen. The input device of the computer device can be a touch layer covering the display screen, or buttons, trackballs, or touchpads set on the casing of the computer device, or external keyboards, touchpads, or mice, etc.

[0140] Those skilled in the art will understand that the structure shown in Figure 8 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.

[0141] In one embodiment, a computer device is also provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the above method embodiments.

[0142] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon that, when executed by a processor, implements the steps in the above method embodiments.

[0143] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps in the above method embodiments.

[0144] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties.

[0145] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc., and are not limited to these.

[0146] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0147] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.

Claims

1. A video annotation method, characterized in that, The method includes: Obtain the playback time corresponding to the image frame to be labeled in the playing video; the image frame to be labeled is located in the video layer; In response to a trigger operation on the annotation controls in the annotation area, a new annotation entry is added to the annotation area; the annotation entry includes an edit control and a save control; In response to a trigger operation on the editing control, an initial annotation box is displayed in the annotation layer of the playback area; the playback area refers to the area where the video is played; The initial annotation box is adjusted to obtain the target annotation box corresponding to the image frame to be annotated; In response to a trigger operation on the save control, the position coordinates corresponding to the four vertices of the image frame to be labeled are obtained; The position coordinates with the minimum coordinate value are used as the reference origin coordinates, the position coordinates with the same horizontal axis coordinate value as the reference origin coordinates are used as the reference horizontal axis coordinates, and the position coordinates with the same vertical axis coordinate value as the reference origin coordinates are used as the reference vertical axis coordinates; Based on the reference origin coordinates, the reference horizontal axis coordinates, and the reference vertical axis coordinates, establish a labeling coordinate system on the plane where the labeling layer is located; Based on the labeling coordinate system, the relative position of the target annotation box with respect to the image frame to be labeled, as well as the length and height of the target annotation box, are obtained, and the relative position, length, and height are used as the labeling information corresponding to the playback time; The labeling information includes position, length, height, labeling text information, and labeling icon; Save the playback video and the annotation information corresponding to the target annotation box in the annotation layer at each playback time to obtain the annotated target video.

2. The method according to claim 1, characterized in that, The step of obtaining the playback time corresponding to the image frame to be labeled in the playing video includes: The playback interface of the video is displayed, and the playback interface includes a playback area and a labeling area; In response to a trigger operation on the annotation control in the annotation area, the playback video of the video layer in the playback area is paused, and the image frame to be annotated in the video layer in the playback area, as well as the playback time corresponding to the image frame to be annotated, are obtained.

3. The method according to claim 1, characterized in that, The process of saving the playback video, and the annotation information corresponding to the target annotation box in the annotation layer at each playback moment, includes: The playback video is saved in a first folder, and the annotation information corresponding to the target annotation boxes in the annotation layer at each playback moment is saved in a second folder; the first folder corresponds to a first stacking order, which is used to characterize the stacking order of the playback video; the second folder corresponds to a second stacking order, which is used to characterize the stacking order of the target annotation boxes corresponding to each annotation information in the second folder.

4. The method according to claim 1, characterized in that, The annotation entries also include adjustment controls; obtaining the playback time corresponding to the image frame to be annotated in the playing video also includes: If the image frame to be labeled is not the target image frame to be labeled, then in response to the trigger operation of the adjustment control, a preset number of image frames to be played are obtained; the preset number of image frames to be played are obtained based on the trigger command of the adjustment control and the image frame to be labeled. The video layer in the playback area plays the image frames to be played at a preset speed. In response to a trigger operation on the editing control, the currently playing image frame in the video layer is used as the updated image frame to be labeled, and the playback time corresponding to the updated image frame to be labeled is obtained.

5. The method according to claim 1, characterized in that adjusting the initial annotation box to obtain the target annotation box corresponding to the image frame to be annotated includes: obtaining an adjustment instruction for the initial annotation box; and adjusting the position and size of the initial annotation box according to the adjustment instruction to obtain the target annotation box corresponding to the image frame to be annotated.

6. The method according to claim 1, characterized in that obtaining, based on the annotation coordinate system, the relative position of the target annotation box with respect to the image frame to be annotated and the length and height of the target annotation box includes: obtaining the vertex coordinates of the four vertices of the target annotation box in the annotation coordinate system; determining the vertex coordinates with the minimum coordinate value as the relative position of the target annotation box with respect to the image frame to be annotated; and obtaining the length and height of the target annotation box based on the vertex coordinates.
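
The computation in claim 6 can be illustrated with a minimal sketch. It assumes the target annotation box is an axis-aligned rectangle, so that the vertex with the minimum coordinate value is the one with both the smallest x and the smallest y; the names `BoxInfo` and `box_info_from_vertices` are hypothetical.

```python
from typing import NamedTuple, Sequence, Tuple

Point = Tuple[float, float]


class BoxInfo(NamedTuple):
    x: float       # relative position: x of the minimum vertex
    y: float       # relative position: y of the minimum vertex
    length: float  # extent along the horizontal axis
    height: float  # extent along the vertical axis


def box_info_from_vertices(vertices: Sequence[Point]) -> BoxInfo:
    """Derive position, length and height from four vertex coordinates.

    The vertices are expected in the annotation coordinate system and are
    assumed to describe an axis-aligned rectangle (an assumption of this sketch).
    """
    if len(vertices) != 4:
        raise ValueError("expected exactly four vertices")
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    x_min, y_min = min(xs), min(ys)
    return BoxInfo(x_min, y_min, max(xs) - x_min, max(ys) - y_min)
```

For example, the vertices (30, 20), (130, 20), (130, 80) and (30, 80) give a relative position of (30, 20), a length of 100 and a height of 60.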

7. The method according to claim 1, characterized in that the method further includes: in response to a trigger operation on the edit control, locking the playback control in the playback area and the progress bar corresponding to the playing video while the image frame to be annotated is being annotated.

8. A video annotation device, characterized in that the device includes: the acquisition module is used to acquire the playback time corresponding to the image frame to be labeled in the playing video; the image frame to be labeled is located in the video layer; the annotation module is used to add a new annotation entry in the annotation area in response to a trigger operation on the annotation control in the annotation area; the annotation entry includes an edit control and a save control; in response to a trigger operation on the edit control, an initial annotation box is displayed in the annotation layer of the playback area; the playback area refers to the area where the video is played; the initial annotation box is adjusted to obtain the target annotation box corresponding to the image frame to be annotated; in response to a trigger operation on the save control, the position coordinates corresponding to the four vertices of the image frame to be annotated are obtained; the position coordinates with the smallest coordinate value are determined as the reference origin coordinates, the position coordinates with the same horizontal axis coordinate value as the reference origin coordinates are used as the reference horizontal axis coordinates, and the position coordinates with the same vertical axis coordinate value as the reference origin coordinates are used as the reference vertical axis coordinates; based on the reference origin coordinates, the reference horizontal axis coordinates, and the reference vertical axis coordinates, an annotation coordinate system is established on the plane where the annotation layer is located; based on the annotation coordinate system, the relative position of the target annotation box with respect to the image frame to be annotated, as well as the length and height of the target annotation box, are obtained, and the relative position, length, and height are used as the annotation information corresponding to the playback time; the saving module is used to save the playback video and the annotation information corresponding to the target annotation box in the annotation layer at each playback moment, so as to obtain the annotated target video.
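
The construction of the annotation coordinate system recited in claim 8 might be sketched as follows. This is an assumption-laden illustration: the image frame is taken to be an axis-aligned rectangle, its vertices are given as exact `(x, y)` tuples in some outer coordinate space (for example, the playback interface), and the names `build_annotation_system`, `to_annotation_coords`, `same_x_vertex` and `same_y_vertex` are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def build_annotation_system(frame_vertices: Sequence[Point]) -> Dict[str, Point]:
    """Pick the reference origin and the two reference vertices of the frame.

    The origin is the vertex with the smallest coordinates. The other two
    reference vertices are the one sharing the origin's x value and the one
    sharing the origin's y value, following the wording of the claim.
    """
    if len(frame_vertices) != 4:
        raise ValueError("expected exactly four vertices")
    # Tuple comparison picks the smallest x, then the smallest y.
    origin = min(frame_vertices)
    same_x = next(p for p in frame_vertices if p != origin and p[0] == origin[0])
    same_y = next(p for p in frame_vertices if p != origin and p[1] == origin[1])
    return {"origin": origin, "same_x_vertex": same_x, "same_y_vertex": same_y}


def to_annotation_coords(point: Point, origin: Point) -> Point:
    """Express a point relative to the reference origin of the image frame."""
    return (point[0] - origin[0], point[1] - origin[1])
```

Because positions are expressed relative to the image frame rather than to the screen, the stored annotation information stays valid if the playback area is later moved within the interface; handling a resized playback area would additionally require scaling, which this sketch does not cover.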

9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 7.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 7.

11. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 7.