Video turn-back recognition method and device based on multi-target tracking, equipment and medium
By performing multi-target tracking and direction counting on the video stream, and identifying back-and-forth shots in the video, the problem of not being able to accurately determine the number of target objects in existing technologies is solved, thus improving the accuracy of intelligent point counting.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA PING AN PROPERTY INSURANCE CO LTD
- Filing Date
- 2023-03-15
- Publication Date
- 2026-06-19
Abstract
Description
Technical Field
[0001] This application relates to the field of video processing technology, and in particular to a video back-and-forth recognition method, apparatus, device and medium based on multi-target tracking.
Background Technology
[0002] Many industries have business scenarios that require counting. For example, livestock insurance policies require counting the animals being raised. Video tracking technology can quickly and accurately count the animals on a farm that need to be insured or are subject to a claim, which speeds up the work and reduces the on-site workload. However, when workers record video with an electronic device, they need to move along the shooting direction to capture all target objects. During this movement, the camera inevitably swings back in the direction opposite to the main movement direction, that is, the shooting direction. For example, if the main movement direction is to the right, the worker might move back to the left, producing a left-right back-and-forth shot; similarly, if the main movement direction is upward, the worker might move back downward, producing an up-down back-and-forth shot. Because existing counting systems cannot determine the main movement direction of the worker holding the electronic device, the tracking module may hold multiple tracking records for the same target object. The actual number of target objects therefore cannot be determined directly from the number of tracking records, which reduces the accuracy of intelligent counting.
Summary of the Invention
[0003] In view of the above problems, embodiments of this application provide a video back-and-forth recognition method, apparatus, device and medium based on multi-target tracking to solve the above technical problems.
[0004] In a first aspect, embodiments of this application provide a video back-and-forth recognition method based on multi-target tracking, including:
[0005] Multi-target tracking is performed on target objects in a first video stream to obtain a second video stream corresponding to the first video stream. The second video stream includes a tracking trajectory of at least one target object. The tracking trajectory includes at least one video frame containing the target object. Each video frame containing the target object includes a detection box corresponding to the target object.
[0006] The second video stream is counted in a first direction and counted in a second direction to obtain the first direction count result and the second direction count result for each video frame in the second video stream, wherein the first direction is opposite to the second direction;
[0007] Based on the first direction counting result and the second direction counting result, the motion direction of each video frame in the second video stream is obtained;
[0008] Based on the motion direction of each video frame in the second video stream, the back-and-forth recognition result of the second video stream is obtained.
[0009] Optionally, the step of performing multi-target tracking on the target object in the first video stream to obtain the second video stream corresponding to the first video stream includes:
[0010] The predicted bounding box of the target object in the current video frame of the first video stream is obtained based on the current tracking trajectory of the target object. The current tracking trajectory includes at least one video frame containing the target object, each video frame containing the target object includes a detection bounding box corresponding to the target object, and the video frame containing the target object is located before the current video frame.
[0011] Target object detection is performed on the current video frame to obtain at least one detection box of the current video frame, wherein detection boxes with a confidence level greater than or equal to a first threshold are assigned to a first detection box set, and detection boxes with a confidence level less than the first threshold are assigned to a second detection box set.
[0012] The predicted bounding box corresponding to the current tracking trajectory of each target object is matched with the first set of detection boxes to obtain the first matching result of the predicted bounding box corresponding to the current tracking trajectory of each target object. The detection box with the first matching result being a successful match is updated to the corresponding current tracking trajectory of the target object.
[0013] The predicted bounding boxes that fail to match in the first matching result are matched with the second set of detection boxes to obtain the second matching result of the predicted bounding boxes. The detection boxes that succeed in the second matching result are updated to the current tracking trajectory of the corresponding target object.
[0014] The detection boxes in the first set of detection boxes that fail to match the predicted box are taken as the current tracking trajectory of the newly added target object, and the detection boxes in the second set that fail to match the predicted box are deleted.
[0015] Optionally, the step of performing first direction counting and second direction counting on the second video stream to obtain the first direction counting result and the second direction counting result for each video frame in the second video stream includes:
[0016] If it is determined from the tracking trajectory of the at least one target object that the video frame contains at least one target object that enters the video frame from the first edge, then the first direction count of the video frame is increased by a first number, which is the number of target objects that enter the video frame from the first edge, and the first edge corresponds to the first direction;
[0017] If it is determined from the tracking trajectory of the at least one target object that the video frame contains at least one target object that leaves the video frame from the first edge, then the first direction count of the video frame is reduced by a second number, which is the number of target objects that leave the video frame from the first edge.
[0018] If it is determined from the tracking trajectory of the at least one target object that the video frame contains at least one target object that enters the video frame from the second edge, then the second direction count of the video frame is increased by a third number, which is the number of target objects that enter the video frame from the second edge, and the second edge corresponds to the second direction;
[0019] If, based on the tracking trajectory of the at least one target object, it is determined that there is at least one target object leaving the video frame from the second edge, then the second direction count of the video frame is reduced by a fourth number, which is the number of target objects leaving the video frame from the second edge.
[0020] Optionally, obtaining the motion direction of each video frame in the second video stream based on the first direction counting result and the second direction counting result includes:
[0021] If the first direction count result of the current video frame is greater than the first direction count result of the previous video frame, or the second direction count result of the current video frame is less than the second direction count result of the previous video frame, then the motion direction of the current video frame is determined to be the first direction.
[0022] If the second direction count result of the current video frame is greater than the second direction count result of the previous video frame, or the first direction count result of the current video frame is less than the first direction count result of the previous video frame, then the motion direction of the current video frame is determined to be the second direction.
[0023] Optionally, obtaining the back-and-forth recognition result of the second video stream based on the motion direction of each video frame in the second video stream includes:
[0024] The second video stream is divided into at least one video segment according to the motion direction of each video frame in the second video stream, and all video frames in each video segment have the same motion direction;
[0025] If the duration of any of the video segments is less than or equal to a preset time threshold, then the video segment is filtered out from the second video stream to obtain a filtered second video stream.
[0026] The number of direction changes in the second video stream is obtained based on the motion direction of each video segment in the filtered second video stream. If the number of direction changes is greater than or equal to a first preset threshold, the back-and-forth recognition result of the second video stream is determined to be a back-and-forth shot.
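As an illustration only, the segment-based check described above might be sketched in Python as follows. The function names, the string direction labels, and the use of a fixed per-frame duration are assumptions made for this sketch and do not come from the application.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def split_segments(directions: Sequence[str], frame_duration: float) -> List[Tuple[str, float]]:
    """Group consecutive frames with the same motion direction into (direction, duration) segments."""
    return [
        (direction, sum(1 for _ in group) * frame_duration)
        for direction, group in groupby(directions)
    ]


def count_direction_changes(segments: Sequence[Tuple[str, float]], time_threshold: float) -> int:
    """Drop segments whose duration is <= time_threshold, then count direction changes."""
    kept = [direction for direction, duration in segments if duration > time_threshold]
    # Adjacent kept segments can share a direction once a short segment
    # between them has been filtered out, so only real changes are counted.
    return sum(1 for a, b in zip(kept, kept[1:]) if a != b)


def is_back_and_forth(directions: Sequence[str], frame_duration: float,
                      time_threshold: float, change_threshold: int) -> bool:
    """Flag the stream when the number of direction changes reaches the threshold."""
    segments = split_segments(directions, frame_duration)
    return count_direction_changes(segments, time_threshold) >= change_threshold


directions = ["first"] * 10 + ["second"] * 1 + ["first"] * 10 + ["second"] * 8
print(is_back_and_forth(directions, 0.5, 1.0, 1))  # True: one change remains after filtering
```

In this example the single-frame "second" segment is filtered out as jitter, so only the final switch from "first" to "second" counts as a direction change.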
[0027] Optionally, obtaining the back-and-forth recognition result of the second video stream based on the motion direction of each video frame in the second video stream further includes:
[0028] For a video segment whose duration is less than or equal to the preset time threshold, a first count reduction value of the target direction count result corresponding to the video segment is obtained from the difference between the target direction count result of the first video frame and that of the last video frame in the video segment, wherein the target direction is opposite to the motion direction of the video segment;
[0029] If the first count reduction value is greater than or equal to a preset quantity threshold, then the back-and-forth recognition result of the second video stream is determined to be a back-and-forth shot.
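The short-segment check above might look like the following minimal sketch. It assumes that the "reduction" is the target direction count of the first frame minus that of the last frame in the segment; the function names are hypothetical.

```python
from typing import Sequence


def count_reduction(target_counts: Sequence[int]) -> int:
    """Reduction of the target direction count over one short segment.

    target_counts holds the target direction count result of every frame in
    the segment, in order. The sign convention (first minus last) is an
    assumption of this sketch.
    """
    if not target_counts:
        raise ValueError("segment must contain at least one frame")
    return target_counts[0] - target_counts[-1]


def short_segment_is_back_and_forth(target_counts: Sequence[int], quantity_threshold: int) -> bool:
    """A short segment still signals a back-and-forth shot if enough objects left."""
    return count_reduction(target_counts) >= quantity_threshold


# The target direction count drops from 5 to 2 during a brief reverse movement.
print(short_segment_is_back_and_forth([5, 4, 3, 2], quantity_threshold=3))  # True
```

The idea is that a brief reversal is not dismissed as jitter when many target objects leave the frame during it.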
[0030] Optionally, obtaining the back-and-forth recognition result of the second video stream based on the motion direction of each video frame in the second video stream further includes:
[0031] Obtain the total duration of the second video stream in the first direction and the total duration of the second video stream in the second direction;
[0032] Obtain the ratio of the total duration in the first direction to the total duration in the second direction;
[0033] If the ratio is greater than or equal to the first preset threshold and less than or equal to the second preset threshold, then the back-and-forth recognition result of the second video stream is determined to be a back-and-forth shot.
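A minimal sketch of the duration-ratio check, assuming the stream has already been split into (direction, duration) segments. The direction labels, the parameter names `lower_ratio` and `upper_ratio`, and the handling of a zero second-direction duration are assumptions of this sketch.

```python
from typing import Sequence, Tuple


def direction_duration_ratio(segments: Sequence[Tuple[str, float]]) -> float:
    """Ratio of total first-direction duration to total second-direction duration."""
    first_total = sum(duration for direction, duration in segments if direction == "first")
    second_total = sum(duration for direction, duration in segments if direction == "second")
    if second_total == 0:
        # Assumption: with no second-direction footage the ratio is treated as infinite.
        return float("inf")
    return first_total / second_total


def is_back_and_forth_by_ratio(segments: Sequence[Tuple[str, float]],
                               lower_ratio: float, upper_ratio: float) -> bool:
    """Flag the stream when the ratio lies inside the closed interval of the two thresholds."""
    return lower_ratio <= direction_duration_ratio(segments) <= upper_ratio


segments = [("first", 6.0), ("second", 3.0), ("first", 2.0), ("second", 1.0)]
print(direction_duration_ratio(segments))  # 2.0
```

A ratio close to 1 means the two directions take up comparable amounts of time, which is what a back-and-forth shot would produce; a stream shot almost entirely in one direction falls outside the interval.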
[0034] In a second aspect, embodiments of this application provide a video back-and-forth recognition device based on multi-target tracking, comprising:
[0035] The tracking module is used to perform multi-target tracking on target objects in a first video stream to obtain a second video stream corresponding to the first video stream. The second video stream includes a tracking trajectory of at least one target object. The tracking trajectory includes at least one video frame containing the target object. Each video frame containing the target object includes a detection box corresponding to the target object.
[0036] The counting module is used to perform first direction counting and second direction counting on the second video stream respectively, to obtain the first direction counting result and the second direction counting result for each video frame in the second video stream, wherein the first direction is opposite to the second direction;
[0037] The direction determination module is used to obtain the motion direction of each video frame in the second video stream based on the first direction counting result and the second direction counting result;
[0038] The back-and-forth recognition module is used to obtain the back-and-forth recognition result of the second video stream based on the motion direction of each video frame in the second video stream.
[0039] In a third aspect, embodiments of this application provide an electronic device, including a processor and a memory coupled to the processor, the memory storing program instructions executable by the processor; when the processor executes the program instructions stored in the memory, it implements the above-described video back-and-forth recognition method based on multi-target tracking.
[0040] In a fourth aspect, embodiments of this application provide a storage medium storing program instructions, which, when executed by a processor, implement the aforementioned video back-and-forth recognition method based on multi-target tracking.
[0041] The video back-and-forth recognition method, apparatus, device, and medium based on multi-target tracking provided in this application embodiment perform multi-target tracking on target objects in a first video stream to obtain a second video stream corresponding to the first video stream; perform first direction counting and second direction counting on the second video stream respectively to obtain the first direction counting result and the second direction counting result for each video frame in the second video stream, wherein the first direction is opposite to the second direction; obtain the motion direction of each video frame in the second video stream based on the first direction counting result and the second direction counting result; obtain the back-and-forth recognition result of the second video stream based on the motion direction of each video frame. Through the above method, based on the multi-target tracking result, the first direction counting and the second direction counting are performed on each video frame respectively, and the motion direction of each video frame is determined according to the counting result. Without relying on other camera parameters and position coordinate information, the motion direction of each video frame can be determined, and then back-and-forth shooting recognition is performed based on the motion direction of each video frame, which is beneficial to improving the accuracy of intelligent point counting.
Attached Figure Description
[0042] Figure 1 is a flowchart of a video back-and-forth recognition method based on multi-target tracking according to an embodiment of this application.
[0043] Figure 2 is a schematic diagram of a first direction and a second direction in one embodiment of this application.
[0044] Figure 3 is a schematic structural diagram of a video back-and-forth recognition device based on multi-target tracking provided in an embodiment of this application.
[0045] Figure 4 is a schematic structural diagram of an electronic device provided in an embodiment of this application.
[0046] Figure 5 is a schematic structural diagram of a storage medium provided in an embodiment of this application.
Detailed Implementation
[0047] The embodiments of this application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain this application, and should not be construed as limiting this application.
[0048] To enable those skilled in the art to better understand the solutions of this application, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative effort are within the scope of protection of this application.
[0049] In the embodiments of this application, it should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations.
[0050] Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0051] In the description of the embodiments of this application, the words "example" or "for example" are used to indicate exemplification, illustration, or description. Any embodiment or design described as "example" or "for example" in the embodiments of this application is not to be construed as being more preferred or having more advantages than another embodiment or design. The use of the words "example" or "for example" is intended to present relative concepts in a clear manner.
[0052] Furthermore, in the embodiments of this application, "multiple" refers to two or more. Therefore, in the embodiments of this application, "multiple" can also be understood as "at least two". "At least one" can be understood as one or more, such as one, two, or more. For example, including at least one means including one, two, or more, and is not limited to which ones are included. For example, including at least one of A, B, and C, then it could include A, B, C, A and B, A and C, B and C, or A and B and C.
[0053] It should be noted that in the embodiments of this application, "and / or" describes the relationship between associated objects, indicating that there can be three relationships. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. In addition, the character " / ", unless otherwise specified, generally indicates that the associated objects before and after it are in an "or" relationship.
[0054] One embodiment of this application provides a video back-and-forth recognition method based on multi-target tracking. The executing entity of the video back-and-forth recognition method based on multi-target tracking includes, but is not limited to, at least one of the following electronic devices: a server, a terminal, or any other electronic device that can be configured to execute the video back-and-forth recognition method based on multi-target tracking provided in this application embodiment. In other words, the video back-and-forth recognition method based on multi-target tracking can be executed by software or hardware installed on a terminal device or a server device, and the software can be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server, or a cloud server cluster.
[0055] One embodiment of this application provides a video back-and-forth recognition method based on multi-target tracking. As shown in Figure 1, the method includes:
[0056] S11, perform multi-target tracking on the target object in the first video stream to obtain a second video stream corresponding to the first video stream, wherein the second video stream includes a tracking trajectory of at least one target object, the tracking trajectory includes at least one video frame containing the target object, and each video frame containing the target object includes a detection box corresponding to the target object.
[0057] The first video stream can be a video to be identified that is captured by the user. It may include multiple consecutive video frames captured by the user's handheld electronic device, or only some of those consecutive video frames. In the latter case, the input video to be identified is sampled at a preset frame-sampling interval, which reduces the computational cost of subsequent tracking and inference and also makes a more lightweight deployment on the user end possible.
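Frame sampling at a preset interval can be sketched as follows. The function name `sample_frames` and the choice to always keep the first frame are assumptions of this sketch; the frames can be any objects, for example decoded images.

```python
from typing import Iterable, Iterator, TypeVar

Frame = TypeVar("Frame")


def sample_frames(frames: Iterable[Frame], interval: int) -> Iterator[Frame]:
    """Yield every `interval`-th frame of the input, starting with the first one."""
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    for index, frame in enumerate(frames):
        if index % interval == 0:
            yield frame


# Integers stand in for decoded frames here.
print(list(sample_frames(range(10), 3)))  # [0, 3, 6, 9]
```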
[0058] The purpose of multi-target tracking is to identify the bounding boxes and identities of the target objects in a video. Each identity can be represented by a unique identifier. First, target object detection is performed on each video frame of the first video stream, and the target objects in each frame are marked with detection boxes. Then, the target objects are tracked to obtain their tracking trajectories. Specifically, the detection box data (x, y, w, h, score, aid) of each detection box is recorded according to a preset data structure. Here, x and y represent the center coordinates of the target object's detection box, w and h represent the width and height of the detection box, respectively, score represents the confidence score of the detection box, and aid represents the unique identifier of the target object corresponding to the detection box.
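One possible shape for the detection box record is sketched below. It assumes the conventional reading in which (x, y) is the box center and (w, h) is its width and height; the class name `DetectionBox` and the `corners` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DetectionBox:
    x: float                   # center x coordinate (assumed convention)
    y: float                   # center y coordinate (assumed convention)
    w: float                   # box width
    h: float                   # box height
    score: float               # detection confidence
    aid: Optional[int] = None  # unique identifier, empty until tracking assigns one

    def corners(self) -> Tuple[float, float, float, float]:
        """Return the box as (x1, y1, x2, y2) corner coordinates."""
        return (
            self.x - self.w / 2,
            self.y - self.h / 2,
            self.x + self.w / 2,
            self.y + self.h / 2,
        )


box = DetectionBox(x=10, y=20, w=4, h=6, score=0.9)
print(box.corners())  # (8.0, 17.0, 12.0, 23.0)
```

Keeping `aid` optional mirrors the fact that a freshly detected box has no identity until it is matched to a trajectory.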
[0059] While a target object is being tracked, a predicted bounding box of the target object is obtained from its current tracking trajectory. The predicted bounding box is then matched against each detection box in the current video frame, and the unique identifier of the target object is recorded in the aid of the detection box that matches the predicted bounding box. This identifier can be the tracking trajectory code.
[0060] The second video stream is obtained by processing the first video stream through multi-target tracking. The second video stream adds the detection box of the target object to each video frame of the first video stream. In other words, each video frame in the second video stream displays the detection box of the target object based on the corresponding video frame in the first video stream.
[0061] As one implementation method, step S11 specifically includes the following steps:
[0062] S111, obtain the prediction box of the target object in the current video frame of the first video stream according to the current tracking trajectory of the target object, wherein the current tracking trajectory includes at least one video frame containing the target object, each video frame containing the target object includes a detection box corresponding to the target object, and the video frame containing the target object is located before the current video frame.
[0063] The current tracking trajectory of the target object is obtained by tracking all video frames preceding the current video frame. These preceding frames are referred to as historical frames. The Kalman filter algorithm is applied to the historical frame tracking information to obtain the predicted bounding boxes. The historical frame tracking information includes the detection box data of the target object in at least one historical frame. The predicted bounding box data is (x, y, w, h), where x and y represent the center coordinates of the predicted bounding box of the target object, and w and h represent the width and height of the predicted bounding box, respectively. All current tracking trajectories can be placed into a tracking set.
[0064] S112, Target object detection is performed on the current video frame to obtain at least one detection box of the current video frame, wherein the detection boxes in the current video frame with a confidence level greater than or equal to a first threshold are assigned to a first detection box set, and the detection boxes in the current video frame with a confidence level less than the first threshold are assigned to a second detection box set.
[0065] In this video frame, the detection boxes have not yet been used for target object tracking, and the aid data of each detection box in the current video frame is temporarily empty. Detection boxes with low confidence may be those where the selected target object is occluded, or those where the selected object is not a target object. If detection boxes with low confidence are directly filtered out, the predicted box may not be able to match the detection box. Therefore, this implementation retains all detection boxes, and detection boxes with confidence greater than or equal to a first threshold are placed into the first detection box set as high-confidence detection boxes, and detection boxes with confidence less than the first threshold are placed into the second detection box set as low-confidence detection boxes.
[0066] S113, match the prediction box corresponding to the current tracking trajectory of each target object with the first detection box set to obtain the first matching result of the prediction box corresponding to the current tracking trajectory of each target object, and update the detection box whose first matching result is a successful match to the current tracking trajectory of the corresponding target object.
[0067] First, the predicted bounding box is matched with the detection bounding boxes with higher confidence in the first detection set. If the match is successful, the aid in the detection bounding box data of the detection bounding box that matches the predicted bounding box is updated to the tracking trajectory code of the corresponding target object. Then, the detection bounding box data of the detection bounding box that matches the predicted bounding box is added to the current tracking trajectory of the target object corresponding to the predicted bounding box.
[0068] S114, the predicted box with the first matching result of a failed match is matched with the second set of detection boxes to obtain the second matching result of the predicted box, and the detection box with a successful match is updated to the current tracking trajectory of the corresponding target object;
[0069] Specifically, when the predicted bounding box cannot be matched with a detection box in the first detection box set, the predicted bounding box is matched with a detection box with a lower confidence in the second detection set. If the match is successful, the aid in the detection box data of the detection box that matches the predicted bounding box is updated to the tracking trajectory code of the corresponding target object, and the detection box data of the detection box that matches the predicted bounding box is added to the current tracking trajectory of the target object corresponding to the predicted bounding box.
[0070] S115, the detection boxes in the first detection box set that fail to match the prediction box are taken as the current tracking trajectory of the newly added target object, and the detection boxes in the second detection box set that fail to match the prediction box are deleted;
[0071] Once all predicted bounding boxes have been matched against the detection boxes in the first detection box set, any detection box remaining in the first set could not be matched with the predicted bounding box of any current tracking trajectory in the tracking set. This indicates that the target object selected by such a remaining detection box is appearing for the first time in the first video stream. The newly added target object is given a unique identifier, namely a tracking trajectory code. The aid in the detection box data is updated to the tracking trajectory code of the corresponding target object, and the current tracking trajectory of the newly added target object is put into the tracking set.
[0072] Once all predicted bounding boxes that failed to match the first set of detection boxes have been matched with each of the detection boxes in the second set of detection boxes, the remaining detection boxes in the second set of detection boxes cannot match any of the predicted bounding boxes of the current tracking trajectory in the tracking set. This indicates that the objects selected by the remaining detection boxes in the second set of detection boxes are not the target objects, and the remaining detection boxes in the second set of detection boxes can be deleted.
[0073] As one implementation method, the current tracking trajectory corresponding to the predicted box that fails to match the first detection box set and fails to match the second detection box set can be removed from the tracking set and placed into the lost set.
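The two-stage matching in steps S113 to S115 can be sketched as follows. The text does not state the matching criterion, so this sketch assumes greedy matching on intersection over union (IoU) with boxes given as corner coordinates; `iou`, `greedy_match`, `associate`, `min_iou`, and the default threshold values are all hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corners


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(predictions: Dict[int, Box], detections: List[Box], min_iou: float):
    """Pair predicted boxes with detections, best IoU first (assumed criterion)."""
    pairs = sorted(
        ((iou(p, d), tid, j) for tid, p in predictions.items() for j, d in enumerate(detections)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for overlap, tid, j in pairs:
        if overlap < min_iou:
            break  # every remaining pair overlaps even less
        if tid not in matches and j not in used:
            matches[tid] = j
            used.add(j)
    unmatched_tracks = [tid for tid in predictions if tid not in matches]
    unmatched_dets = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched_tracks, unmatched_dets


def associate(predictions: Dict[int, Box], detections: List[Tuple[Box, float]],
              first_threshold: float = 0.5, min_iou: float = 0.3):
    """Return (assigned boxes per track, boxes starting new tracks, lost track ids)."""
    high = [box for box, score in detections if score >= first_threshold]
    low = [box for box, score in detections if score < first_threshold]
    first, unmatched_tracks, unmatched_high = greedy_match(predictions, high, min_iou)
    remaining = {tid: predictions[tid] for tid in unmatched_tracks}
    # Unmatched low-confidence boxes are simply dropped after the second stage.
    second, lost, _ = greedy_match(remaining, low, min_iou)
    assigned = {tid: high[j] for tid, j in first.items()}
    assigned.update({tid: low[j] for tid, j in second.items()})
    new_tracks = [high[j] for j in unmatched_high]
    return assigned, new_tracks, lost
```

Here an unmatched high-confidence box starts a new trajectory, an unmatched low-confidence box is discarded, and a trajectory that matches nothing in either stage is reported as lost, mirroring the tracking set and lost set described above.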
[0074] S12, perform first direction counting and second direction counting on the second video stream respectively to obtain the first direction counting result and second direction counting result for each video frame in the second video stream, wherein the first direction is opposite to the second direction;
[0075] Here, one of the first and second directions is the main direction of movement (the shooting direction), and the other is the turn-back direction. A first-direction counter and a second-direction counter can be set up separately. The first-direction counter counts target objects that enter the frame from the first edge, which corresponds to the first direction, and target objects that leave the frame from the first edge. Its counting rule is: if a target object enters the frame from the first edge, the counter increases by 1; if a target object leaves the frame from the first edge, the counter decreases by 1. The second-direction counter counts target objects that enter the frame from the second edge, which corresponds to the second direction, and target objects that leave the frame from the second edge. Its counting rule is: if a target object enters the frame from the second edge, the counter increases by 1; if a target object leaves the frame from the second edge, the counter decreases by 1.
[0076] As one implementation method, step S12 specifically includes the following steps:
[0077] S121, if it is determined from the tracking trajectory of the at least one target object that the video frame has at least one target object entering the video frame from the first edge, then the first direction count of the video frame is increased by a first number, the first number being the number of target objects entering the video frame from the first edge, and the first edge corresponding to the first direction;
[0078] S122, if it is determined from the tracking trajectory of the at least one target object that the video frame contains at least one target object that leaves the video frame from the first edge, then the first direction count of the video frame is reduced by a second number, the second number being the number of target objects that leave the video frame from the first edge;
[0079] S123, if it is determined from the tracking trajectory of the at least one target object that the video frame has at least one target object entering the video frame from the second edge, then the second direction count of the video frame is increased by a third number, the third number being the number of target objects entering the video frame from the second edge, the second edge corresponding to the second direction;
[0080] S124, if it is determined from the tracking trajectory of the at least one target object that the video frame contains at least one target object leaving the video frame from the second edge, then the second direction count of the video frame is reduced by a fourth number, which is the number of target objects leaving the video frame from the second edge.
[0081] The current video frame continues counting from the first and second direction counts of the previous video frame. As shown in Figure 2, when the shooting direction is the first direction, the target object will enter the frame from the first edge and eventually leave the frame from the second edge; when the shooting direction is the second direction, the target object will enter the frame from the second edge and eventually leave the frame from the first edge. For example, the first direction is the left direction and the second direction is the right direction.
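The counting rule of steps S121 to S124 can be illustrated with a minimal Python sketch. It assumes that an earlier stage has already turned each tracking trajectory into per-frame "edge events"; the names `EdgeEvent`, `update_counts` and `count_stream` are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

# Hypothetical labels: the application only speaks of a "first edge" and a "second edge".
FIRST_EDGE = "first"
SECOND_EDGE = "second"


@dataclass(frozen=True)
class EdgeEvent:
    """One target object crossing a frame edge (assumed to be derived from its trajectory)."""
    edge: str        # FIRST_EDGE or SECOND_EDGE
    entering: bool   # True = enters the frame, False = leaves the frame


def update_counts(prev: Tuple[int, int], events: Iterable[EdgeEvent]) -> Tuple[int, int]:
    """Apply S121-S124 to one frame, starting from the previous frame's counts."""
    first, second = prev
    for ev in events:
        delta = 1 if ev.entering else -1   # entering adds one, leaving subtracts one
        if ev.edge == FIRST_EDGE:
            first += delta
        elif ev.edge == SECOND_EDGE:
            second += delta
        else:
            raise ValueError(f"unknown edge: {ev.edge!r}")
    return first, second


def count_stream(events_per_frame: Sequence[Iterable[EdgeEvent]]) -> List[Tuple[int, int]]:
    """Return (first_direction_count, second_direction_count) for every frame."""
    counts: List[Tuple[int, int]] = []
    current = (0, 0)   # assumption: both counters start at zero
    for events in events_per_frame:
        current = update_counts(current, events)
        counts.append(current)
    return counts
```

In this sketch a counter can become negative, for example when objects already in view at the start leave through an edge; the application does not say whether such values are clamped.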
[0082] S13, based on the first direction counting result and the second direction counting result, obtain the motion direction of each video frame in the second video stream;
[0083] In this context, the motion direction of a video frame corresponds to its shooting direction. When the shooting direction is the first direction, the target object enters the frame from the first edge and eventually leaves the frame from the second edge. When the shooting direction is the second direction, the target object enters the frame from the second edge and eventually leaves the frame from the first edge. For example, the first direction might be left, and the second direction might be right. Therefore, when the shooting direction is the first direction, the first direction count of the current video frame is not less than that of the previous video frame, and the second direction count of the current video frame is not greater than that of the previous video frame. When the shooting direction is the second direction, the second direction count of the current video frame is not less than that of the previous video frame, and the first direction count of the current video frame is not greater than that of the previous video frame.
[0084] As one implementation method, step S13 specifically includes the following steps:
[0085] S131, if the first direction count result of the current video frame is greater than the first direction count result of the previous video frame or the second direction count result of the current video frame is less than the second direction count result of the previous video frame, then the motion direction of the current video frame is determined to be the first direction.
[0086] S132, if the second direction count result of the current video frame is greater than the second direction count result of the previous video frame or the first direction count result of the current video frame is less than the first direction count result of the previous video frame, then the motion direction of the current video frame is determined to be the second direction.
[0087] Specifically, the motion direction of each video frame is determined as either a first direction or a second direction.
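As an illustration of steps S131 and S132, the following Python sketch derives a direction for each frame from consecutive count pairs. Two points are assumptions not stated in the application: when neither condition holds (the counts are unchanged) the previous frame's direction is carried over, and when both conditions hold the S131 test is applied first. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

FIRST, SECOND = "first", "second"


def frame_direction(prev: Tuple[int, int], cur: Tuple[int, int], fallback: str) -> str:
    """Classify one frame from its (first, second) counts and those of the previous frame."""
    prev_first, prev_second = prev
    cur_first, cur_second = cur
    # S131: first count rose or second count fell -> first direction.
    if cur_first > prev_first or cur_second < prev_second:
        return FIRST
    # S132: second count rose or first count fell -> second direction.
    if cur_second > prev_second or cur_first < prev_first:
        return SECOND
    # Assumption: unchanged counts inherit the previous frame's direction.
    return fallback


def directions(counts: Sequence[Tuple[int, int]], initial: str = FIRST) -> List[str]:
    """Return the motion direction of every frame in the stream."""
    result: List[str] = []
    prev = (0, 0)      # assumption: counters start at zero before the first frame
    last = initial     # assumption: direction used until the counts first change
    for cur in counts:
        last = frame_direction(prev, cur, last)
        result.append(last)
        prev = cur
    return result
```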
[0088] S14, Based on the motion direction of each video frame in the second video stream, obtain the backtracking recognition result of the second video stream;
[0089] In cases of reversal shooting, the second video stream will exhibit a change in direction. For example, frames 1 to N1 might represent the first direction, while frames N1+1 to N1+N2 might represent the second direction. The motion directions of the first N1 frames and the last N2 frames are opposite, indicating a possible change in direction. However, actual analysis of real-world business video footage revealed that many factors can cause changes in motion direction. The challenge lies in distinguishing between slight shaking and genuine reversal. Slight shaking, caused by human error or accident, alters the motion direction but does not affect subsequent counting. In contrast, the change in motion direction during a genuine reversal ultimately impacts the subsequent counting process.
[0090] In the first application scenario of turnaround recognition, turnaround recognition is achieved by setting the duration of the motion direction and the number of changes in the motion direction. As one implementation method, step S14 specifically includes the following steps:
[0091] S141, the second video stream is divided into at least one video segment according to the motion direction of each video frame in the second video stream, and all video frames in each video segment have the same motion direction;
[0092] In this method, several consecutive video frames with the same direction of motion are treated as one video segment. The second video stream is divided into several video segments in chronological order, with adjacent video segments moving in opposite directions. For example, the first video segment runs from frame 1 to frame N1, with the first direction of motion. The second video segment runs from frame N1+1 to frame N1+N2, with the second direction of motion. The third video segment runs from frame N1+N2+1 to frame N1+N2+N3, with the first direction of motion.
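The segmentation of step S141 amounts to grouping consecutive frames that share a direction. A minimal Python sketch follows; the `Segment` type and `split_segments` function are hypothetical names, and frame indices are 0-based here whereas the text above counts frames from 1.

```python
from itertools import groupby
from typing import List, NamedTuple, Sequence


class Segment(NamedTuple):
    direction: str
    start: int   # index of the first frame in the segment (0-based, inclusive)
    end: int     # index of the last frame in the segment (inclusive)


def split_segments(frame_directions: Sequence[str]) -> List[Segment]:
    """Group consecutive frames with the same motion direction into segments."""
    segments: List[Segment] = []
    index = 0
    for direction, run in groupby(frame_directions):
        length = sum(1 for _ in run)
        segments.append(Segment(direction, index, index + length - 1))
        index += length
    return segments
```

By construction, adjacent segments returned by this function always have opposite directions when only two direction labels are used.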
[0093] S142, if the duration of any of the video segments is less than or equal to a preset time threshold, then the video segment is filtered from the second video stream to obtain a filtered second video stream;
[0094] If the duration of a video segment is less than or equal to a preset time threshold, it indicates that the video segment may be caused by slight shaking. The preset time threshold is generally set based on experience. After filtering out video segments with shorter durations, these segments will not be included in the statistics of the number of directional changes in step S143, so as not to affect the reversal recognition.
[0095] S143, based on the motion direction of each video segment in the filtered second video stream, obtain the number of direction changes in the second video stream. If the number of direction changes is greater than or equal to a first preset threshold, then determine that the return recognition result of the second video stream is that a return shooting has occurred.
[0096] The number of direction changes is determined based on the motion direction of each video segment. For example, after filtering, the second video stream includes a first video segment, a second video segment, and a third video segment. The first video segment is in the first direction, the second video segment is in the first direction, and the third video segment is in the second direction, with a direction change count of 1. Alternatively, the second video stream may include the same three video segments, with the first video segment in the first direction, the second video segment in the second direction, and the third video segment in the first direction, resulting in a direction change count of 2. The first preset threshold is set based on experience; for example, it can be set to 1, meaning any direction change is considered a backtracking shot. Considering that backtracking is likely to occur at the beginning or end of video recording, the first preset threshold can be set to 2 or 3 to include backtracking at the beginning and / or end, thus avoiding the need for staff to repeatedly record the video to be identified.
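Steps S142 and S143 can be sketched in Python as follows. The sketch assumes each segment is given as a `(direction, duration)` pair with the duration in seconds; the function names and the example threshold values are hypothetical, since the application only says the thresholds are set from experience.

```python
from typing import List, Sequence, Tuple


def count_direction_changes(segments: Sequence[Tuple[str, float]], time_threshold: float) -> int:
    """S142 + S143: drop short segments, then count direction changes among the rest."""
    # S142: a segment whose duration is <= the threshold is treated as slight shaking.
    kept: List[str] = [direction for direction, duration in segments if duration > time_threshold]
    # S143: a change is counted wherever two neighbouring kept segments differ.
    return sum(1 for a, b in zip(kept, kept[1:]) if a != b)


def is_turn_back(segments: Sequence[Tuple[str, float]],
                 time_threshold: float,
                 change_threshold: int) -> bool:
    """Report back-and-forth shooting when the change count reaches the threshold."""
    return count_direction_changes(segments, time_threshold) >= change_threshold
```

For instance, with segments `first (5.0 s)`, `second (0.2 s)`, `first (4.0 s)`, `second (3.0 s)` and a time threshold of 0.5 s, the short second segment is dropped, leaving first, first, second and therefore one direction change, which matches the first example in the paragraph above.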
[0097] In the second application scenario of turnback recognition, based on the first application scenario, the degree of turnback is identified by analyzing video segments with short durations in the direction of motion. As one implementation method, step S14 further includes the following steps:
[0098] S144, for a video segment whose duration is less than or equal to a preset time threshold, obtain a first count reduction value of the target direction count result corresponding to the video segment based on the difference between the first count value of the target direction count result of the last video frame and the first video frame in the video segment, wherein the target direction is opposite to the motion direction of the video segment;
[0099] In particular, when the person holding the electronic device moves quickly, the electronic device may travel a large distance in a short period of time. This causes a large change in the counting result of the target objects within the video segment, so a genuine turn-back has occurred even though the video segment may be misidentified as slight shaking because of its short duration. Therefore, for video segments with short durations, it is necessary to identify the degree of turn-back.
[0100] In this scenario, assuming the video segment represents a true turnaround, the main direction of movement is opposite to the movement direction of the video segment, and the target direction may be the main direction of movement. The degree of count reduction in the target direction is determined to identify the degree of turnaround. If the movement direction of the video segment is the first direction, then the target direction is the second direction. The first count reduction value for the target direction corresponding to the video segment is obtained based on the difference between the first count values of the second direction count results of the last video frame and the first video frame in the video segment. Conversely, if the movement direction of the video segment is the second direction, then the target direction is the first direction. The first count reduction value for the target direction count result is obtained based on the difference between the first count values of the first direction count results of the last video frame and the first video frame in the video segment.
[0101] S145, if the first count decrease value is greater than or equal to a preset quantity threshold, then the return recognition result of the second video stream is determined to be a return shooting.
[0102] The preset quantity threshold is set based on experience.
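A minimal Python sketch of steps S144 and S145 is given below. It assumes per-frame counts are stored as `(first_count, second_count)` pairs and interprets the "count reduction value" as the count at the segment's first frame minus the count at its last frame, so that a positive value means the target-direction count fell; this sign convention and the function names are assumptions.

```python
from typing import Sequence, Tuple


def count_reduction(counts: Sequence[Tuple[int, int]],
                    start: int,
                    end: int,
                    segment_direction: str) -> int:
    """S144: drop in the target-direction count across frames start..end (inclusive)."""
    # The target direction is opposite to the segment's own motion direction.
    target_index = 1 if segment_direction == "first" else 0
    return counts[start][target_index] - counts[end][target_index]


def short_segment_is_turn_back(counts: Sequence[Tuple[int, int]],
                               start: int,
                               end: int,
                               segment_direction: str,
                               quantity_threshold: int) -> bool:
    """S145: a short segment still counts as a turn-back if enough targets were lost."""
    return count_reduction(counts, start, end, segment_direction) >= quantity_threshold
```

This check is meant to be applied only to segments whose duration is at or below the preset time threshold, that is, to the segments that step S142 would otherwise discard as slight shaking.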
[0103] In the third application scenario of turn-back recognition, actual shooting may involve turn-backs with few direction changes but long durations. To cover this case, the total durations of the first direction and the second direction over the whole video are compared. If their ratio satisfies the upper and lower threshold conditions, the two directions occupy similar shares of the video, which is identified as a typical large-amplitude, low-frequency turn-back.
[0104] In one implementation, step S14 further includes the following steps:
[0105] S146, Obtain the total duration of the second video stream in the first direction and the total duration of the second video stream in the second direction;
[0106] S147, obtain the ratio of the total duration in the first direction to the total duration in the second direction;
[0107] S148, if the ratio is greater than or equal to the first preset threshold and less than or equal to the second preset threshold, then the return detection result of the second video stream is determined to be a return shooting.
[0108] The third application scenario, based on the first application scenario, judges the ratio of the total duration of the two directions, which can identify backtracking shots with fewer directional changes but a larger degree of backtracking.
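Steps S146 to S148 can be sketched in Python as below. Segments are again assumed to be `(direction, duration)` pairs; the parameter names `lower` and `upper` stand for the first and second preset thresholds of S148, and returning `False` when the second direction never occurs is an assumption made to avoid dividing by zero.

```python
from typing import Sequence, Tuple


def duration_ratio_is_turn_back(segments: Sequence[Tuple[str, float]],
                                lower: float,
                                upper: float) -> bool:
    """S146-S148: compare the total time spent in each direction."""
    # S146: total duration per direction.
    first_total = sum(duration for direction, duration in segments if direction == "first")
    second_total = sum(duration for direction, duration in segments if direction == "second")
    if second_total == 0:
        # Assumption: with no second-direction footage there is nothing to compare.
        return False
    # S147: ratio of first-direction time to second-direction time.
    ratio = first_total / second_total
    # S148: a ratio inside [lower, upper] means both directions last similarly long.
    return lower <= ratio <= upper
```

With hypothetical thresholds of 0.5 and 2.0, a video with 10 s in the first direction and 8 s in the second gives a ratio of 1.25 and is flagged, while 10 s against 1 s gives a ratio of 10 and is not.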
[0109] Figure 3 is a schematic diagram of the structure of a video back-and-forth recognition device based on multi-target tracking according to an embodiment of this application. As shown in Figure 3, the video back-and-forth recognition device 30 based on multi-target tracking includes: a tracking module 31, a counting module 32, a direction determination module 33, and a back-and-forth recognition module 34. The tracking module 31 is used to perform multi-target tracking on target objects in a first video stream to obtain a second video stream corresponding to the first video stream. The second video stream includes a tracking trajectory of at least one target object, and the tracking trajectory includes at least one video frame containing the target object. Each video frame containing the target object includes a detection box corresponding to the target object. The counting module 32 is used to perform first direction counting and second direction counting on the second video stream to obtain a first direction counting result and a second direction counting result for each video frame in the second video stream, wherein the first direction is opposite to the second direction. The direction determination module 33 is used to obtain the motion direction of each video frame in the second video stream based on the first direction counting result and the second direction counting result. The back-and-forth recognition module 34 is used to obtain the back-and-forth recognition result of the second video stream based on the motion direction of each video frame in the second video stream.
[0110] In one implementation, the tracking module 31 is further configured to: obtain a predicted bounding box of the target object in the current video frame of the first video stream based on the current tracking trajectory of the target object, wherein the current tracking trajectory includes at least one video frame containing the target object, each video frame containing the target object includes a detection box corresponding to the target object, and the video frame containing the target object is located before the current video frame; perform target object detection on the current video frame to obtain at least one detection box of the current video frame, wherein detection boxes with a confidence level greater than or equal to a first threshold are assigned to a first detection box set, and detection boxes with a confidence level less than the first threshold are assigned to a second detection box set; match the predicted bounding box corresponding to the current tracking trajectory of each target object against the first detection box set to obtain a first matching result for each predicted bounding box; add the detection boxes whose first matching result is a successful match to the current tracking trajectory of the corresponding target object; match the predicted bounding boxes whose first matching result is a failed match against the second detection box set to obtain a second matching result for those predicted bounding boxes; add the detection boxes whose second matching result is a successful match to the current tracking trajectory of the corresponding target object; and take the detection boxes in the first detection box set that failed to match any predicted bounding box as the current tracking trajectories of newly added target objects, and delete the detection boxes in the second detection box set that failed to match any predicted bounding box.
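The two-stage association described for the tracking module can be sketched in Python as follows. The application does not state how predicted boxes are compared with detection boxes; the sketch assumes a greedy match on intersection-over-union (IoU) with a hypothetical `min_iou` cut-off, and all function names are illustrative.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(predicted: Dict[int, Box], detections: List[Box], min_iou: float):
    """Assumed matcher: pair boxes in descending IoU order, each used at most once."""
    pairs = sorted(
        ((iou(p, d), tid, j) for tid, p in predicted.items() for j, d in enumerate(detections)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, j in pairs:
        if score < min_iou:
            break                      # remaining pairs overlap too little
        if tid in matches or j in used:
            continue
        matches[tid] = j
        used.add(j)
    unmatched_tracks = [tid for tid in predicted if tid not in matches]
    unmatched_dets = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched_tracks, unmatched_dets


def associate(predicted: Dict[int, Box],
              detections: List[Tuple[Box, float]],
              conf_threshold: float,
              min_iou: float = 0.3):
    """Return (track_id -> matched detection box, boxes that start new trajectories)."""
    high = [box for box, conf in detections if conf >= conf_threshold]  # first set
    low = [box for box, conf in detections if conf < conf_threshold]    # second set
    first, leftover_tracks, new_idx = greedy_match(predicted, high, min_iou)
    updates = {tid: high[j] for tid, j in first.items()}
    # Predicted boxes that failed the first match get a second chance on the low set.
    remaining = {tid: predicted[tid] for tid in leftover_tracks}
    second, _, _ = greedy_match(remaining, low, min_iou)
    updates.update({tid: low[j] for tid, j in second.items()})
    # Unmatched high-confidence boxes start new trajectories; unmatched low ones are dropped.
    new_tracks = [high[j] for j in new_idx]
    return updates, new_tracks
```

The sketch leaves open what happens to trajectories that fail both matching passes, since the paragraph above does not specify it.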
[0111] In one implementation, the counting module 32 is further configured to: if, based on the tracking trajectory of the at least one target object, the video frame contains at least one target object entering the video frame from the first edge, then the first direction count of the video frame increases by a first amount, the first amount being the number of target objects entering the video frame from the first edge, the first edge corresponding to the first direction; if, based on the tracking trajectory of the at least one target object, the video frame contains at least one target object leaving the video frame from the first edge, then the first direction count of the video frame decreases by a second amount, the second amount being the number of target objects leaving the video frame from the first edge; if, based on the tracking trajectory of the at least one target object, the video frame contains at least one target object entering the video frame from the second edge, then the second direction count of the video frame increases by a third amount, the third amount being the number of target objects entering the video frame from the second edge, the second edge corresponding to the second direction; if, based on the tracking trajectory of the at least one target object, the video frame contains at least one target object leaving the video frame from the second edge, then the second direction count of the video frame decreases by a fourth amount, the fourth amount being the number of target objects leaving the video frame from the second edge.
[0112] In one implementation, the direction determination module 33 is further configured to: determine the motion direction of the current video frame as the first direction if the first direction count result of the current video frame is greater than the first direction count result of the previous video frame or the second direction count result of the current video frame is less than the second direction count result of the previous video frame; and determine the motion direction of the current video frame as the second direction if the second direction count result of the current video frame is greater than the second direction count result of the previous video frame or the first direction count result of the current video frame is less than the first direction count result of the previous video frame.
[0113] In one implementation, the back-and-forth recognition module 34 is further configured to: divide the second video stream into at least one video segment according to the motion direction of each video frame in the second video stream, wherein all video frames in each video segment have the same motion direction; if the duration of any video segment is less than or equal to a preset time threshold, filter that video segment out of the second video stream to obtain a filtered second video stream; obtain the number of direction changes of the second video stream according to the motion direction of each video segment in the filtered second video stream, and if the number of direction changes is greater than or equal to a first preset number threshold, determine that the back-and-forth recognition result of the second video stream is that back-and-forth shooting has occurred.
[0114] In one implementation, the back-and-forth recognition module 34 is further configured to: for video segments with a duration less than or equal to a preset time threshold, obtain a first count reduction value for the target direction count result corresponding to the video segment based on the difference between the first count value of the target direction count result of the last video frame and that of the first video frame in the video segment, wherein the target direction is opposite to the motion direction of the video segment; if the first count reduction value is greater than or equal to a preset quantity threshold, determine that the back-and-forth recognition result of the second video stream is that back-and-forth shooting has occurred.
[0115] In one implementation, the back-and-forth recognition module 34 is further configured to: obtain the total duration of the second video stream in the first direction and the total duration of the second video stream in the second direction; obtain the ratio of the total duration in the first direction to the total duration in the second direction; if the ratio is greater than or equal to a first preset threshold and less than or equal to a second preset threshold, then determine that the back-and-forth recognition result of the second video stream is that back-and-forth shooting has occurred.
[0116] Figure 4 is a schematic diagram of the structure of an electronic device according to an embodiment of this application. As shown in Figure 4, the electronic device 40 includes a processor 41 and a memory 42 coupled to the processor 41.
[0117] The memory 42 stores program instructions for implementing the video back-and-forth recognition method based on multi-target tracking according to any of the above embodiments.
[0118] The processor 41 is used to execute the program instructions stored in the memory 42 to perform video back-and-forth recognition based on multi-target tracking.
[0119] The processor 41 can also be referred to as a CPU (Central Processing Unit). The processor 41 may be an integrated circuit chip with signal processing capabilities. The processor 41 can also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. A general-purpose processor can be a microprocessor or any conventional processor.
[0120] Figure 5 is a schematic diagram of the structure of a storage medium according to an embodiment of this application. As shown in Figure 5, the storage medium of this embodiment stores program instructions 51 capable of implementing all the methods described above. The storage medium can be non-volatile or volatile. The program instructions 51 can be stored in the storage medium in the form of a software product, including several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks, or terminal devices such as computers, servers, mobile phones, and tablets.
[0121] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces, or indirect coupling or communication connection between apparatuses or units, and may be electrical, mechanical, or other forms.
[0122] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated units described above can be implemented in hardware or as software functional units. The above are merely embodiments of this application and do not limit the patent scope of this application. Any equivalent structural or procedural transformations made based on the description and drawings of this application, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of this application.
[0123] The above description is merely an embodiment of this application. It should be noted that those skilled in the art can make improvements without departing from the inventive concept of this application, but these improvements all fall within the protection scope of this application.
Claims
1. A video back-and-forth recognition method based on multi-target tracking, characterized in that it comprises: Multi-target tracking is performed on target objects in a first video stream to obtain a second video stream corresponding to the first video stream. The second video stream includes a tracking trajectory of at least one target object. The tracking trajectory includes at least one video frame containing the target object. Each video frame containing the target object includes a detection box corresponding to the target object. The second video stream is counted in a first direction and in a second direction to obtain the first direction count result and the second direction count result for each video frame in the second video stream, wherein the first direction is opposite to the second direction; specifically, if it is determined from the tracking trajectory of the at least one target object that the video frame contains at least one target object entering the video frame from the first edge, then the first direction count of the video frame is increased by a first number, the first number being the number of target objects entering the video frame from the first edge, the first edge corresponding to the first direction; if it is determined from the tracking trajectory of the at least one target object that the video frame contains at least one target object leaving the video frame from the first edge, then the first direction count of the video frame is decreased by a second number, the second number being the number of target objects leaving the video frame from the first edge.
If, based on the tracking trajectory of the at least one target object, it is determined that the video frame contains at least one target object entering the video frame from the second edge, then the second direction count of the video frame is increased by a third quantity, which is the number of target objects entering the video frame from the second edge, where the second edge corresponds to the second direction. If, based on the tracking trajectory of the at least one target object, it is determined that the video frame contains at least one target object leaving the video frame from the second edge, then the second direction count of the video frame is decreased by a fourth quantity, which is the number of target objects leaving the video frame from the second edge. Based on the first direction counting result and the second direction counting result, the motion direction of each video frame in the second video stream is obtained; specifically, if the first direction counting result of the current video frame is greater than the first direction counting result of the previous video frame or the second direction counting result of the current video frame is less than the second direction counting result of the previous video frame, then the motion direction of the current video frame is determined to be the first direction; if the second direction counting result of the current video frame is greater than the second direction counting result of the previous video frame or the first direction counting result of the current video frame is less than the first direction counting result of the previous video frame, then the motion direction of the current video frame is determined to be the second direction. Based on the motion direction of each video frame in the second video stream, the backtracking recognition result of the second video stream is obtained. 
2. The video back-and-forth recognition method based on multi-target tracking according to claim 1, characterized in that, The step of performing multi-target tracking on target objects in the first video stream to obtain a second video stream corresponding to the first video stream includes: The predicted bounding box of the target object in the current video frame of the first video stream is obtained based on the current tracking trajectory of the target object. The current tracking trajectory includes at least one video frame containing the target object, each video frame containing the target object includes a detection box corresponding to the target object, and the video frame containing the target object is located before the current video frame. Target object detection is performed on the current video frame to obtain at least one detection box of the current video frame, wherein detection boxes with a confidence level greater than or equal to a first threshold are assigned to a first detection box set, and detection boxes with a confidence level less than the first threshold are assigned to a second detection box set. The predicted bounding box corresponding to the current tracking trajectory of each target object is matched with the first detection box set to obtain the first matching result of the predicted bounding box corresponding to the current tracking trajectory of each target object. The detection boxes whose first matching result is a successful match are added to the current tracking trajectory of the corresponding target object. The predicted bounding boxes whose first matching result is a failed match are matched with the second detection box set to obtain the second matching result of those predicted bounding boxes. The detection boxes whose second matching result is a successful match are added to the current tracking trajectory of the corresponding target object.
The detection boxes in the first set of detection boxes that fail to match the predicted box are taken as the current tracking trajectory of the newly added target object, and the detection boxes in the second set that fail to match the predicted box are deleted.
3. The video back-and-forth recognition method based on multi-target tracking according to claim 1, characterized in that, The step of obtaining the backtracking recognition result of the second video stream based on the motion direction of each video frame in the second video stream includes: The second video stream is divided into at least one video segment according to the motion direction of each video frame in the second video stream, and all video frames in each video segment have the same motion direction; If the duration of any of the video segments is less than or equal to a preset time threshold, then the video segment is filtered out from the second video stream to obtain a filtered second video stream. The number of directional changes in the second video stream is obtained based on the motion direction of each video segment in the filtered second video stream. If the number of directional changes is greater than or equal to a first preset threshold, the return detection result of the second video stream is determined to be a return shot.
4. The video back-and-forth recognition method based on multi-target tracking according to claim 3, characterized in that, The step of obtaining the backtracking recognition result of the second video stream based on the motion direction of each video frame in the second video stream further includes: For a video segment whose duration is less than or equal to a preset time threshold, a first count reduction value for the target direction count result corresponding to the video segment is obtained based on the difference between the first count value of the target direction count result of the last video frame and that of the first video frame in the video segment, wherein the target direction is opposite to the motion direction of the video segment; If the first count reduction value is greater than or equal to a preset quantity threshold, then the backtracking recognition result of the second video stream is determined to be that back-and-forth shooting has occurred.
5. The video back-and-forth recognition method based on multi-target tracking according to claim 4, characterized in that, The step of obtaining the backtracking recognition result of the second video stream based on the motion direction of each video frame in the second video stream further includes: Obtain the total duration of the second video stream in the first direction and the total duration of the second video stream in the second direction; Obtain the ratio of the total duration in the first direction to the total duration in the second direction; If the ratio is greater than or equal to the first preset threshold and less than or equal to the second preset threshold, then the backtracking recognition result of the second video stream is determined to be that back-and-forth shooting has occurred.
6. A video back-and-forth recognition device based on multi-target tracking, characterized in that it comprises: The tracking module is used to perform multi-target tracking on target objects in a first video stream to obtain a second video stream corresponding to the first video stream. The second video stream includes a tracking trajectory of at least one target object. The tracking trajectory includes at least one video frame containing the target object. Each video frame containing the target object includes a detection box corresponding to the target object. The counting module is configured to perform first-direction counting and second-direction counting on the second video stream respectively, to obtain the first-direction counting result and the second-direction counting result for each video frame in the second video stream, wherein the first direction is opposite to the second direction; the counting module is further configured to, if it is determined from the tracking trajectory of the at least one target object that the video frame contains at least one target object entering the video frame from the first edge, increase the first-direction count of the video frame by a first number, the first number being the number of target objects entering the video frame from the first edge, the first edge corresponding to the first direction; if it is determined from the tracking trajectory of the at least one target object that the video frame contains at least one target object leaving the video frame from the first edge, decrease the first-direction count of the video frame by a second number, the second number being the number of target objects leaving the video frame from the first edge. If, based on the tracking trajectory of at least one target object, the video frame contains at least one target object entering the video frame from the second edge, then the second direction count of the video frame increases by a third number, which is the number of target objects entering the video frame from the second edge, where the second edge corresponds to the second direction.
If, based on the tracking trajectory of at least one target object, the video frame contains at least one target object leaving the video frame from the second edge, then the second direction count of the video frame decreases by a fourth number, which is the number of target objects leaving the video frame from the second edge. The direction determination module is used to obtain the motion direction of each video frame in the second video stream based on the first direction counting result and the second direction counting result; the direction determination module is further used to determine the motion direction of the current video frame as the first direction if the first direction counting result of the current video frame is greater than the first direction counting result of the previous video frame or the second direction counting result of the current video frame is less than the second direction counting result of the previous video frame; and to determine the motion direction of the current video frame as the second direction if the second direction counting result of the current video frame is greater than the second direction counting result of the previous video frame or the first direction counting result of the current video frame is less than the first direction counting result of the previous video frame. The back-and-forth recognition module is used to obtain the back-and-forth recognition result of the second video stream based on the motion direction of each video frame in the second video stream.
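The direction determination rule of claim 6 compares each frame's two counting results with those of the previous frame. The following Python sketch illustrates that comparison only; it takes the per-frame counting results as given and does not model the counting module. The function name, the string labels, and the handling of frames where neither rule or both rules apply (returned as `None`) are assumptions, since the claim does not specify those cases.

```python
from typing import List, Optional

FIRST = "first"
SECOND = "second"


def frame_directions(first_counts: List[int],
                     second_counts: List[int]) -> List[Optional[str]]:
    """Return the motion direction of each frame, or None where undetermined."""
    if len(first_counts) != len(second_counts):
        raise ValueError("both count sequences must cover the same frames")
    if not first_counts:
        return []

    # The first frame has no previous frame to compare against.
    directions: List[Optional[str]] = [None]
    for i in range(1, len(first_counts)):
        # First direction: first-direction count rose or second-direction count fell.
        toward_first = (first_counts[i] > first_counts[i - 1]
                        or second_counts[i] < second_counts[i - 1])
        # Second direction: second-direction count rose or first-direction count fell.
        toward_second = (second_counts[i] > second_counts[i - 1]
                         or first_counts[i] < first_counts[i - 1])
        if toward_first and not toward_second:
            directions.append(FIRST)
        elif toward_second and not toward_first:
            directions.append(SECOND)
        else:
            # Assumption: unchanged or conflicting counts leave the frame undetermined.
            directions.append(None)
    return directions


# Usage with made-up counting results for five frames.
dirs = frame_directions([0, 1, 2, 2, 1], [0, 0, 0, 1, 1])
```

A practical system would probably need a policy for undetermined frames, such as carrying the previous direction forward, but that choice is outside what the claim states.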
7. An electronic device, characterized in that the electronic device includes a processor and a memory coupled to the processor, the memory storing program instructions executable by the processor; when the processor executes the program instructions stored in the memory, it implements the video back-and-forth recognition method based on multi-target tracking as described in any one of claims 1 to 5.
8. A storage medium, characterized in that the storage medium stores program instructions which, when executed by a processor, implement the video back-and-forth recognition method based on multi-target tracking as described in any one of claims 1 to 5.