A target tracking method and device based on similar semantic interferer assistance
By acquiring the target response map and the relative position prediction of the Kalman motion model cluster, the problems of accuracy and real-time performance in target tracking under complex scenarios are solved, and efficient tracking under similar semantic interference is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING INST OF ENVIRONMENTAL FEATURES
- Filing Date
- 2024-03-28
- Publication Date
- 2026-06-19
AI Technical Summary
Existing target tracking methods are susceptible to similar semantic interference in complex scenarios, making accurate tracking difficult, and it is hard to balance real-time performance and robustness.
The method acquires the target response map, extracts the response score peaks to determine the candidate target set, uses a cluster of Kalman motion models to predict the relative positional relationships, determines the tracking target through similarity matching, and updates the cluster by Kalman filtering, thereby suppressing the influence of similar semantic interference.
This improves the tracking performance of target tracking methods in complex scenarios, achieves a balance between real-time performance and robustness, and reduces computational complexity.
Figure CN118096831B_ABST
Description
Technical Field
[0001] This invention relates to the field of target tracking technology, and in particular to a target tracking method and apparatus assisted by similar semantic interferers.
Background Technology
[0002] Object tracking is a traditional computer vision problem with important applications in many fields. Generally, object tracking tasks require the tracker to predict the trajectory of the target over a period of time within an entire image sequence, given the target's initial position.
[0003] Currently, most target tracking methods rely solely on constructed appearance models to locate the target position in each frame. In some complex scenes, accurate tracking is often difficult. For example, the presence of distractors that are visually very similar to the target (i.e., semantically similar distractors) significantly degrades the tracking performance of appearance-model-based methods.
Summary of the Invention
[0004] To address the issue that existing target tracking methods are susceptible to interference from similar semantic interference objects, this invention provides a target tracking method, apparatus, electronic device, and storage medium assisted by similar semantic interference objects, which can improve the tracking performance of target tracking methods in complex scenarios.
[0005] In a first aspect, the present invention provides a target tracking method based on similar semantic interference, comprising:
[0006] Obtain the target response map of the current frame; the target response map includes the response scores of each location within the tracking area;
[0007] Based on the target response map, a set of candidate targets is determined by extracting the peak response scores, and a set of candidate target coordinates is determined by combining the coordinates of the peak response scores.
[0008] Based on the set of candidate target coordinates, the relative positional relationship of each candidate target is determined; wherein, the similar semantic interferences of each candidate target include other candidate targets in the candidate target set besides the candidate target itself; the relative positional relationship of a candidate target includes the relative position vector between the candidate target and all its similar semantic interferences;
[0009] The relative positional relationships of targets in the current frame are predicted using a cluster of Kalman motion models. The Kalman motion model cluster is constructed based on the relative positional relationships between the target and each candidate non-target in the initial frame and is updated using Kalman filtering. The similar semantic interferences of the target include each candidate non-target in the candidate target set. The relative positional relationships of the target include the relative position vectors between the target and all its similar semantic interferences.
[0010] Based on the predicted relative positional relationships of the current frame target and the relative positional relationships of each candidate target, the target to be tracked is determined through similarity matching.
[0011] The Kalman motion model cluster is updated based on the relative positional relationship of the targets in the current frame.
[0012] Optionally, determining the candidate target set based on the target response map by extracting the peak response score includes:
[0013] Obtain the maximum response score in the target response map;
[0014] Based on the maximum response score in the target response map and the preset minimum score threshold, the dynamic score threshold T = MAX(σ, μ·max(S)) is determined; where σ is the preset minimum score threshold, μ is the adjustment coefficient, S represents the target response map, and max(S) represents the maximum response score in the target response map.
[0015] Based on the target response map, the peak values of all response scores are obtained by extracting the maxima;
[0016] Based on the dynamic scoring threshold, all obtained response score peaks are filtered, and only response score peaks exceeding the dynamic scoring threshold are retained.
[0017] Optionally, obtaining all response score peaks by extracting the maxima includes:
[0018] Extract the maxima to obtain all response score peaks;
[0019] Determine whether peak overlap exists based on the position of each response score peak and the number of maxima. If it does, add a response score peak; otherwise, continue.
[0020] Optionally, the step of determining whether there is peak overlap based on the peak position and the number of maxima of each response score includes:
[0021] If the number of maxima is less than the total number of candidate targets in the previous frame, then each high-scoring region is determined based on the target response map; the response score of the high-scoring region is not less than a preset response score threshold.
[0022] If the area of a region with a high response score is greater than a preset area threshold, then peak overlap is considered to exist.
[0023] Optionally, adding a response score peak includes:
[0024] For high-scoring response regions whose area exceeds a preset area threshold, a new response score peak is added based on the score corresponding to the center position of the high-scoring response region.
[0025] Optionally, determining the target to be tracked through similarity matching includes:
[0026] The similarity score between the target and each candidate target is calculated using a distance function, expressed as:
[0027]
[0028]
[0029] where $M_i$ denotes the similarity between the relative positional relationship $V_i$ of the i-th candidate target and the predicted relative positional relationship $\tilde{V}$ of the target in the current frame; $v_j$ denotes the relative position vector between the i-th candidate target and its j-th similar semantic interferer; $\tilde{v}_k$ denotes the predicted relative position vector between the target in the current frame and the k-th similar semantic interferer; and $\tau$ denotes the preset similarity threshold.
[0030] Compare the similarity scores and determine the candidate target with the highest similarity score as the target to be tracked.
[0031] Optionally, updating the state of the Kalman motion model cluster based on the relative positional relationship of the targets in the current frame includes:
[0032] Traverse all relative position vectors in the relative positional relationship $V^*$ of the target in the current frame, and perform the following operation on each:
[0033] For a relative position vector $v^*$ in the relative positional relationship $V^*$, if the predicted relative positional relationship $\tilde{V}$ contains a relative position vector $\tilde{v}$ whose distance to $v^*$ does not exceed the preset similarity threshold $\tau$, then $v^*$ is used to update the Kalman filter corresponding to $\tilde{v}$ in the Kalman motion model cluster;
[0034] Otherwise, a new Kalman filter is constructed using the relative position vector v*, and then added to the Kalman motion model cluster.
[0035] In a second aspect, the present invention also provides a target tracking device based on similar semantic interference, comprising:
[0036] The information acquisition module is used to acquire the target response map of the current frame; the target response map includes the response scores of each location within the tracking area;
[0037] The candidate target extraction module is used to determine the candidate target set by extracting the peak value of the response score based on the target response map, and to determine the candidate target coordinate set by combining the coordinates of the peak value of the response score;
[0038] The relative position extraction module is used to determine the relative positional relationship of each candidate target based on the candidate target coordinate set; wherein, the similar semantic interferences of each candidate target include other candidate targets in the candidate target set besides the candidate target itself; the relative positional relationship of the candidate target includes the relative position vector between the candidate target and all its similar semantic interferences;
[0039] The relative position prediction module is used to predict the relative position relationship of the target in the current frame using a Kalman motion model cluster. The Kalman motion model cluster is constructed based on the relative position relationship between the target and each candidate non-target in the initial frame and is updated using Kalman filtering. The similar semantic interferences of the target include each candidate non-target in the candidate target set. The relative position relationship of the target includes the relative position vector between the target and all its similar semantic interferences.
[0040] The candidate target competition module is used to determine the tracked target based on the predicted relative positional relationship of the target in the current frame and the relative positional relationship of each candidate target through similarity matching.
[0041] The model update module is used to update the state of the Kalman motion model cluster based on the relative positional relationship of the targets in the current frame.
[0042] In a third aspect, the present invention also provides an electronic device, including a memory and a processor, wherein the memory stores a computer program, and when the processor executes the computer program, it implements the target tracking method described in any of the preceding claims.
[0043] In a fourth aspect, the present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform the target tracking method described in any of the preceding claims.
[0044] This invention provides a target tracking method, device, electronic device, and storage medium based on similar semantic interference. This invention simultaneously detects the target and similar semantic interference, utilizes the relative spatial positional relationship between the target and similar semantic interference to construct a Kalman motion model cluster, and transmits this spatiotemporal sequence information in the image sequence. This can effectively suppress the influence of similar semantic interference and improve the tracking performance of the target tracking method in complex scenes.
Attached Figure Description
[0045] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0046] Figure 1 is a flowchart illustrating the steps of a target tracking method based on similar semantic interference provided in an embodiment of the present invention;
[0047] Figure 2(a) shows a frame of the original image;
[0048] Figure 2(b) shows the target response map corresponding to Figure 2(a);
[0049] Figure 2(c) shows another frame of the original image;
[0050] Figure 2(d) shows the target response map corresponding to Figure 2(c);
[0051] Figure 3 is a hardware architecture diagram of an electronic device provided in an embodiment of the present invention;
[0052] Figure 4 is a structural diagram of a target tracking device assisted by similar semantic interferers, provided in an embodiment of the present invention;
[0053] Figure 5 is a flowchart of a target tracking method based on similar semantic interference provided by an embodiment of the present invention.
Detailed Implementation
[0054] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are within the scope of protection of the present invention.
[0055] As mentioned earlier, most target tracking methods rely solely on the constructed appearance model to locate the target position in each frame. In some complex scenarios, this makes accurate tracking difficult. For instance, the presence of distractors that are visually very similar to the target (i.e., semantically similar distractors) can significantly degrade the tracking performance of methods based solely on appearance models.
[0056] To handle complex scenarios such as complex backgrounds, target deformation, occlusion, and rapid movement, tracking methods generally need to have more complex feature representations and models. However, a large amount of computation and complex feature representations will inevitably consume a lot of time, making it impossible to guarantee the real-time performance of the method and causing many inconveniences in practical applications.
[0057] In practical applications, not only is accurate and robust tracking required, but also high real-time performance is essential. Quickly and accurately responding to complex challenges is crucial, a feat that most current methods struggle to achieve. Therefore, improving the tracking performance of target tracking methods in complex scenarios and achieving a balance between real-time performance and robustness remains a pressing issue that needs to be addressed.
[0058] In view of this, the present invention utilizes some similar semantic interference objects in the scene as anchor points for target localization to achieve target tracking, which can improve tracking performance and does not rely on more complex feature representations and models, thus achieving a balance between real-time performance and robustness.
[0059] The following describes the specific implementation of the above concept.
[0060] Referring to Figure 1, this invention provides a target tracking method based on similar semantic interference, comprising:
[0061] Step 100: Obtain the target response map of the current frame; the target response map includes the response scores of each location within the tracking area;
[0062] The target response map is obtained from the original image and can be determined using existing target tracking techniques and based on the constructed appearance model, which will not be elaborated further here;
[0063] Step 102: Based on the target response map, determine the candidate target set by extracting the response score peak value, and determine the candidate target coordinate set by combining the response score peak value coordinates;
[0064] As shown in Figures 2(a) to 2(d), when there are no similar semantic interference objects in the original image, the target response map presents a single-peak state; when there are similar semantic interference objects in the original image, the target response map presents a multi-peak state. Step 102 aims to locate all candidate targets in each frame. The candidate targets are located at the high-score peak positions of the target response map. By extracting the response score peaks, the candidate target set is determined. Then, by combining the response score peak coordinates, the position information of each candidate target can be determined, and the candidate target coordinate set is obtained. The candidate target coordinate set includes the position information of all candidate targets in the current frame.
[0065] Step 104: Determine the relative positional relationship of each candidate target based on the candidate target coordinate set;
[0066] Wherein, the similar semantic interferences of each candidate target include other candidate targets in the candidate target set besides the candidate target itself; the relative positional relationship of the candidate targets includes the relative position vectors between the candidate target and all its similar semantic interferences;
[0067] For each candidate target, calculate its relative positional relationship, expressed as $V_i = \{\, c_j - c_i \mid c_j \in C^* \text{ and } c_j \neq c_i \,\}$, where $C^*$ denotes the candidate target coordinate set, $c_i$ denotes the coordinates of the i-th candidate target, and $c_j$ denotes the coordinates of the j-th candidate target; i and j take values no greater than N, where N is the total number of candidate targets in the candidate target set of the current frame.
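The relative positional relationship defined above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `relative_relations` and the 2-D tuple representation of coordinates are assumptions, and candidates are excluded by index (j ≠ i) rather than by coordinate value, which is equivalent as long as distinct peaks have distinct coordinates.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def relative_relations(coords: List[Point]) -> List[List[Point]]:
    """Return V_i = {c_j - c_i : j != i} for every candidate i.

    coords is the candidate target coordinate set C* (hypothetical layout:
    one (x, y) tuple per candidate target).
    """
    relations = []
    for i, (xi, yi) in enumerate(coords):
        # Vectors pointing from candidate i to every other candidate.
        v_i = [(xj - xi, yj - yi)
               for j, (xj, yj) in enumerate(coords) if j != i]
        relations.append(v_i)
    return relations


example = relative_relations([(0, 0), (3, 4), (1, 1)])
```

Each candidate with N candidates in the frame receives N - 1 relative position vectors, so the cost of this step grows quadratically with the number of candidates, which stays small after peak filtering.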
[0068] Step 106: Using a cluster of Kalman motion models, predict the relative positional relationships of the targets in the current frame;
[0069] The Kalman motion model cluster is constructed based on the relative positional relationship between the target and each candidate non-target in the initial frame and is updated using Kalman filtering; the similar semantic interferences of the target include each candidate non-target in the candidate target set; the relative positional relationship of the target includes the relative position vector between the target and all its similar semantic interferences.
[0070] If the current frame is the initial frame, a Kalman motion model cluster is constructed based on the relative positional relationships between the target and each candidate target (non-target) in the current frame, and then initialized.
[0071] The Kalman motion model cluster predicts the relative positional relationship between the target in the current frame and each similar semantic interference (i.e., each candidate target that is not the target), so as to suppress the interference of similar semantic interference on the tracked target.
[0072] Although the motion of all candidate targets between adjacent frames is small, the cumulative error cannot be ignored, especially under fast motion. Therefore, a motion model must be established to propagate the target's relative positional relationships into the new frame and predict them. The Kalman motion model cluster is constructed as follows:
[0073]
[0074] where $V'$ denotes the relative positional relationship of the target in the previous frame, and $v'_k$ denotes the relative position vector between the target in the previous frame and the k-th similar semantic interferer;
[0075] For the current frame, the relative positional relationship of the target in the current frame is predicted using the constructed Kalman motion model cluster, where the .predict term of the k-th Kalman filter denotes the predicted relative position vector between the target in the current frame and the k-th similar semantic interferer;
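A minimal sketch of how such a cluster might be built and queried is given below. The source does not specify the state model of each Kalman filter, so the constant-velocity model, the noise parameters `q` and `r`, and all class and function names (`AxisKalman`, `VectorKalman`, `build_cluster`, `predict_relation`) are assumptions made for illustration only.

```python
class AxisKalman:
    """1-D constant-velocity Kalman filter; state is (position, velocity)."""

    def __init__(self, pos, q=1e-2, r=1.0):
        self.p, self.v = float(pos), 0.0
        self.P = [[1.0, 0.0], [0.0, 1.0]]   # state covariance
        self.q, self.r = q, r                # process / measurement noise (assumed)

    def predict(self):
        # x <- F x and P <- F P F^T + Q, with F = [[1, 1], [0, 1]], Q = q * I.
        a, b = self.P[0]
        c, d = self.P[1]
        self.p += self.v
        self.P = [[a + b + c + d + self.q, b + d], [c + d, d + self.q]]
        return self.p

    def update(self, z):
        # Standard Kalman update with H = [1, 0].
        a, b = self.P[0]
        c, d = self.P[1]
        s = a + self.r
        k0, k1 = a / s, c / s
        y = z - self.p
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]


class VectorKalman:
    """One filter per relative position vector (independent x and y axes)."""

    def __init__(self, vec):
        self.fx, self.fy = AxisKalman(vec[0]), AxisKalman(vec[1])

    def predict(self):
        return (self.fx.predict(), self.fy.predict())

    def update(self, vec):
        self.fx.update(vec[0])
        self.fy.update(vec[1])


def build_cluster(v_prev):
    """One filter per relative position vector v'_k in V'."""
    return [VectorKalman(v) for v in v_prev]


def predict_relation(cluster):
    """Predicted relative positional relationship for the current frame."""
    return [kf.predict() for kf in cluster]
```

With zero initial velocity, the first prediction simply reproduces the initial vectors; once a filter has been updated with a displaced measurement, later predictions extrapolate along the estimated relative motion.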
[0076] Step 108: Based on the predicted relative positional relationships of the current frame targets and the relative positional relationships of each candidate target, the tracked target is determined through similarity matching;
[0077] Step 108 determines the target through similarity matching between the predicted relative positional relationship of the target with respect to its similar semantic interferers and the set $\{V_i\}$ of relative positional relationships of all candidate targets with respect to their similar semantic interferers. The candidate target whose relative positional relationship has the highest matching degree is the target to be tracked.
[0078] Step 110: Update the state of the Kalman motion model cluster based on the relative positional relationship of the targets in the current frame.
[0079] This invention provides a target tracking method that effectively solves the problem of accurate tracking in complex scenes with similar background interference, rapid movement, and occlusion by fully utilizing spatial information in the scene. This invention simultaneously detects the target and similar semantic interference objects and constructs the relative spatial positional relationship between them, effectively suppressing the influence of interference objects. This invention utilizes the relative spatial positional relationship to construct a simple and efficient Kalman filter, namely a Kalman motion model cluster, and transmits this spatiotemporal sequence information in the image sequence. The Kalman motion model cluster contains rich relative motion states between candidate targets, which can effectively improve detection accuracy while maintaining real-time speed, achieving a good balance between real-time performance and robustness.
[0080] The following describes how each step shown in Figure 1 is executed.
[0081] Optionally, for step 102, based on the target response map, a candidate target set is determined by extracting the peak response score, including:
[0082] Step 102-0: Obtain the maximum response score in the target response map;
[0083] Step 102-2: Based on the maximum response score in the target response map and the preset minimum score threshold, determine the dynamic score threshold T = MAX(σ, μ·max(S)); where σ is the preset minimum score threshold, μ is the adjustment coefficient, S represents the target response map, max(S) represents the maximum response score in the target response map, and MAX() represents finding the maximum value.
[0084] Step 102-4: Based on the target response map, obtain the peak values of all response scores by extracting the maxima;
[0085] Step 102-6: Based on the dynamic scoring threshold, filter all obtained response score peaks and retain only the response score peaks that exceed the dynamic scoring threshold as candidate targets.
[0086] Candidate targets are located at the peak positions of the response scores. Peaks that are too small are unlikely to be targets, and keeping them would only increase computational complexity. Therefore, this invention filters all response score peaks in the target response map S to reduce the computational load and improve efficiency.
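Steps 102-0 to 102-6 can be sketched as follows. This is an illustrative sketch only: the response map is assumed to be a list of rows of floats, a maximum is taken to be a cell strictly greater than its 8 neighbours (the source does not say how maxima are extracted), and the default values of `sigma` and `mu` are placeholders rather than values from the source.

```python
def dynamic_threshold(S, sigma=0.1, mu=0.5):
    """T = MAX(sigma, mu * max(S))."""
    max_score = max(max(row) for row in S)
    return max(sigma, mu * max_score)


def local_maxima(S):
    """Return (row, col, score) for cells strictly above all 8 neighbours."""
    h, w = len(S), len(S[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            neighbours = [S[rr][cc]
                          for rr in range(max(0, r - 1), min(h, r + 2))
                          for cc in range(max(0, c - 1), min(w, c + 2))
                          if (rr, cc) != (r, c)]
            if all(S[r][c] > n for n in neighbours):
                peaks.append((r, c, S[r][c]))
    return peaks


def candidate_targets(S, sigma=0.1, mu=0.5):
    """Keep only the peaks whose score exceeds the dynamic threshold."""
    t = dynamic_threshold(S, sigma, mu)
    return [(r, c, s) for r, c, s in local_maxima(S) if s > t]


S = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.2, 0.0],
    [0.0, 0.0, 0.1, 0.6, 0.1],
    [0.0, 0.0, 0.0, 0.1, 0.0],
]
candidates = candidate_targets(S, sigma=0.1, mu=0.5)
```

In this example map, raising `mu` to 0.8 lifts the threshold above the weaker peak, so only the strongest peak remains as a candidate.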
[0087] Furthermore, step 102-4 includes:
[0088] Extract the maxima to obtain all response score peaks;
[0089] Determine whether peak overlap exists based on the position of each response score peak and the number of maxima. If it does, add a response score peak; otherwise, continue.
[0090] The determination of whether there is peak overlap based on the peak position and the number of maxima of each response score includes:
[0091] If the number of maxima is less than the total number of candidate targets in the previous frame, then each high-scoring region is determined based on the target response map; the response score of the high-scoring region is not less than a preset response score threshold.
[0092] If the area of a region with a high response score is greater than a preset area threshold, then peak overlap is considered to exist.
[0093] Adding a response score peak includes:
[0094] For high-scoring response regions whose area exceeds a preset area threshold, a new response score peak is added based on the score corresponding to the center position of the high-scoring response region.
[0095] Because the target is moving, it may overlap with a similar semantic interferer, in which case the response score peak of a potential candidate target may be submerged by a neighbouring peak. This invention detects this special case of peak overlap by comparing the number of potential candidate targets (i.e., maxima) with the total number of candidate targets in the previous frame, combined with the area occupied by a potential candidate target (i.e., the area of the high-scoring region). If overlap exists, the center position of the high-scoring region is taken to correspond to the occluded potential candidate target, and the score at that center position is taken as its response score peak.
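The peak-overlap check described above might be sketched as follows. Several details are assumptions not stated in the source: a high-scoring region is taken to be a 4-connected component of cells whose score is not less than `score_thresh`, its area is its cell count, and its center is the rounded centroid. The function names are hypothetical.

```python
def high_score_regions(S, score_thresh):
    """4-connected components of cells with score >= score_thresh."""
    h, w = len(S), len(S[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or S[r][c] < score_thresh:
                continue
            stack, cells = [(r, c)], []
            seen[r][c] = True
            while stack:                      # iterative flood fill
                y, x = stack.pop()
                cells.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and S[ny][nx] >= score_thresh):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            regions.append(cells)
    return regions


def add_overlapped_peaks(S, peaks, prev_count, score_thresh, area_thresh):
    """Add a peak at the center of each oversized high-scoring region.

    Only runs when fewer maxima were found than there were candidate
    targets in the previous frame.
    """
    result = list(peaks)
    if len(peaks) >= prev_count:
        return result
    for cells in high_score_regions(S, score_thresh):
        if len(cells) > area_thresh:
            cy = round(sum(y for y, _ in cells) / len(cells))
            cx = round(sum(x for _, x in cells) / len(cells))
            if all((cy, cx) != (r, c) for r, c, _ in result):
                result.append((cy, cx, S[cy][cx]))
    return result


S_overlap = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.8, 0.8, 0.8, 0.8, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
merged = add_overlapped_peaks(S_overlap, [(1, 5, 0.9)], prev_count=2,
                              score_thresh=0.7, area_thresh=3)
```

A rounded centroid can fall outside a strongly non-convex region, so a production version would likely need a more careful definition of the center position.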
[0096] Optionally, for step 108, determining the target to be tracked through similarity matching includes:
[0097] Step 108-0: Using a distance function, calculate the similarity score of the relative positional relationship between the target and each candidate target. The expression is:
[0098]
[0099]
[0100] where $M_i$ denotes the similarity between the relative positional relationship $V_i$ of the i-th candidate target and the predicted relative positional relationship $\tilde{V}$ of the target in the current frame; $v_j$ denotes the relative position vector between the i-th candidate target and its j-th similar semantic interferer; $\tilde{v}_k$ denotes the predicted relative position vector between the target in the current frame and the k-th similar semantic interferer; and $\tau$ denotes the preset similarity threshold.
[0101] Step 108-2: Compare the similarity scores and determine the candidate target with the highest similarity score as the target to be tracked.
[0102] Using the above embodiments, the similarity of the relative positional relationship between a target and each candidate target can be quickly determined based on distance relationships, with high reliability and ease of calculation. In other embodiments, other distance relationships can also be used for similarity matching.
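Because the similarity expression itself is not reproduced in this text, the sketch below substitutes one plausible distance-based score and should not be read as the formula of the invention: it assumes that the similarity of a candidate is the number of predicted vectors that have at least one of the candidate's relative position vectors within Euclidean distance τ. The names `similarity` and `select_target` are hypothetical.

```python
import math


def similarity(v_i, v_pred, tau):
    """Assumed score: count predicted vectors matched within tau by V_i."""
    score = 0
    for pk in v_pred:
        if any(math.hypot(vj[0] - pk[0], vj[1] - pk[1]) <= tau for vj in v_i):
            score += 1
    return score


def select_target(relations, v_pred, tau):
    """Return (index of the best candidate, list of all scores).

    relations[i] is V_i; v_pred is the predicted relationship of the target.
    Ties are resolved in favour of the lowest index.
    """
    scores = [similarity(v_i, v_pred, tau) for v_i in relations]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Three candidates at (10, 10), (20, 10) and (10, 30); the target is the first.
relations = [
    [(10, 0), (0, 20)],
    [(-10, 0), (-10, 20)],
    [(0, -20), (10, -20)],
]
predicted = [(10.5, 0.2), (0.3, 19.6)]
best, scores = select_target(relations, predicted, tau=2.0)
```

Only the first candidate sees the interferers where the motion models expect them, so it receives the highest score even though all three candidates may look alike to the appearance model.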
[0103] Optionally, step 110 includes:
[0104] Traverse all relative position vectors in the relative positional relationship $V^*$ of the target in the current frame, and perform the following operation on each:
[0105] For a relative position vector $v^*$ in the relative positional relationship $V^*$, if the predicted relative positional relationship $\tilde{V}$ contains a relative position vector $\tilde{v}$ whose distance to $v^*$ does not exceed the preset similarity threshold $\tau$, then $v^*$ is used to update the Kalman filter corresponding to $\tilde{v}$ in the Kalman motion model cluster;
[0106] Otherwise, a new Kalman filter is constructed using the relative position vector v*, and then added to the Kalman motion model cluster.
[0107] A similar semantic interferer detected in the previous frame may be occluded, outside the tracking range, or simply not detected in the current frame. As a result, the relative position vectors in the predicted relative positional relationship $\tilde{V}$ and in the relative positional relationship $V^*$ are not in one-to-one correspondence, and their numbers may differ, so the Kalman filters cannot be updated in the conventional way. The method used in this invention is more flexible. It traverses each relative position vector $v^*$ in $V^*$: if $\tilde{V}$ contains a relative position vector $\tilde{v}$ whose distance to $v^*$ does not exceed the threshold $\tau$, a match has been found, and $v^*$ is used to update the Kalman filter corresponding to $\tilde{v}$; otherwise, a new Kalman filter is constructed from $v^*$ and added to the Kalman motion model cluster. This completes the state update of the Kalman motion model cluster. With the above embodiment, the cluster can be updated flexibly, avoiding mismatched relative position vectors and improving the reliability of similarity matching.
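The match-or-create bookkeeping of this update can be sketched independently of the filter implementation. In the sketch below, `StubFilter` is a deliberately trivial stand-in (it only stores the last measurement and is not a Kalman filter), `make_filter` is a hypothetical factory for whatever filter type is actually used, and choosing the nearest predicted vector when several lie within τ is an assumption, since the source only requires that one exists.

```python
import math


class StubFilter:
    """Placeholder for a real Kalman filter; keeps only the last measurement."""

    def __init__(self, vec):
        self.state = vec

    def update(self, vec):
        self.state = vec


def update_cluster(cluster, predicted, v_star, tau, make_filter=StubFilter):
    """Update matched filters and create new ones for unmatched vectors.

    cluster[k] is the filter that produced predicted[k]; v_star is the
    relative positional relationship V* of the tracked target in this frame.
    """
    for v in v_star:
        best_k, best_d = None, tau
        for k, p in enumerate(predicted):
            d = math.hypot(v[0] - p[0], v[1] - p[1])
            if d <= best_d:                  # nearest prediction within tau
                best_k, best_d = k, d
        if best_k is not None:
            cluster[best_k].update(v)        # match found: update that filter
        else:
            cluster.append(make_filter(v))   # new interferer: add a filter
    return cluster


cluster = [StubFilter((10, 0)), StubFilter((0, 20))]
predicted = [(10, 0), (0, 20)]
update_cluster(cluster, predicted, [(10.5, 0.5), (30, 30)], tau=2.0)
```

Filters whose interferer is not observed in the current frame are left untouched, which is what allows the cluster to survive temporary occlusion or missed detections.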
[0108] As shown in Figures 3 and 4, this embodiment of the invention provides a target tracking device assisted by similar semantic interferers. The device embodiment can be implemented in software, hardware, or a combination of both. From a hardware perspective, Figure 3 shows a hardware architecture diagram of the electronic device in which the target tracking device provided by an embodiment of the present invention resides. In addition to the processor, memory, network interface, and non-volatile memory shown in Figure 3, the electronic device in the embodiment may also include other hardware, such as a forwarding chip responsible for processing packets. Taking software implementation as an example, as shown in Figure 4, the device in a logical sense is formed by the CPU of the electronic device reading the corresponding computer program from non-volatile memory into memory and running it. This embodiment provides a target tracking device assisted by similar semantic interferers, comprising:
[0109] The information acquisition module 401 is used to acquire the target response map of the current frame; the target response map includes the response scores of each position within the tracking area;
[0110] The candidate target extraction module 402 is used to determine a set of candidate targets based on the target response map by extracting the peak values of the response scores, and to determine a set of candidate target coordinates by combining the coordinates of the peak values of the response scores;
[0111] The relative position extraction module 403 is used to determine the relative positional relationship of each candidate target based on the candidate target coordinate set; wherein, the similar semantic interferences of each candidate target include other candidate targets in the candidate target set besides the candidate target itself; the relative positional relationship of the candidate target includes the relative position vector between the candidate target and all its similar semantic interferences;
[0112] The relative position prediction module 404 is used to predict the relative position relationship of the target in the current frame using a Kalman motion model cluster; wherein, the Kalman motion model cluster is constructed based on the relative position relationship between the target and each candidate non-target in the initial frame, and is updated using Kalman filtering; the similar semantic interferences of the target include each candidate non-target in the candidate target set; the relative position relationship of the target includes the relative position vector between the target and all its similar semantic interferences;
[0113] The candidate target competition module 405 is used to determine the tracked target based on the predicted relative positional relationship of the target in the current frame and the relative positional relationship of each candidate target through similarity matching.
[0114] The model update module 406 is used to update the state of the Kalman motion model cluster based on the relative positional relationship of the targets in the current frame.
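The relative positional relationship produced by the relative position extraction module 403 can be illustrated with a minimal sketch. The function name `relative_position_relations`, the (row, col) coordinate convention, and the direction of each vector (distractor minus candidate) are assumptions made for illustration; the text above only states that each candidate is paired with every other candidate in the set.

```python
import numpy as np

def relative_position_relations(coords):
    """For each candidate, return the vectors pointing to all other candidates.

    coords: sequence of (row, col) peak coordinates, one per candidate target.
    Returns a list whose i-th entry is an array of shape (n - 1, 2).
    The sign convention (other minus self) is an illustrative assumption.
    """
    coords = np.asarray(coords, dtype=float)
    relations = []
    for i in range(len(coords)):
        # Every other candidate acts as a similar semantic interference for i.
        others = np.delete(coords, i, axis=0)
        relations.append(others - coords[i])
    return relations

# Example: three candidates; candidate 0 sees the other two as distractors.
rels = relative_position_relations([(0, 0), (3, 4), (6, 0)])
```

Because only differences between coordinates are stored, a uniform shift of all candidates (for example, from camera translation) leaves every relation unchanged.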
[0115] In this embodiment of the invention, the information acquisition module 401 can be used to execute step 100 in the above method embodiment, the candidate target extraction module 402 can be used to execute step 102 in the above method embodiment, the relative position extraction module 403 can be used to execute step 104 in the above method embodiment, the relative position prediction module 404 can be used to execute step 106 in the above method embodiment, the candidate target competition module 405 can be used to execute step 108 in the above method embodiment, and the model update module 406 can be used to execute step 110 in the above method embodiment. The information acquisition module 401 can obtain the target response map using a base tracker. The base tracker can be any target tracking technique, so the device is widely applicable and works in a plug-and-play manner.
[0116] Optionally, the candidate target extraction module 402 determines the candidate target set based on the target response map by extracting the peak response score, including performing the following operations:
[0117] Obtain the maximum response score in the target response map;
[0118] Based on the maximum response score in the target response map and the preset minimum score threshold, determine the dynamic score threshold T = max(σ, μ·max(S)), where σ is the preset minimum score threshold, μ is the adjustment coefficient, S represents the target response map, and max(S) represents the maximum response score in the target response map;
[0119] Based on the target response map, obtain all response score peaks by extracting the maxima;
[0120] Based on the dynamic score threshold, filter all obtained response score peaks and retain only those that exceed the dynamic score threshold.
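The thresholding and peak extraction described above can be sketched as follows. This is an illustration under stated assumptions, not the embodiment's implementation: the function name `extract_candidate_peaks`, the default values of σ and μ, and the choice of a strict 8-neighbour test for a maximum are all hypothetical.

```python
import numpy as np

def extract_candidate_peaks(S, sigma=0.05, mu=0.5):
    """Return the dynamic threshold T and the (row, col) peaks of S above it.

    T = max(sigma, mu * max(S)), as in the text above.
    sigma and mu defaults are placeholders, not values from the source.
    """
    S = np.asarray(S, dtype=float)
    T = max(sigma, mu * S.max())  # dynamic score threshold

    # Pad with -inf so that border cells also have 8 neighbours to compare to.
    P = np.pad(S, 1, mode="constant", constant_values=-np.inf)
    h, w = S.shape
    is_peak = np.ones_like(S, dtype=bool)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbour = P[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
            is_peak &= S > neighbour  # strict maximum (assumed definition)

    # Keep only the peaks whose score exceeds the dynamic threshold.
    keep = is_peak & (S > T)
    return T, [tuple(map(int, rc)) for rc in np.argwhere(keep)]

# Example: two strong responses and one weak response.
S = np.zeros((5, 5))
S[1, 1], S[3, 3], S[0, 4] = 1.0, 0.6, 0.2
T, peaks = extract_candidate_peaks(S)
```

With a strict comparison, a flat plateau yields no peak at all, so a practical version would need some handling of merged or flat high-score regions.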
[0121] Optionally, the candidate target competition module 405 determines the tracked target through similarity matching, including performing the following operations:
[0122] The similarity score between the target and each candidate target is calculated using a distance function, expressed as:
[0123]
[0124]
[0125] Here, M_i denotes the similarity between the relative positional relationship V_i of the i-th candidate target and the predicted relative positional relationship of the target in the current frame; v_j denotes the relative position vector between the i-th candidate target and its j-th similar semantic distractor; v_k denotes the relative position vector between the predicted target in the current frame and its k-th similar semantic distractor; and τ denotes the preset similarity threshold.
[0126] Compare the similarity scores and determine the candidate target with the highest similarity score as the target to be tracked.
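The competition step can be sketched as below. The scoring rule is an assumption: the distance function itself is not reproduced in this text, so the sketch counts how many of a candidate's relative position vectors lie within τ (Euclidean distance) of some predicted vector. The names `similarity_score` and `select_target` are hypothetical.

```python
import numpy as np

def similarity_score(V_i, V_pred, tau):
    """Count the vectors in V_i that have a predicted vector within tau.

    This counting rule is an illustrative stand-in for the distance
    function of the embodiment, which is not reproduced in the text.
    """
    V_i = np.asarray(V_i, dtype=float).reshape(-1, 2)
    V_pred = np.asarray(V_pred, dtype=float).reshape(-1, 2)
    if len(V_i) == 0 or len(V_pred) == 0:
        return 0
    # Pairwise Euclidean distances, shape (len(V_i), len(V_pred)).
    d = np.linalg.norm(V_i[:, None, :] - V_pred[None, :, :], axis=2)
    return int(np.sum(d.min(axis=1) <= tau))

def select_target(relations, V_pred, tau):
    """Return the index of the best-matching candidate and all scores.

    relations must be non-empty; ties go to the lowest index.
    """
    scores = [similarity_score(V_i, V_pred, tau) for V_i in relations]
    return int(np.argmax(scores)), scores

# Example: candidate 0 has the layout predicted for the target.
V_pred = [[3, 4], [6, 0]]
relations = [
    [[3.2, 4.1], [5.9, 0.1]],
    [[-3, -4], [3, -4]],
    [[-6, 0], [-3, 4]],
]
best, scores = select_target(relations, V_pred, tau=1.0)
```

A candidate that is itself a distractor sees the surrounding objects from a different position, so few of its vectors match the prediction and its score stays low.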
[0127] Optionally, the model update module 406 updates the state of the Kalman motion model cluster based on the relative positional relationship of the targets in the current frame, including:
[0128] Traverse all relative position vectors in the relative positional relationship V* of the target in the current frame, and perform the following operation for each:
[0129] For a relative position vector v* in the relative positional relationship V*, if the predicted relative positional relationship contains a relative position vector ṽ whose distance to v* does not exceed the preset similarity threshold τ, then the Kalman filter corresponding to ṽ in the Kalman motion model cluster is updated using v*;
[0130] Otherwise, a new Kalman filter is constructed using the relative position vector v* and added to the Kalman motion model cluster.
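The update rule above can be sketched as follows. The embodiment does not specify the state model in this passage, so a constant-velocity Kalman filter over a 2-D relative position vector is assumed. The class name `RelVecKalman`, the function `update_cluster`, the noise values `q` and `r`, and the choice of the nearest predicted vector as ṽ are all illustrative assumptions.

```python
import numpy as np

class RelVecKalman:
    """Constant-velocity Kalman filter for one relative position vector.

    State is [x, y, vx, vy]; the motion model and noise levels are assumed.
    """
    F = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
                  [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)

    def __init__(self, v, q=1e-2, r=1e-1):
        self.x = np.array([v[0], v[1], 0.0, 0.0])
        self.P = np.eye(4)
        self.Q = q * np.eye(4)
        self.R = r * np.eye(2)

    def predict(self):
        """Advance one frame and return the predicted relative position."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2].copy()

    def update(self, v):
        """Correct the state with an observed relative position vector."""
        y = np.asarray(v, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

def update_cluster(cluster, predictions, V_star, tau):
    """Update matched filters with v*, or add a new filter for unmatched v*.

    predictions[k] is the vector predicted by cluster[k] for this frame.
    """
    for v_star in np.asarray(V_star, dtype=float).reshape(-1, 2):
        if len(predictions) > 0:
            d = np.linalg.norm(np.asarray(predictions) - v_star, axis=1)
            k = int(np.argmin(d))  # nearest predicted vector (assumed choice)
            if d[k] <= tau:
                cluster[k].update(v_star)
                continue
        cluster.append(RelVecKalman(v_star))
    return cluster

# Example: one tracked distractor reappears, and one new distractor shows up.
cluster = [RelVecKalman([3.0, 4.0])]
predictions = [f.predict() for f in cluster]
cluster = update_cluster(cluster, predictions, [[3.2, 4.1], [10.0, 10.0]], tau=1.0)
```

Adding a filter for each unmatched vector lets the cluster grow as new distractors enter the tracking area. This sketch never removes filters, which a longer-running tracker would likely need to do.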
[0131] It is understood that the structures illustrated in the embodiments of the present invention do not constitute a specific limitation on a target tracking device assisted by similar semantic interference. In other embodiments of the present invention, a target tracking device assisted by similar semantic interference may include more or fewer components than illustrated, or combine some components, or split some components, or have different component arrangements. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
[0132] The information interaction and execution process between the modules in the above-mentioned device are based on the same concept as the method embodiment of the present invention, and the specific details can be found in the description of the method embodiment of the present invention, and will not be repeated here.
[0133] This invention also provides an electronic device, including a memory and a processor. The memory stores a computer program, and when the processor executes the computer program, it implements the target tracking method based on similar semantic interference assistance according to any embodiment of this invention.
[0134] This invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the target tracking method based on similar semantic interference assistance according to any embodiment of this invention.
[0135] Specifically, a system or apparatus equipped with a storage medium may be provided, on which software program code implementing the functions of any of the embodiments described above is stored, and the computer (or CPU or MPU) of the system or apparatus may read and execute the program code stored in the storage medium.
[0136] In this case, the program code read from the storage medium can itself implement the function of any of the above embodiments, and therefore the program code and the storage medium storing the program code constitute part of the present invention.
[0137] Examples of storage media used to provide program code include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), magnetic tapes, non-volatile memory cards, and ROMs. Alternatively, program code can be downloaded from a server computer via a communication network.
[0138] Furthermore, it should be clear that not only can the program code read by the computer be executed, but also the operating system or other components operating on the computer can be instructed based on the program code to perform some or all of the actual operations, thereby realizing the function of any of the embodiments described above.
[0139] Furthermore, it is understood that the program code read from the storage medium may be written to a memory provided on an expansion board inserted into the computer, or to a memory provided in an expansion module connected to the computer. Then, based on the instructions of the program code, the CPU or other components installed on the expansion board or expansion module execute some or all of the actual operations, thereby realizing the function of any of the above embodiments.
[0140] The embodiments of the present invention have at least the following beneficial effects:
[0141] 1. In one embodiment of the present invention, a target tracking method based on similar semantic interference assistance is provided. The present invention recognizes that a video sequence carries a large amount of temporal trajectory information, which is complementary to single-frame image information. This information can effectively help video target tracking methods handle difficult scenarios such as target occlusion, rapid deformation, complex backgrounds, and lighting changes. However, camera shake can severely corrupt this information and lead to tracking drift. Constructing the motion model on relative positional relationships can effectively reduce the interference of camera motion. For the sake of efficiency, the present invention uses a simple and effective Kalman filter to construct the motion model. As shown in Figure 5, the image at time t-1 (i.e., the previous frame) is used to obtain the target response map S', candidate targets are extracted to obtain the candidate target set O', and the relative positional relationship V' of the target at time t-1 is determined. The image at time t is then processed by obtaining the target response map S, extracting candidate targets to obtain the candidate target set O, and determining the relative positional relationships of the candidate targets. The Kalman filters are used for prediction, and the target is determined from the matching results, which effectively propagates temporal information from frame to frame. Extensive experiments were conducted on several benchmark datasets to verify the effectiveness of the method proposed in this invention.
[0142] 2. In one embodiment of the present invention, a target tracking device based on similar semantic interference is provided. The model designed in this invention is lightweight, efficient, requires no training, and has very high applicability.
[0143] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0144] Those skilled in the art will understand that all or part of the steps of the above method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium. When the program is executed, it performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disk, or optical disk.
[0145] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A target tracking method based on similar semantic distractor assistance, characterized in that it comprises the following steps:
obtaining the target response map of the current frame, wherein the target response map includes the response score of each position within the tracking area;
determining, based on the target response map, a candidate target set by extracting response score peaks, and determining a candidate target coordinate set from the coordinates of the response score peaks, including: obtaining the maximum response score in the target response map; determining a dynamic score threshold T = max(σ, μ·max(S)) based on the maximum response score in the target response map and a preset minimum score threshold, where σ is the preset minimum score threshold, μ is the adjustment coefficient, S represents the target response map, and max(S) represents the maximum response score in the target response map; obtaining all response score peaks from the target response map by extracting the maxima; and filtering all obtained response score peaks by the dynamic score threshold, retaining only the response score peaks that exceed the dynamic score threshold;
determining the relative positional relationship of each candidate target based on the candidate target coordinate set, wherein the similar semantic distractors of each candidate target include the other candidate targets in the candidate target set besides the candidate target itself, and the relative positional relationship of a candidate target includes the relative position vectors between the candidate target and all its similar semantic distractors;
predicting the relative positional relationship of the target in the current frame using a Kalman motion model cluster, wherein the Kalman motion model cluster is constructed based on the relative positional relationships between the target and each candidate non-target in the initial frame and is updated using Kalman filtering, the similar semantic distractors of the target include each candidate non-target in the candidate target set, and the relative positional relationship of the target includes the relative position vectors between the target and all its similar semantic distractors;
determining the target to be tracked through similarity matching, based on the predicted relative positional relationship of the target in the current frame and the relative positional relationship of each candidate target; and
updating the state of the Kalman motion model cluster based on the relative positional relationship of the target in the current frame.
2. The target tracking method according to claim 1, characterized in that obtaining all response score peaks by extracting the maxima includes: extracting the maxima to obtain all response score peaks; and determining, based on the position of each response score peak and the number of maxima, whether peak overlap exists; if so, adding a response score peak; otherwise, continuing execution.
3. The target tracking method according to claim 2, characterized in that determining whether peak overlap exists based on the position of each response score peak and the number of maxima includes: if the number of maxima is less than the total number of candidate targets in the previous frame, determining each high-response-score region based on the target response map, wherein the response scores within a high-response-score region are not less than a preset response score threshold; and if the area of a high-response-score region is greater than a preset area threshold, determining that peak overlap exists.
4. The target tracking method according to claim 3, characterized in that adding a response score peak includes: for each high-response-score region whose area exceeds the preset area threshold, adding a new response score peak based on the score at the center position of that region.
5. The target tracking method according to claim 1, characterized in that determining the target to be tracked through similarity matching includes: calculating the similarity score between the target and each candidate target using a distance function, expressed as:
where M_i denotes the similarity between the relative positional relationship V_i of the i-th candidate target and the predicted relative positional relationship of the target in the current frame; v_j denotes the relative position vector between the i-th candidate target and its j-th similar semantic distractor; v_k denotes the relative position vector between the predicted target in the current frame and its k-th similar semantic distractor; and τ denotes the preset similarity threshold; and comparing the similarity scores and determining the candidate target with the highest similarity score as the target to be tracked.
6. The target tracking method according to claim 1, characterized in that updating the state of the Kalman motion model cluster based on the relative positional relationship of the target in the current frame includes: traversing all relative position vectors in the relative positional relationship V* of the target in the current frame and performing the following operation for each: for a relative position vector v* in the relative positional relationship V*, if the predicted relative positional relationship contains a relative position vector ṽ whose distance to v* does not exceed the preset similarity threshold τ, updating the Kalman filter corresponding to ṽ in the Kalman motion model cluster using v*; otherwise, constructing a new Kalman filter using the relative position vector v* and adding it to the Kalman motion model cluster.
7. A target tracking apparatus based on similar semantic distractor assistance, characterized in that it is applied to the method according to any one of claims 1-6 and comprises:
an information acquisition module, used to acquire the target response map of the current frame, wherein the target response map includes the response score of each position within the tracking area;
a candidate target extraction module, used to determine a candidate target set based on the target response map by extracting response score peaks, and to determine a candidate target coordinate set from the coordinates of the response score peaks;
a relative position extraction module, used to determine the relative positional relationship of each candidate target based on the candidate target coordinate set, wherein the similar semantic distractors of each candidate target include the other candidate targets in the candidate target set besides the candidate target itself, and the relative positional relationship of a candidate target includes the relative position vectors between the candidate target and all its similar semantic distractors;
a relative position prediction module, used to predict the relative positional relationship of the target in the current frame using a Kalman motion model cluster, wherein the Kalman motion model cluster is constructed based on the relative positional relationships between the target and each candidate non-target in the initial frame and is updated using Kalman filtering, the similar semantic distractors of the target include each candidate non-target in the candidate target set, and the relative positional relationship of the target includes the relative position vectors between the target and all its similar semantic distractors;
a candidate target competition module, used to determine the target to be tracked through similarity matching, based on the predicted relative positional relationship of the target in the current frame and the relative positional relationship of each candidate target; and
a model update module, used to update the state of the Kalman motion model cluster based on the relative positional relationship of the target in the current frame.
8. An electronic device comprising a memory and a processor, the memory having stored therein a computer program, characterized in that, when the processor executes the computer program, it implements the target tracking method according to any one of claims 1-6.
9. A storage medium having stored thereon a computer program, characterized in that, when the computer program is executed in a computer, it causes the computer to perform the target tracking method according to any one of claims 1-6.