Target tracking method based on task distinguishing detection and re-identification joint network

A re-identification and task-differentiation technology, applied in the field of computer vision, which solves problems such as a shared network structure being unable to adapt to different tasks, and achieves the effect of improving accuracy

Pending Publication Date: 2022-04-12
BEIHANG UNIV
0 Cites 0 Cited by

AI-Extracted Technical Summary

Problems solved by technology

Therefore, it is not sufficient to adopt the same network structure for these two tasks and to rely only on the constraints of different loss fun...

Method used

(3.1) Multi-task hierarchical feature fusion structure: from the feature maps of the N stages (stage 1 to stage N) of different scales output by the DLA, the feature maps of stage 1 to stage M are selected as the input of the multi-feature fusion network for the subsequent target detection task, and the feature maps of stage 1 to stage N are selected as the input of the multi-feature fusion network for the subsequent target re-identification feature extraction task. The target detection task fuses low-level features rich in spatial information, while the target re-identification feature extraction task additionally fuses high-level features with more prominent semantic information, improving the discrimination of target identity through the fusion of high-level and low-level features.
The present invention uses a Deep Layer Aggregation (DLA) network as the backbone network; its complete network structure is shown in Figure 2, and its core modules are the Hierarchical Deep Aggregation (HDA) modules represented...

Abstract

The invention provides a target tracking method based on a task-differentiated detection and re-identification joint network. Based on FairMOT, the method constructs a task-differentiated, multi-feature-fusion joint network for target detection and re-identification: the target detection task and the target re-identification feature extraction task are integrated in the same joint network, shared features are extracted by a backbone network, and differentiated multi-feature fusion is then performed according to the characteristics of each task. While balancing the target detection and target re-identification feature extraction tasks, the method fully considers the different feature requirements of the two tasks, improves the accuracy of target detection and re-identification feature extraction, and thereby realizes accurate multi-target tracking. The multi-feature fusion network adopts either a multi-task hierarchical feature fusion structure or a multi-task independent feature fusion structure, so that the two different tasks can fuse information at different scales, task-oriented feature separation is realized earlier, and fusion features more beneficial to the different sub-task branches are obtained.

Application Domain

Character and pattern recognition; Neural architectures +1

Technology Topic

Machine learning; Feature fusion +4

Examples

  • Experimental program (1)

Example Embodiment

[0028] As described above, the present invention proposes a target tracking method based on a task-differentiated detection and re-identification joint network; specific embodiments of the present invention are described below with reference to the accompanying drawings.
[0029] (1) Overall process
[0030] The present invention proposes a task-differentiated detection and re-identification joint network to realize multi-target tracking in video. The overall framework of the task-differentiated joint network is shown in Figure 1 and consists of three parts: (1) the backbone network; (2) the multi-feature fusion network; (3) the multi-task branches. These three parts also correspond to the three steps of the method proposed by the present invention.
[0031] For the current input frame, the DLA backbone network (shown in Figure 2) is first used to extract shared features for the target detection task and the target re-identification feature extraction task in the image; the DLA outputs feature maps of stages 1 to N. The lower-level feature maps better retain low-level information from the original image, such as edges, texture, and spatial distribution, which is more advantageous for the target detection task that must localize targets; in the higher-level feature maps, spatial information is gradually lost while high-level semantic information related to the re-identification task becomes gradually more prominent, making them more suitable for confirming target identity. Therefore, the present invention proposes to selectively learn appropriate features according to task characteristics and to use them as the input of the subsequent multi-feature fusion network.
[0032] In the multi-feature fusion network, targeted fusion is performed according to the different tasks. Specifically, the target detection task is more concerned with target position, and accurate localization requires more low-level features, so it fuses the feature maps of stage 1 to stage M (M ≤ N). As shown in Figure 3, when the IDA module links multi-scale feature maps, the lower-resolution features must be upsampled, and interpolation and aggregation of features are applied iteratively, so that the features of multiple stages are integrated from shallow to deep, forming progressively deeper high-resolution decoded features whose final deeply fused output serves as the input of the subsequent target detection and re-identification feature extraction task branches. The DLA outputs N different stages, with N preferably 4. In the multi-feature fusion, the target detection task fuses the feature maps of stage 1 to stage M, with M preferably 3; the target re-identification feature extraction task fuses the feature maps of stage 1 to stage N, with N preferably 4.
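As a concrete illustration of the task-differentiated stage selection described above, the following is a minimal PyTorch-style sketch; the patent publishes no code, so all names (`backbone`, `det_fuse`, `reid_fuse`) and call signatures are hypothetical, with M = 3 and N = 4 taken from the preferred values stated in this paragraph.

```python
N, M = 4, 3  # preferred stage counts from the patent text

def forward_shared_features(backbone, det_fuse, reid_fuse, frame):
    """Hypothetical forward pass: one shared backbone, two task-specific
    stage selections feeding the multi-feature fusion network(s)."""
    feats = backbone(frame)           # list of stage feature maps, stages 1..N
    det_feat = det_fuse(feats[:M])    # detection: spatially rich stages 1..M
    reid_feat = reid_fuse(feats[:N])  # re-ID: additionally the deepest, semantic stage
    return det_feat, reid_feat
```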
[0033] Finally, the fused features are input to the target detection task branch and the re-identification feature extraction task branch, and each task branch is trained under the constraint of a different loss function to complete the target detection task and the target re-identification feature extraction task. In this way, while the different tasks are balanced, the differing emphases they place on target features are also taken into account; the features of the two tasks are differentiated, and the accuracy of target detection and target re-identification feature extraction is improved.
[0034] Among them, the target detection task branch is composed of a heatmap branch, a size branch, and an offset branch, which localize the targets in the current frame; based on the target center point positions obtained by the target detection task, the target re-identification feature extraction branch extracts each target's embedding representation from the full-image embedding representation vector cube at the corresponding positions, which is used to compute the apparent similarity between targets, thereby determining the target identity ID and achieving multi-target tracking.
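How the per-target embeddings are read out of the full-image embedding cube can be pictured with the sketch below; the stride-4 map size follows the dimensions given later in the description, while the function names and the use of cosine similarity as the "apparent similarity" are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def extract_embeddings(embed_map: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    """embed_map: (C, H/4, W/4) full-image embedding cube from the re-ID branch;
    centers: (K, 2) integer (x, y) target centers on the stride-4 grid,
    as produced by the detection branches."""
    emb = embed_map[:, centers[:, 1], centers[:, 0]].t()  # (K, C): one row per target
    return F.normalize(emb, dim=1)                        # unit norm for similarity

def appearance_similarity(det_emb: torch.Tensor, track_emb: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between current detections and stored track embeddings;
    the identity ID is assigned from the matching result (policy assumed)."""
    return det_emb @ track_emb.t()                        # (K_det, K_track)
```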
[0035] (2) Backbone network
[0036] The backbone network DLA extracts the shared features required for the target detection task and the target re-identification feature extraction task.
[0037] The present invention uses a Deep Layer Aggregation (DLA) network as the backbone network; its complete network structure is shown in Figure 2. Its core modules are the Hierarchical Deep Aggregation (HDA) modules, indicated by the dotted boxes, and the Iterative Deep Aggregation (IDA) modules, represented by the dotted arrows. The dashed boxes in the figure indicate aggregation nodes, and the dashed arrows indicate the 2x upsampling process. The HDA module is a tree-linked hierarchical structure that can propagate features and gradients more effectively, while the IDA module is responsible for linking the features of the different stages; each HDA module corresponds to one stage.
[0038] In the DLA backbone network, each HDA module outputs an aggregation result at the corresponding resolution, i.e., the aggregation node at the top corner of each dotted box in Figure 2, and the IDA module links these aggregation nodes. On the one hand, the HDA module fuses semantic information by aggregating along the channel direction; on the other hand, the IDA module achieves the fusion of spatial information by aggregating along the resolution and scale directions.
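An aggregation node of the kind shown in the figure is commonly realized as channel-wise concatenation followed by convolution, batch normalization, and ReLU; the sketch below is one assumed minimal form, not the patent's exact module.

```python
import torch
from torch import nn

class AggregationNode(nn.Module):
    """Hypothetical aggregation node: fuses its inputs along the channel
    direction, as the HDA module does at each junction of its tree."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, *feats):
        x = torch.cat(feats, dim=1)   # aggregate along the channel direction
        return self.relu(self.bn(self.conv(x)))
```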
[0039] Finally, the DLA outputs the feature maps of the N stages (stage 1 to stage N) at different scales, of size C × H × W, as the input of the subsequent multi-feature fusion network, where H × W is the resolution of the input image and C is the number of channels.
[0040] (3) Multi-feature fusion network
[0041] For the feature maps of the N stages (stage 1 to stage N) at different scales output by the DLA, either a multi-task hierarchical feature fusion structure is used, which lets the target detection task and the target re-identification feature extraction task share parameters within one multi-feature fusion network while each fuses the features more beneficial to its own task; or a multi-task independent feature fusion structure is used, which constructs two mutually independent feature fusion networks with non-shared parameters to fuse features for the two tasks separately. The dimension of the resulting fusion features is H/4 × W/4 × 64, where H × W is the resolution of the model's input image.
[0042] (3.1) Multi-task hierarchical feature fusion structure: from the feature maps of the N stages (stage 1 to stage N) at different scales output by the DLA, the feature maps of stage 1 to stage M are selected as the input of the multi-feature fusion network for the subsequent target detection task, and the feature maps of stage 1 to stage N are selected as the input of the multi-feature fusion network for the subsequent target re-identification feature extraction task. The target detection task fuses low-level features that are relatively rich in spatial information, while the target re-identification feature extraction task additionally fuses high-level features with more prominent semantic information, improving the discrimination of target identity through the fusion of high-level and low-level features.
[0043] The multi-task hierarchical feature fusion structure allows the target detection task and the target re-identification feature extraction task to each fuse, within one multi-feature fusion network, the features more beneficial to its own task. The multi-feature fusion network adopts the IDA feature fusion network, i.e., feature fusion is performed by the IDA module, whose inputs are the feature maps of the different stages output by the DLA. When the IDA module links multi-scale feature maps, the low-resolution features must be upsampled, and interpolation and aggregation of features are applied iteratively, integrating the features of multiple stages from shallow to deep to form progressively deeper high-resolution decoded features and a final deeply fused output.
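The iterative upsample-and-aggregate behaviour described here can be sketched as follows, reusing the hypothetical AggregationNode above; the bilinear upsampling and the channel bookkeeping are assumptions consistent with the description rather than a verbatim implementation.

```python
import torch.nn.functional as F

def ida_fuse(feats, nodes):
    """feats: stage feature maps ordered shallow (high resolution) to deep
    (low resolution); nodes: one AggregationNode per fusion step, with input
    channels matched to each concatenation beforehand."""
    fused = feats[-1]  # start from the deepest, most semantic stage
    for feat, node in zip(reversed(feats[:-1]), nodes):
        # upsample the low-resolution running feature to the shallower scale
        fused = F.interpolate(fused, size=feat.shape[-2:],
                              mode="bilinear", align_corners=False)
        fused = node(feat, fused)  # interpolate-and-aggregate, shallow to deep
    return fused  # high-resolution, deeply fused output
```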
[0044] (3.2) Multi-task independent feature fusion structure: from the feature maps of the N stages (stage 1 to stage N) at different scales output by the DLA, the feature maps of stage 1 to stage M are selected as the input of the multi-feature fusion network for the subsequent target detection task, and the feature maps of stage 1 to stage N are selected as the input of the multi-feature fusion network for the subsequent target re-identification feature extraction task. Two independent multi-feature fusion networks are constructed for the target detection task and the target re-identification feature extraction task respectively; they are mutually independent, share no parameters, and each fuses multi-stage features for subsequent target detection and re-identification feature extraction respectively. Both multi-feature fusion networks adopt the IDA feature fusion network.
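Under these assumptions, the independent variant differs from the hierarchical one only in that no parameters are shared; a minimal sketch reusing the hypothetical helpers above, with the per-stage channel widths assumed:

```python
from torch import nn

class IDAFusion(nn.Module):
    """Hypothetical IDA-style fusion network wrapping ida_fuse (above)."""
    def __init__(self, stage_channels, out_channels=64):
        super().__init__()
        self.entry = nn.Conv2d(stage_channels[-1], out_channels, kernel_size=1)
        self.nodes = nn.ModuleList(
            AggregationNode(c + out_channels, out_channels)
            for c in reversed(stage_channels[:-1]))

    def forward(self, feats):
        return ida_fuse([*feats[:-1], self.entry(feats[-1])], self.nodes)

stage_channels = [64, 128, 256, 512]       # assumed per-stage channel widths
det_fuse = IDAFusion(stage_channels[:3])   # stages 1..M, parameters of its own
reid_fuse = IDAFusion(stage_channels[:4])  # stages 1..N, no sharing with det_fuse
```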
[0045] (4) Multi-task branch
[0046] After obtaining the fusion features differentiated for the target detection task and the re-identification feature extraction task, these fusion features are input to the target detection task branch and the re-identification feature extraction task branch. The branches share the same structure but are trained under the constraints of different loss functions. The dimension of each branch's prediction is H/4 × W/4 × S, where H × W is the resolution of the model's input image and S is the number of output channels of the branch. Each branch takes the fusion feature as input, passes it first through a convolution layer, then through a ReLU activation layer, and finally outputs the prediction through another convolution layer.
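Each such branch, structured as convolution - ReLU - convolution, might look as follows; the kernel sizes and hidden width are assumptions, and the example S values follow common CenterNet-style conventions rather than the patent text.

```python
from torch import nn

class BranchHead(nn.Module):
    """Prediction head with the structure described in the text:
    conv -> ReLU -> conv, emitting S channels at the H/4 x W/4 resolution."""
    def __init__(self, in_channels=64, hidden=256, out_channels=1):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, out_channels, kernel_size=1))  # S output channels

    def forward(self, x):
        return self.head(x)

heatmap_branch = BranchHead(out_channels=1)  # S = 1 (e.g. one object class)
size_branch    = BranchHead(out_channels=2)  # S = 2 (height, width)
offset_branch  = BranchHead(out_channels=2)  # S = 2 (center offset)
```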
[0047] In the target detection task branch, the target detection features output by the multi-feature fusion network are input to the heatmap branch, the size branch, and the offset branch. The heatmap branch is constrained by a size-adaptive pixel-wise logistic regression loss function, while the size branch and the offset branch are trained with an L1 loss. The heatmap branch determines the target center point positions, the size branch determines the target height and width, and the offset branch refines the offset of the center point positions, thereby localizing the targets in the current frame. In the target re-identification feature extraction task branch, the re-identification features output by the multi-feature fusion network are mapped to embedding representations through a convolution layer - ReLU activation layer - convolution layer; each target identity is treated as one class, and the branch is trained with the loss function of a classification task. The extracted features form a full-image embedding representation cube; according to the center point positions obtained by the target detection task, the embedding of each target is extracted from the full-image representation and used to compute the apparent similarity between targets, and the target identity ID is determined from the similarity results. Finally, the target detection branch localizes target positions, the target re-identification feature extraction branch matches targets by computing similarity over the extracted representation vectors, and multi-target tracking is thereby realized.
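The loss constraints named in this paragraph can be summarized with the hedged sketch below; the exact focal-style form of the size-adaptive pixel-wise logistic regression loss is an assumption borrowed from CenterNet-style practice, and `classifier` is a hypothetical linear layer over the identity classes.

```python
import torch.nn.functional as F

def detection_loss(hm_pred, hm_gt, size_pred, size_gt, off_pred, off_gt, mask):
    """Heatmap branch: pixel-wise logistic regression with size-adaptive
    (focal-style) weighting; size and offset branches: L1 loss, evaluated
    only at annotated centers via the binary `mask`."""
    p = hm_pred.sigmoid().clamp(1e-4, 1 - 1e-4)
    pos = hm_gt.eq(1).float()
    neg = (1 - pos) * (1 - hm_gt) ** 4
    hm_loss = -(pos * (1 - p) ** 2 * p.log()
                + neg * p ** 2 * (1 - p).log()).sum() / pos.sum().clamp(min=1)
    l1 = F.l1_loss(size_pred * mask, size_gt * mask) \
         + F.l1_loss(off_pred * mask, off_gt * mask)
    return hm_loss + l1

def reid_loss(embeddings, id_labels, classifier):
    """Re-ID branch: each target identity is one class, trained with a
    classification (cross-entropy) loss over the embeddings."""
    return F.cross_entropy(classifier(embeddings), id_labels)
```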
[0048] The above disclosure describes only specific embodiments of the present invention. Based on the ideas provided by the present invention, variations that can be conceived by those skilled in the art shall fall within the protection scope of the invention.
