A pedestrian re-identification method in a small sample environment
By enhancing pedestrian features through multi-head self-attention and spatial attention modules, and combining dual metrics and a meta-learning framework, the problem of insufficient data for pedestrian re-identification in small sample environments is solved, achieving efficient pedestrian re-identification results.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHANDONG UNIV OF TECH
- Filing Date
- 2023-03-24
- Publication Date
- 2026-06-30
AI Technical Summary
Existing deep learning-based pedestrian re-identification algorithms rely heavily on massive amounts of high-quality labeled pedestrian images. They therefore cannot effectively handle pedestrian re-identification in small sample environments, and recognition performance is poor when the amount of data is insufficient or category labels are missing.
A multi-head self-attention module and a spatial attention module are used to enhance pedestrian features. Combined with a dual-metric module and a meta-learning framework, data augmentation and transfer learning methods are used to improve the diversity and discriminativeness of features. The similarity bias is reduced by using a dual similarity metric method. Finally, a small-sample pedestrian re-identification network is constructed.
The method improves the accuracy of pedestrian re-identification in small sample environments, reduces the reliance on massive labeled data and the network complexity of traditional methods, and achieves efficient pedestrian re-identification.
Figure CN116503897B_ABST
Description
Technical Field
[0001] This invention relates to the fields of pattern recognition and machine learning technology, and in particular to a method for pedestrian re-identification in a small sample environment.
Background Technology
[0002] In recent years, deep learning-based person re-identification algorithms have achieved many significant breakthroughs. However, these algorithms heavily rely on massive amounts of high-quality labeled pedestrian images. In many practical applications, only a small number of usable pedestrian images and pedestrian data with missing labels can be collected. Therefore, the small sample problem caused by insufficient sample size or category labels has become a major challenge in the field of person re-identification.
[0003] Inspired by the ability of humans to develop knowledge of new things through only one or a few examples, the concept of few-shot learning was proposed. Few-shot learning primarily addresses two types of problems: 1) learning and recognition problems with limited datasets and a limited number of samples of each category; 2) learning and recognition problems with large datasets, but with missing or incorrectly labeled sample categories. Pedestrian re-identification in a few-shot environment also faces these two challenges.
[0004] Recent research has mainly focused on small-sample pedestrian re-identification techniques with insufficient category labeling, but there is a lack of research on small-sample pedestrian re-identification techniques that address the problem of insufficient data.
[0005] Currently, few-shot learning algorithms that address the problem of insufficient data can be broadly divided into two branches: 1) data augmentation-based methods; and 2) transfer learning-based methods.
[0006] Data augmentation-based few-shot learning methods aim to augment or enhance the features of small-sample datasets using auxiliary data or information. This approach can effectively improve sample diversity and alleviate the problem of insufficient data in small-sample environments. However, it does not perform further processing on the augmented data or features, easily introducing noisy data or features, and thus failing to significantly improve the classification boundary.
[0007] Few-shot learning methods based on transfer learning: These methods aim to quickly transfer learned knowledge to a new domain and can be broadly divided into metric-based learning and meta-learning methods. These methods endow the model with self-learning capabilities, enabling the network to learn more discriminative sample features and improve recognition accuracy in small-sample environments. However, when the number of samples is too small, the model's learning ability is insufficient, ultimately leading to poor recognition performance.
[0008] Although there has been considerable research on the problem of few-shot learning with insufficient data, the above research has not been applied in the field of person re-identification due to the numerous interfering factors in person re-identification.
Summary of the Invention
[0009] To address the aforementioned problems, this invention provides a pedestrian re-identification method in a small sample environment.
[0010] To achieve the above objectives, the technical solution of the present invention is as follows:
[0011] A pedestrian re-identification method in a small-sample environment includes the following steps:
[0012] Step S1: enhancement of pedestrian features. First, the multi-head self-attention module (MSM) obtains richer feature information from samples at different scales. Then, the second feature set obtained by the multi-head self-attention module is fed into the spatial attention module (SAM), which recalibrates it in the spatial dimension to obtain the third feature set.
[0013] Step S2: measurement of pedestrian features. The third feature set is processed by the dual metric module and the relation module to obtain the first metric score and the second metric score, which are then weighted and fused to obtain the joint metric score.
[0014] The above-mentioned multi-head self-attention module (MSM) is defined as follows:
[0015] (1)
[0016] The feature map extracted by the feature extraction network passes through a 3×3 convolutional layer, an intermediate layer and a 1×1 convolutional layer, and the result is divided by a regulating factor. A tensor chunking function then splits the result into the query matrix q_n, the key matrix and the value matrix. The query matrix q_n is multiplied by the transposed key matrix, passed through an activation function, and then multiplied by the value matrix to yield a single self-attention feature map; the other self-attention feature maps are obtained through multiple identical operations. The maps are then concatenated (cat) and passed through a further layer and a max pooling layer to obtain the second feature set.
[0017] The above spatial attention module (SAM) is defined as follows:
[0018] (2)
[0019] The feature set obtained from the second feature set passes through a max pooling layer and an average pooling layer in parallel. The two results are concatenated along the channel dimension and then passed through a 7×7 convolutional layer and an activation function to generate the final spatial attention map. Finally, the attention map is applied to the feature set to obtain the spatially weighted third feature set.
[0020] The second metric score of the above relational metrics is defined as follows:
[0021] (4)
[0022] The pedestrian feature set obtained from the third feature set first passes through two convolutional layers and max pooling layers. It then passes through a non-linear classifier composed of two fully connected layers, an activation function and a sigmoid function to obtain the relation metric score.
[0023] The first metric score of the above dual metric module is given by the following formula:
[0024]
[0025]
[0026] (5)
[0027] where the symbols denote, in order: the feature set obtained from the third feature set; the similarity score of the cosine module; the cosine similarity layer; a convolutional layer; the max pooling layer; the average pooling layer; the Euclidean distance score; the Euclidean distance metric layer; and the first metric score of the dual metric module. Because the support set samples do not contain images from the query set, the Euclidean distance between the two is never 0, so the fraction in formula (5) is well defined.
[0028] The aforementioned pedestrian re-identification method in a small sample environment also includes step S3 meta-learning.
[0029] The pedestrian feature enhancement process includes: introducing a feature set enhancement module into the network feature embedding module. This module first uses a multi-head self-attention module to obtain feature sets containing diverse pedestrian features from different feature layers, and then uses a spatial attention module to recalibrate the feature sets obtained by the multi-head self-attention module in the spatial dimension, making the extracted pedestrian features more diverse and discriminative, thereby making up for the problem of insufficient pedestrian data.
[0030] The feature computation based on dual similarity metrics enables the network model to learn two different similarity metrics simultaneously. The joint loss of the network is then calculated from the scores of the two metrics and used to update the network parameters by backpropagation. This method can effectively reduce the similarity bias of pedestrian features and improve the pedestrian re-identification performance of the model in small sample environments.
[0031] The dual similarity measurement method comprises two metrics: the relation metric and the dual metric.
[0032] The relation metric uses a non-linear classifier built with convolutional layers and a sigmoid function to learn the relationship between samples and determine the classification result.
[0033] The dual metric is a metric fusion method that uses Euclidean distance as the weight of the cosine metric. It considers both the directional difference and the absolute distance of sample features to obtain a more reliable metric score.
[0034] The meta-task construction based on the meta-learning framework uses as its backbone a metric learning network composed of a feature extraction layer based on feature enhancement and a metric learning layer based on dual similarity. Multiple meta-tasks are generated iteratively, and these meta-tasks are used to complete the network's training, validation, and testing. First, an improved neural network is trained on the meta-tasks of the training set to learn transferable pedestrian re-identification knowledge. Second, the learned knowledge is used to tune the hyperparameters on the meta-tasks of the validation set. Finally, the generalization accuracy is reported as the average model accuracy over the meta-tasks of the test set.
[0035] The beneficial effects of this invention are as follows:
[0036] This invention provides a pedestrian re-identification method in a small sample environment. This re-identification idea integrates two types of methods: data augmentation and transfer learning. It can overcome the shortcomings of traditional deep learning-based pedestrian re-identification methods, such as over-reliance on massive amounts of high-quality labeled pedestrian images, high network complexity, high training difficulty, and inability to cope with insufficient pedestrian sample size in real-world environments. It can achieve efficient re-identification even when pedestrian data is insufficient.
[0037] To make the above-mentioned features and advantages of the invention more apparent and understandable, specific embodiments are described below, and detailed descriptions are provided in conjunction with the accompanying drawings.
Description of the Drawings
[0038] Figure 1 is a diagram of the few-shot pedestrian re-identification network structure based on feature set enhancement and metric fusion.
[0039] Figure 2 is a structural diagram of the feature set enhancement module.
[0040] Figure 3 is a diagram of the multi-head self-attention structure.
[0041] Figure 4 is a diagram of the spatial attention structure.
[0042] Figure 5 is a diagram of the relation metric structure.
[0043] Figure 6 is a diagram of the dual metric structure.
[0044] Figure 7 is a diagram of the meta-task framework.
Detailed Description
[0045] To make the objectives and technical solutions of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the described embodiments of the present invention without creative effort are within the scope of protection of the present invention.
[0046] The present invention will now be further described with reference to the accompanying drawings:
[0047] Figure 1 is a diagram of the few-shot pedestrian re-identification network structure based on feature set enhancement and metric fusion.
[0048] The pedestrian re-identification method for small sample environments provided by this invention first enhances the pedestrian features P1 in step S1. The feature extraction network Block1, with ResNet12 as its backbone, includes feature extraction layers Block11-41. Step S1 introduces a feature set enhancement mechanism to improve the discriminative ability of pedestrian image features in small samples. This mechanism first uses a multi-head self-attention module (MSM) to obtain richer feature information from samples at different scales. The feature set Block2 obtained by the MSM is then fed into a spatial attention module (SAM), which recalibrates it in the spatial dimension, making the extracted pedestrian features more diverse and discriminative and thus compensating for the lack of pedestrian data. Next, in step S2, the feature set Block3 is processed by the relation module and the dual metric module to obtain a relation metric score and a dual metric score, which are combined by weighted fusion into the joint metric score. The dual metric module uses Euclidean distance as the weight of the cosine metric, so that both the absolute spatial distance and the directional difference of pedestrian features are measured, improving the reliability of pedestrian similarity measurement. Finally, during meta-training, the feature computation based on dual similarity metrics enables the network model to learn two different similarity metrics simultaneously. The joint loss of the network is calculated from the scores of the two metrics and used to update the network parameters by backpropagation, realizing meta-learning of the network. The result is a few-shot pedestrian re-identification network based on feature set enhancement and metric fusion.
[0049] Step S1: Enhance pedestrian features
[0050] With reference to Figure 2, step S1 uses a multi-head self-attention module (MSM) and a spatial attention module (SAM). The MSM explores matching feature sets across different feature extraction layers, while the SAM explores regions of interest within the spatial dimensions of each feature set. Finally, the feature sets enhanced by the SAM are concatenated (cat) across each dimension to obtain the feature set Block3. Figure 3 shows the structure of the multi-head self-attention module (MSM). The MSM infers pixel correlations from each subspace, obtaining feature maps with a global perspective. Stacking the feature maps produced by the different self-attention heads enriches the semantic representation. However, traditional MSM modules use fully connected layers to generate the query, key and value matrices. To address the shortage of pedestrian samples in small-sample environments, this invention replaces the fully connected layers of traditional self-attention with a 3×3 depthwise separable convolution, an intermediate layer and a 1×1 convolution to generate the query, key and value matrices. This reduces the number of parameters and prevents the model from overfitting.
[0051] The Multi-Head Self-Attention Module (MSM) is defined as follows:
[0052] (1)
[0053] The feature map extracted by the feature extraction layer Blockn1 passes through a 3×3 convolutional layer, an intermediate layer and a 1×1 convolutional layer. To prevent the inner product from becoming too large, the result is divided by a regulating factor. A tensor chunking function then splits the result into the query, key and value matrices. The query matrix is multiplied by the transposed key matrix, passed through an activation function, and then multiplied by the value matrix to yield a single self-attention feature map; the other self-attention feature maps are obtained through multiple identical operations. The maps are then concatenated (cat) and passed through a further layer and a max pooling layer to obtain the feature layer Blockn2.
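The attention computation described in the paragraph above can be illustrated with a minimal sketch in plain Python. It is not the patented implementation: the convolutional layers that produce the query, key and value matrices are omitted, the activation function is assumed to be a softmax (as in standard scaled dot-product attention), and the regulating factor is applied to the attention scores rather than before chunking. The names `attention_head`, `multi_head` and `scale` are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_head(q, k, v, scale):
    """One self-attention map: softmax(q k^T / scale) v (softmax is an assumption)."""
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    return matmul(softmax_rows(scores), v)


def multi_head(heads, scale):
    """Run every (q, k, v) head and concatenate the outputs position by position."""
    outs = [attention_head(q, k, v, scale) for q, k, v in heads]
    return [sum((o[i] for o in outs), []) for i in range(len(outs[0]))]
```

Each output row is a convex combination of the rows of the value matrix, which is what gives every position a global view of the feature map.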
[0054] With reference to Figure 4, the structure of the spatial attention module (SAM) is given.
[0055] SAM can be defined as:
[0056] (2)
[0057] The feature set obtained from the feature layer Blockn2 passes through a max pooling layer and an average pooling layer in parallel. The two results are concatenated along the channel dimension and then passed through a 7×7 convolutional layer and an activation function to generate the final spatial attention map. Finally, the attention map is applied to the feature set to obtain the spatially weighted feature set Block3.
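A minimal sketch of the spatial attention step described above, in plain Python. Several points are assumptions rather than statements from the source: the pooling is taken across the channel axis, the activation function is a sigmoid, the convolution is zero-padded so the map keeps its size, and the attention map is applied by element-wise multiplication. The kernel weights passed in are hypothetical placeholders for learned parameters.

```python
import math


def spatial_attention(feat, kernel, bias=0.0):
    """feat: C x H x W nested lists; kernel: 2 x k x k weights (k = 7 in the text)."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    # Channel-wise max and average pooling give two H x W maps (assumed axis).
    max_map = [[max(feat[ch][i][j] for ch in range(c)) for j in range(w)] for i in range(h)]
    avg_map = [[sum(feat[ch][i][j] for ch in range(c)) / c for j in range(w)] for i in range(h)]
    pooled = [max_map, avg_map]  # concatenation along the channel dimension
    ksize = len(kernel[0])
    pad = ksize // 2
    attn = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = bias
            for ch in range(2):
                for di in range(ksize):
                    for dj in range(ksize):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:  # zero padding outside the map
                            acc += kernel[ch][di][dj] * pooled[ch][y][x]
            attn[i][j] = 1.0 / (1.0 + math.exp(-acc))  # sigmoid (assumed)
    # Recalibrate every channel with the same spatial attention map.
    weighted = [[[feat[ch][i][j] * attn[i][j] for j in range(w)] for i in range(h)]
                for ch in range(c)]
    return attn, weighted
```

With an all-zero kernel the attention map is 0.5 everywhere, so the module simply halves the features; a trained kernel would instead emphasize the spatial regions that matter for distinguishing pedestrians.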
[0058] In the multi-head self-attention module (MSM), the inner product operation between elements gives self-attention an inherently global receptive field. However, since the self-attention mechanism only calculates the correlation between features of different pixels without considering the positional information of each pixel, it lacks spatial awareness, resulting in a loss of structural information in the output. Therefore, spatial attention is introduced before fusing feature sets from different scale layers to recalibrate the feature sets generated by multi-head self-attention in the spatial dimension, thereby enhancing its spatial awareness.
[0059] Step S2: Measure pedestrian features
[0060] In the process of measuring pedestrian features, the features are processed by the relation module and the dual metric module to obtain two different metric scores, which are then weighted and fused to obtain a joint metric score. This ensures that pedestrian features are simultaneously constrained by two different metric methods, effectively reducing feature similarity bias. The joint score formula is as follows:
[0061] (3)
[0062] where the symbols denote, in order: the joint metric score, the relation metric score, the dual metric score, and the weighting coefficient used for the joint score.
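The weighted fusion can be sketched as follows. Because formula (3) is not legible here, the sketch assumes a convex combination controlled by a single weighting coefficient; the name `lam` and its default value are hypothetical.

```python
def joint_score(relation_score, dual_score, lam=0.5):
    """Fuse two metric scores; the convex-combination form is an assumption."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * relation_score + (1.0 - lam) * dual_score
```

Setting `lam` to 1 or 0 recovers the relation metric or the dual metric alone, which makes the contribution of each metric easy to inspect.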
[0063] With reference to Figure 5, the structure of the relation module is given. The relation metric module consists of two convolutional blocks containing max-pooling layers and two fully connected layers. It generates a similarity score between the prototype of each class in the support set samples and the query set. The relation metric module is a non-linear classifier built from convolutional layers and a sigmoid function. The formula for its metric score is as follows:
[0064] (4)
[0065] The pedestrian feature set obtained from feature set Block3 first passes through two convolutional layers and max pooling layers to give Block31. It then passes through a non-linear classifier composed of two fully connected layers, an activation function and a sigmoid function to obtain the relation metric score.
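The classifier head of the relation module can be sketched as two fully connected layers followed by a sigmoid. This is only an illustration: the two convolutional blocks are omitted, the intermediate activation is assumed to be ReLU, and all weights shown in the usage note are hypothetical.

```python
import math


def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relation_score(pair_feature, w1, b1, w2, b2):
    """Map a support/query pair feature to a similarity score in (0, 1)."""
    hidden = [max(0.0, h) for h in linear(pair_feature, w1, b1)]  # ReLU (assumed)
    logit = linear(hidden, w2, b2)[0]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid
```

For example, `relation_score([1.0, 2.0], [[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0], [[2.0, 3.0]], [-2.0])` evaluates to 0.5, because the hidden layer becomes `[1.0, 0.0]` and the resulting logit is 0.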
[0066] With reference to Figure 6, the structure of the dual metric module is given. The dual metric module is a metric fusion method that uses Euclidean distance as the weight of the cosine similarity metric. First, the support set samples and query set samples of the feature set Block3 are processed through two convolutional layers: the first convolutional layer contains a max-pooling layer, and the second convolutional layer contains an average pooling layer. Then, the resulting feature set Block32 is input simultaneously into the cosine similarity layer and the Euclidean distance metric layer to obtain the cosine similarity score and the Euclidean distance score between the sample features. Finally, the dual metric score is computed by the following formula:
[0067]
[0068]
[0069] (5)
[0070] where the symbols denote, in order: the feature set obtained from feature set Block3; the similarity score of the cosine module; the cosine similarity layer; a convolutional layer; the max pooling layer; the average pooling layer; the Euclidean distance score; the Euclidean distance metric layer; and the dual metric module score. Because the support set samples do not contain images from the query set, the Euclidean distance between the two is never 0, so the fraction in formula (5) is well defined.
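A minimal sketch of the dual metric on two flattened feature vectors. Formula (5) is not legible here, so the sketch assumes the simplest reading of the text: the cosine similarity is divided by the Euclidean distance, which must therefore be non-zero. The convolution and pooling layers are omitted, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Directional agreement of two non-zero vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Absolute distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dual_metric_score(a, b):
    """Cosine similarity weighted by inverse Euclidean distance (assumed form)."""
    dist = euclidean_distance(a, b)
    if dist == 0.0:
        raise ValueError("identical features: the fraction is undefined")
    return cosine_similarity(a, b) / dist
```

Under this reading, two features that point in the same direction score higher the closer they are: `dual_metric_score([1.0, 0.0], [2.0, 0.0])` is 1.0, while `dual_metric_score([1.0, 0.0], [3.0, 0.0])` is 0.5.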
[0071] Step S3: Meta-training. With reference to Figure 7, a diagram of the meta-task framework is given. Small sample problems with insufficient data are typically formalized as a classification problem in which the model must correctly identify the class to which an unlabeled pedestrian belongs. The model completes learning within a meta-learning framework, that is, multiple meta-tasks are generated iteratively, and training, validation, and testing are accomplished using these meta-tasks. Taking the training set as an example, a fixed number of different pedestrians are randomly selected from the training label set, and a fixed number of images of each selected pedestrian are randomly selected from the training dataset. The images of each pedestrian are divided into two groups: one group of images serves as the support set and the other group serves as the query set. Together these constitute a meta-task. Meta-tasks are generated on the validation and test sets using the same method.
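The meta-task construction described above can be sketched with the standard library alone. The parameter names `n_way` (pedestrians per task), `k_shot` (support images per pedestrian) and `q_query` (query images per pedestrian) are hypothetical stand-ins for quantities whose symbols are not legible here, and `images_by_id` is an assumed mapping from pedestrian identity to a list of image references.

```python
import random


def sample_meta_task(images_by_id, n_way, k_shot, q_query, rng=None):
    """Build one meta-task: a support set and a disjoint query set."""
    rng = rng or random.Random()
    # Randomly pick the pedestrians that take part in this meta-task.
    chosen_ids = rng.sample(sorted(images_by_id), n_way)
    support, query = [], []
    for pid in chosen_ids:
        # Randomly pick images of this pedestrian, then split them in two groups.
        picked = rng.sample(images_by_id[pid], k_shot + q_query)
        support += [(img, pid) for img in picked[:k_shot]]
        query += [(img, pid) for img in picked[k_shot:]]
    return support, query
```

Calling the function repeatedly yields the stream of meta-tasks used for training; applying it to the validation and test identities produces the corresponding validation and test meta-tasks.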
[0072] The meta-task construction based on the meta-learning framework uses as its backbone a metric learning network composed of a feature extraction layer based on feature enhancement and a metric learning layer based on dual similarity. Multiple meta-tasks are generated cyclically, and the network is trained, validated, and tested using these meta-tasks. First, an improved neural network is trained on the meta-tasks of the training set to learn transferable pedestrian re-identification knowledge. Second, the learned knowledge is used to adjust the hyperparameters on the meta-tasks of the validation set. Finally, the generalization accuracy is reported as the average model accuracy over the meta-tasks of the test set.
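The final reporting step, averaging model accuracy over the test meta-tasks, can be sketched as follows. The representation of an episode as a pair of prediction and label lists is an assumption made for illustration.

```python
def episode_accuracy(predictions, labels):
    """Fraction of query images in one meta-task that were identified correctly."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def mean_accuracy(episodes):
    """Average the per-task accuracy over all test meta-tasks."""
    scores = [episode_accuracy(preds, labels) for preds, labels in episodes]
    return sum(scores) / len(scores)
```

Averaging per meta-task, rather than pooling all query images, gives every meta-task equal weight in the reported figure.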
[0073] The above three steps complete the pedestrian re-identification in a small sample environment.
[0074] Although the present invention has been disclosed above by way of embodiments, it is not intended to limit the present invention. Anyone skilled in the art can make some modifications and refinements without departing from the spirit and scope of the present invention. Therefore, the scope of protection of the present invention shall be determined by the appended claims.
Claims
1. A pedestrian re-identification method in a small sample environment, characterized in that it includes the following steps: Step S1, enhancement of pedestrian features: first, the multi-head self-attention module (MSM) obtains richer feature information from samples at different scales; then, the second feature set obtained by the multi-head self-attention module is fed into the spatial attention module (SAM), which recalibrates it in the spatial dimension to obtain the third feature set. Step S2, measurement of pedestrian features: the third feature set is processed by a dual metric module and a relation metric module to obtain a first metric score and a second metric score; these scores are then weighted and fused to obtain a joint metric score, which constrains pedestrian features and reduces the similarity bias of pedestrian features; the dual metric module is a metric fusion method that uses Euclidean distance as the weight of the cosine metric, and the relation metric module uses a non-linear classifier built with convolutional layers and a sigmoid function. Step S3, meta-learning: a metric learning network, consisting of a feature extraction layer based on feature enhancement and a metric learning layer composed of the dual metric module and the relation metric module, is used as the backbone to generate multiple meta-tasks in a loop and learn transferable person re-identification knowledge; the learned transferable person re-identification knowledge is then used to perform person re-identification.
2. The pedestrian re-identification method in a small sample environment as described in claim 1, characterized in that the multi-head self-attention module (MSM) is defined as follows: (1) The feature map extracted by the feature extraction network passes through a 3×3 convolutional layer, an intermediate layer and a 1×1 convolutional layer, and the result is divided by a regulating factor. A tensor chunking function then splits the result into the query matrix q_n, the key matrix and the value matrix. The query matrix q_n is multiplied by the transposed key matrix, passed through an activation function, and then multiplied by the value matrix to yield a single self-attention feature map; the other self-attention feature maps are obtained through multiple identical operations. The maps are then concatenated (cat) and passed through a further layer and a max pooling layer to obtain the second feature set.
3. The pedestrian re-identification method in a small sample environment as described in claim 2, characterized in that the spatial attention module (SAM) is defined as follows: (2) The feature set obtained from the second feature set passes through a max pooling layer and an average pooling layer in parallel. The two results are concatenated along the channel dimension and then passed through a 7×7 convolutional layer and an activation function to generate the final spatial attention map. Finally, the attention map is applied to the feature set to obtain the spatially weighted third feature set.
4. The pedestrian re-identification method in a small sample environment as described in claim 3, characterized in that the second metric score of the relation metric module is: (4) The pedestrian feature set obtained from the third feature set first passes through two convolutional layers and max pooling layers. It then passes through a non-linear classifier composed of two fully connected layers, an activation function and a sigmoid function to obtain the second metric score of the relation metric module.
5. The pedestrian re-identification method in a small sample environment as described in claim 4, characterized in that the first metric score of the dual metric module is given by formula (5), where the symbols denote, in order: the feature set obtained from the third feature set; the similarity score of the cosine module; the cosine similarity layer; a convolutional layer; the max pooling layer; the average pooling layer; the Euclidean distance score; the Euclidean distance metric layer; and the first metric score of the dual metric module.