Gait recognition method, electronic device, and computer-readable storage medium
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG DAHUA TECH CO LTD
- Filing Date
- 2022-12-28
- Publication Date
- 2026-06-26
AI Technical Summary
Existing gait recognition models are limited by hardware computing power and inference speed, making it impossible to acquire gait information over a large time range and adapt to scenarios with varying walking speeds, resulting in decreased recognition efficiency and accuracy.
A gait sequence training method combining equal-interval sampling and random-interval sampling is adopted. The gait recognition model is trained by comparing gait sequences sampled at equal intervals and gait sequences sampled at random intervals. The computational advantages of multiple algorithms are combined to improve the recognition effect.
It improves the recognition efficiency and accuracy of gait recognition models in real-world scenarios, adapts to scenarios with varying walking speeds, and meets the needs for accurate recognition.
Figure CN116092186B_ABST
Description
Technical Field
[0001] This application relates to the field of image processing, and in particular to gait recognition methods, electronic devices, and computer-readable storage media.
Background Technology
[0002] With the continuous development of computer vision technology, gait recognition has gradually become an important part of the field of biometrics. Gait recognition is a technology that uses the body shape and posture of a person walking to identify the identity of a pedestrian. It has the characteristics of individual uniqueness, difficulty in spoofing, long-distance recognition, and no need for the cooperation of the subject. It can play an important role in difficult scenarios such as face occlusion, long-distance recognition, and clothing changes.
[0003] Gait profiles are grayscale binary images obtained by semantic segmentation of raw images of pedestrians walking, followed by binarization of the human and background regions of the segmented images. A gait profile sequence is a sequence of multiple frames of gait profiles arranged in chronological order. In existing technologies, a preset depth model is typically trained using a sequence of gait profiles corresponding to consecutive frames or equally spaced sampled images to obtain a gait recognition model, which is then used to identify and match the gait profile sequence of pedestrians.
[0004] However, there is a large amount of similar and/or repetitive information between consecutive frames. When a model trained on multiple consecutive frames is embedded in hardware, it is constrained by the hardware's computing power and inference speed, making it impossible to acquire gait information over a larger time range, which reduces recognition efficiency. Furthermore, equally spaced sampled images cannot cover scenarios where walking speed changes; models trained on equally spaced sampled images fit such scenarios poorly, which reduces recognition accuracy. Therefore, none of the above models achieves good gait recognition results in real-world scenarios, and they struggle to meet the requirements for accurate gait recognition.
Summary of the Invention
[0005] The main technical problem addressed by this application is to provide a gait recognition method, electronic device, and computer-readable storage medium that can solve the problem that existing technologies cannot meet the gait recognition needs of real-world scenarios.
[0006] To address the aforementioned technical problems, the first technical solution adopted in this application is to provide a gait recognition method, comprising: acquiring a gait sequence of at least one target object; inputting the gait sequence into a gait recognition model, and using the gait recognition model to extract the gait features of the target object from the gait sequence; wherein the gait recognition model is obtained by comparative training using equally spaced sampled gait sequences and randomly spaced sampled gait sequences; wherein the equally spaced sampled gait sequences include continuous frame gait sequences; using the gait recognition model to match and recognize the gait features with the stored gait features of the user, and outputting the recognition result.
[0007] To solve the above-mentioned technical problems, the second technical solution adopted in this application is to provide an electronic device, including: a memory for storing program data, wherein the program data, when executed, implements the steps in the gait recognition method described above; and a processor for executing the program data stored in the memory to implement the steps in the gait recognition method described above.
[0008] To solve the above-mentioned technical problems, the third technical solution adopted in this application is to provide a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, it implements the steps in the gait recognition method described above.
[0009] The beneficial effects of this application are as follows: Unlike existing technologies, this application provides a gait recognition method, electronic device, and computer-readable storage medium. It extracts and recognizes gait features from the acquired gait sequence of the target object through a gait recognition model. The gait recognition model is obtained by comparative training using equally spaced sampled gait sequences (including continuous frame gait sequences) and randomly spaced sampled gait sequences. It can combine the computational advantages of multiple gait recognition algorithms, which can solve the problem of decreased algorithm recognition efficiency caused by training only with continuous frame gait sequences, and the problem of decreased algorithm recognition accuracy caused by training only with equally spaced sampled sequences. This improves the gait recognition effect of the gait recognition model in real-world scenarios and meets the need for accurate gait recognition.
Attached Figure Description
[0010] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0011] Figure 1 is a flowchart illustrating one embodiment of the training method for the gait recognition model of this application;
[0012] Figure 2 is a flowchart illustrating a specific implementation of S11;
[0013] Figure 3 is a flowchart illustrating a specific implementation of S13;
[0014] Figure 4 is a flowchart illustrating one embodiment of the gait recognition method of this application;
[0015] Figure 5 is a flowchart illustrating a specific implementation of S41;
[0016] Figure 6 is a schematic diagram of one embodiment of the gait recognition device of this application;
[0017] Figure 7 is a schematic diagram of the structure of one embodiment of the electronic device of this application;
[0018] Figure 8 is a schematic diagram of a computer-readable storage medium according to an embodiment of this application.
Detailed Implementation
[0019] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0020] The terminology used in the embodiments of this application is for the purpose of describing particular embodiments only and is not intended to limit the application. The singular forms “a,” “said,” and “the” used in the embodiments of this application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. “Multiple” generally includes at least two, but does not exclude the inclusion of at least one.
[0021] It should be understood that the term "and/or" used in this document merely describes the relationship between related objects and indicates that three relationships can exist. For example, A and/or B can represent: A existing alone, A and B existing simultaneously, and B existing alone. Additionally, the character "/" in this document generally indicates that the preceding and following related objects have an "or" relationship.
[0022] It should be understood that the terms "comprising," "including," or any other variations used herein are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0023] This application first provides a training method for a gait recognition model.
[0024] Please refer to Figure 1, which is a flowchart illustrating one embodiment of the training method for the gait recognition model of this application. In this embodiment, the gait recognition model is obtained by comparative training using gait sequences sampled at equal intervals and gait sequences sampled at random intervals. The training method specifically includes:
[0025] S11: Obtain multiple first training sets and multiple second training sets based on multiple labeled image sequences; wherein each first training set includes multiple equally spaced sampling gait sequences with the same sampling interval, and each second training set includes multiple randomly spaced sampling gait sequences with the same random frame skipping interval; wherein the sampling intervals corresponding to the multiple first training sets are different, and the random frame skipping intervals corresponding to the multiple second training sets are different.
[0026] Specifically, please refer to Figure 2, which is a flowchart illustrating a specific implementation of S11. In this implementation, the step of obtaining multiple first training sets and multiple second training sets based on multiple labeled image sequences specifically includes:
[0027] S111: Select a video image and obtain the image sequence corresponding to multiple pedestrians and the annotation information of each image sequence.
[0028] In this embodiment, video images of pedestrians walking in a natural state are first collected by monitoring equipment. Then, preprocessing such as detection and tracking is performed on the multiple pedestrians in the video images. After the image sequence corresponding to each pedestrian is obtained, the image sequence is optimized to obtain a clear, complete, and unobstructed image sequence of the pedestrian walking. Each frame in the image sequence is aligned so that the human body outline corresponding to the pedestrian is located at the center of the frame.
[0029] Specifically, video images typically include multiple pedestrians. When detecting a video image, multiple pedestrians are usually detected. When at least one pedestrian is detected and identified as the target object, tracking of the target object begins in subsequent frames of the video image to obtain an image sequence of the target object during the continuous tracking process.
[0030] S112: Each image sequence is sampled at equal intervals using multiple preset sampling intervals to obtain multiple equally spaced sampled image sequences with different sampling intervals based on each image sequence; wherein, the sampling interval includes 0.
[0031] In this embodiment, multiple different values are taken from a preset set of integers to obtain multiple sampling intervals with different values. The maximum number of sampling intervals is equal to the number of integers included in the preset set of integers.
[0032] In this embodiment, the preset integer set R consists of the integers in the range [0, 4].
[0033] Specifically, a maximum of 5 different values can be selected from the preset integer set to obtain sampling intervals of 0, 1, 2, 3, and 4, respectively, which correspond to 0, 1, 2, 3, and 4 interval frames.
[0034] Understandably, when the value is 0, the corresponding sampling interval is 0, that is, the number of interval frames is 0, and the obtained sequence is a continuous frame sequence.
[0035] Understandably, within the preset integer set, a larger value gives a larger frame interval, so the sampled sequence covers a wider temporal range and carries less redundant information between adjacent frames. However, if the value exceeds the maximum of 4, the acquired sequence contains too little information, leading to poor robustness of the trained model.
[0036] In this embodiment, each image sequence is sampled at equal intervals using each sampling interval, and the sampled images are arranged in their corresponding time order to obtain the corresponding equally spaced sampled image sequence.
[0037] Since the number of interval frames in equal-interval sampling is fixed, equal-interval sampling is also called static frame rate sampling, and equal-interval sampled image sequences are also called static frame rate sequences.
[0038] In this embodiment, each image sequence is sampled at equal intervals using k sampling intervals, resulting in k equally spaced sampled image sequences with different sampling intervals based on each image sequence. The value of k can be any integer from 1 to 5, thus constructing 1 to 5 equally spaced sampled image sequences; this application does not limit the value of k.
[0039] In this embodiment, the set of static frame rate sequences consisting of multiple equally spaced sampled image sequences corresponding to each image sequence can be represented as $S = \{s_1, \ldots, s_k\}$, where $k \geq 1$, $k$ represents the number of sampling intervals obtained from the preset set of integers, $s_1$ represents the first equally spaced sampled image sequence in the set of static frame rate sequences, and $s_k$ represents the k-th equally spaced sampled image sequence in the set of static frame rate sequences.
[0040] Each equally spaced sampled image sequence can be represented as $s_i = \{p_1, p_{1+(1+r_i)}, p_{1+2(1+r_i)}, \ldots, p_j, \ldots\}$, where $s_i$ represents the i-th equally spaced sampled image sequence in the set of static frame rate sequences, $r_i$ represents the sampling interval, $n$ represents the total number of frames in the sequence, $p_1$ represents the first frame image in the image sequence, $p_{1+(1+r_i)}$ represents the $[1+(1+r_i)]$-th frame image in the image sequence, $p_{1+2(1+r_i)}$ represents the $[1+2(1+r_i)]$-th frame image in the image sequence, and $p_j$ represents the j-th frame image in the image sequence.
[0041] In a specific implementation scenario, if the sampling interval $r_i$ is 0, then $s_i = \{p_1, p_2, p_3, \ldots, p_j, \ldots, p_n\}$, meaning that the sequence is a continuous frame sequence.
[0042] In another specific implementation scenario, if the sampling interval $r_i$ is 1, then $s_i = \{p_1, p_3, p_5, \ldots, p_j, \ldots, p_n\}$, meaning that the interval between two adjacent frames in this sequence is 1 frame.
[0043] In yet another specific implementation scenario, if the sampling interval $r_i$ is 3, then $s_i = \{p_1, p_5, p_9, \ldots, p_j, \ldots, p_n\}$, meaning that the interval between two adjacent frames in this sequence is 3 frames.
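The equal-interval sampling rule in [0040] to [0043] can be sketched in Python as follows. This is a minimal illustration and not the implementation of this application: the function name `equal_interval_sample` and the use of a plain list of frame numbers in place of images are assumptions made for the example.

```python
def equal_interval_sample(frames, r):
    """Keep every (1 + r)-th frame, starting from the first frame.

    frames: frames in chronological order (p1, p2, ..., pn).
    r: sampling interval, i.e. the number of frames skipped between
       two kept frames; r = 0 keeps every frame.
    """
    if r < 0:
        raise ValueError("sampling interval must be non-negative")
    return frames[::1 + r]


# Frame numbers 1..9 stand in for the images p1..p9 (illustrative data).
frames = list(range(1, 10))
continuous = equal_interval_sample(frames, 0)    # r = 0: continuous frames
every_other = equal_interval_sample(frames, 1)   # r = 1: p1, p3, p5, ...
every_fourth = equal_interval_sample(frames, 3)  # r = 3: p1, p5, p9
```

With 9 input frames, an interval of 3 leaves only three frames, which illustrates why the text caps the interval: larger values quickly thin out the sequence.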
[0044] S113: Use multiple preset sets of random frame skipping intervals to perform random interval sampling on each image sequence, so as to obtain multiple random interval sampled image sequences with different random frame skipping intervals based on each image sequence.
[0045] In this embodiment, since the number of interval frames in random interval sampling is randomly selected, that is, the number of interval frames between two adjacent frames changes dynamically, random interval sampling is also called dynamic frame rate sampling, and the random interval sampled image sequence is also called a dynamic frame rate sequence.
[0046] In this embodiment, given that a random interval sampled image sequence has a preset number of image frames, the same preset number of random values is selected from a preset set of integers to obtain a set of random frame skipping intervals containing that number of random values.
[0047] The preset integer set R consists of the integers in the range [0, 4].
[0048] Specifically, if the number of frames in the input sequence required by the model is H, then each dynamic frame rate sequence is obtained by randomly taking H values from R (a frame skipping value is also needed for the first frame), which yields H sampling intervals. The image sequence is sampled based on the H sampling intervals, and the sampled multi-frame images are arranged in chronological order to obtain a dynamic frame rate sequence with H frames.
[0049] Understandably, randomly selecting values from multiple integers within a preset set of integers can control the dynamic sampling interval within a certain range. That is, the sampling interval can only change randomly within the preset range, which can avoid the loss of timing information caused by excessively large sampling intervals, thereby avoiding errors.
[0050] Furthermore, repeat the above steps of randomly selecting values to obtain multiple sets of random frame skipping intervals, and use the obtained multiple sets of random frame skipping intervals to perform random interval sampling on each image sequence to obtain multiple random interval sampled image sequences with different random frame skipping intervals based on each image sequence.
[0051] Understandably, by sampling each image sequence at random intervals using a preset set of random frame skipping intervals, it can be ensured that the multiple randomly sampled image sequences obtained by sampling multiple image sequences based on the same set of random frame skipping intervals have the same random frame skipping interval.
[0052] In this embodiment, each image sequence is sampled at random intervals using m random frame skipping interval sets, resulting in m randomly spaced sampled image sequences with different random frame skipping intervals for each image sequence. The value of m can be any integer greater than or equal to 1, meaning at least one randomly spaced sampled image sequence is constructed; this application does not limit this.
[0053] In this embodiment, the dynamic frame rate sequence set consisting of multiple randomly spaced sampled image sequences corresponding to each image sequence can be represented as $D = \{d_1, \ldots, d_m\}$, where $m \geq 1$, $m$ represents the number of random frame skipping interval sets constructed, $d_1$ represents the first random interval sampled image sequence in the dynamic frame rate sequence set, and $d_m$ represents the m-th random interval sampled image sequence in the dynamic frame rate sequence set.
[0054] In this embodiment, the set of random frame skipping intervals can be represented as $T = \{i_1, i_2, \ldots, i_H\}$, where $i_1$ represents the first frame skipping interval, $i_2$ represents the second frame skipping interval, and $i_H$ represents the H-th frame skipping interval (i.e., the last frame skipping interval). The value of $i_1$ can only be 0, so that the first frame of the image sequence is sampled.
[0055] In this embodiment, the random interval sampled image sequence sampled using a random frame skipping interval set can be represented as $d_i = \{d_{1+i_1}, d_{1+i_1+i_2}, \ldots, d_{1+i_1+i_2+\ldots+i_H}\}$, where $d_i$ represents the i-th random interval sampled image sequence in the dynamic frame rate sequence set, $d_{1+i_1}$ represents the first frame image in the image sequence, $d_{1+i_1+i_2}$ represents the $(1+i_1+i_2)$-th frame image in the image sequence, and $d_{1+i_1+i_2+\ldots+i_H}$ represents the $(1+i_1+i_2+\ldots+i_H)$-th frame image in the image sequence.
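The random-interval sampling of S113 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation of this application: the names `draw_skip_set` and `random_interval_sample` are hypothetical, frame numbers stand in for images, and the frame positions follow the formula in [0055] literally (position 1+i1, then 1+i1+i2, and so on). Under that literal reading a value of 0 after the first repeats the previous frame; reading each value as a count of skipped frames, as in the equal-interval case, would be an equally plausible interpretation.

```python
import random


def draw_skip_set(h, rng, max_skip=4):
    """Draw one set T = {i1, ..., iH} from the integers 0..max_skip.

    i1 is fixed to 0 so that the first frame is always sampled.
    """
    return [0] + [rng.randint(0, max_skip) for _ in range(h - 1)]


def random_interval_sample(frames, skips):
    """Pick frames at 1-based positions 1+i1, 1+i1+i2, ..."""
    sampled, position = [], 1
    for skip in skips:
        position += skip
        if position > len(frames):
            break  # assumption: stop early if the source sequence runs out
        sampled.append(frames[position - 1])
    return sampled


frames = list(range(1, 21))   # frame numbers 1..20 stand in for images
fixed_t = [0, 2, 1, 4]        # hand-picked skip set with H = 4
picked = random_interval_sample(frames, fixed_t)

# Reusing the same set on another sequence selects the same positions.
picked_other = random_interval_sample(list("abcdefghij"), fixed_t)

rng = random.Random(0)        # seeded only to make the sketch repeatable
drawn_t = draw_skip_set(4, rng)
```

Applying one skip set to every image sequence is what keeps the sequences inside a single second training set comparable, as [0051] points out.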
[0056] S114: Obtain the corresponding equally spaced sampling gait sequence based on each equally spaced sampling image sequence, and obtain the corresponding random spaced sampling gait sequence based on each random spaced sampling image sequence.
[0057] In this embodiment, after obtaining k equally spaced sampled image sequences with different sampling intervals and m randomly spaced sampled image sequences with different random frame skipping intervals for each image sequence, a semantic segmentation algorithm is used to perform semantic segmentation on multiple frames of images in each sequence in chronological order. After obtaining the human shape region mask and background region corresponding to the pedestrian based on the segmented images, the human shape region mask and background region are binarized to obtain grayscale binary images. Then, based on the multiple frames of grayscale binary images, the corresponding pedestrian gait contour grayscale binary image sequence is obtained, thereby obtaining multiple equally spaced sampled gait sequences and multiple randomly spaced sampled gait sequences.
[0058] Binarization refers to setting the grayscale value of pixels in an image to 0 or 255, thus presenting the entire image with only black and white visual effects. Binarization simplifies the image, reduces data size, and highlights the outline of the target of interest. In this embodiment, binarizing the human figure mask yields a clearer human outline.
[0059] Specifically, the grayscale value of the human-shaped area mask is set to 255, and the grayscale value of the background area is set to 0.
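The binarization step in [0058] and [0059] can be sketched as follows. This is a minimal example, assuming the segmentation output is a 2-D mask in which any non-zero value marks the human region; the function name `binarize_mask` and the nested-list representation are hypothetical choices for illustration.

```python
def binarize_mask(mask):
    """Map a 2-D segmentation mask to a 0/255 silhouette.

    mask: list of rows; any non-zero value is treated as the human
    region (an assumption about the segmentation output format).
    Human pixels become 255 and background pixels become 0.
    """
    return [[255 if value else 0 for value in row] for row in mask]


# A tiny 3x3 mask used only as illustrative data.
mask = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
silhouette = binarize_mask(mask)
```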
[0060] S115: Extract equally spaced sampling gait sequences with the same sampling interval from multiple equally spaced sampling gait sequences corresponding to each image sequence, and divide the multiple equally spaced sampling gait sequences with the same sampling interval into the same set to obtain multiple first training sets; wherein, the number of first training sets is the same as the number of preset sampling intervals.
[0061] In a specific implementation scenario, three static frame rate sequence sets S are extracted from the image sequences corresponding to three pedestrians. Each static frame rate sequence set S includes five equally spaced sampled image sequences with different sampling intervals (sampling intervals of 0, 1, 2, 3, and 4, respectively), and the number of frames in each equally spaced sampled image sequence is H, i.e., $S = \{s_1, s_2, s_3, s_4, s_5\}$, where $s_1$ represents an equally spaced sampled image sequence with an interval of 0 frames (a continuous frame image sequence), $s_2$ represents an equally spaced sampled image sequence with an interval of 1 frame, $s_3$ represents an equally spaced sampled image sequence with an interval of 2 frames, $s_4$ represents an equally spaced sampled image sequence with an interval of 3 frames, and $s_5$ represents an equally spaced sampled image sequence with an interval of 4 frames.
[0062] Based on the five equally spaced sampled image sequences in each static frame rate sequence set S, five corresponding equally spaced sampling gait sequences are obtained. The set of equally spaced sampling gait sequences can be represented as $S_B = \{s_{B1}, s_{B2}, s_{B3}, s_{B4}, s_{B5}\}$, where $s_{B1}$ represents the equally spaced sampling gait sequence with an interval of 0 frames (the continuous frame gait sequence), $s_{B2}$ represents the equally spaced sampling gait sequence with an interval of 1 frame, $s_{B3}$ represents the equally spaced sampling gait sequence with an interval of 2 frames, $s_{B4}$ represents the equally spaced sampling gait sequence with an interval of 3 frames, and $s_{B5}$ represents the equally spaced sampling gait sequence with an interval of 4 frames.
[0063] Furthermore, the equally spaced sampling gait sequence $s_{B1}$ with an interval of 0 frames is extracted from each of the three sets $S_B$, and the 3 $s_{B1}$ are divided into the same set to obtain one first training set. The above steps are repeated until the 3 $s_{B2}$, 3 $s_{B3}$, 3 $s_{B4}$, and 3 $s_{B5}$ have each been divided into the same set, resulting in a total of 5 first training sets.
[0064] Understandably, the number of equally spaced gait sequences included in the five first training sets obtained in the above manner is the same (3 for each), and the number of equally spaced gait sequences obtained in each first training set based on the image sequence corresponding to each pedestrian is also the same (1 for each), and the length of the equally spaced gait sequences in each first training set is also the same (H frames for each).
[0065] Understandably, the number of first training sets is equal to the number of sampling intervals obtained from the preset set of integers.
[0066] S116: Extract random interval sampling gait sequences with the same random frame skipping interval from multiple random interval sampling gait sequences corresponding to each image sequence, and divide multiple random interval sampling gait sequences with the same random frame skipping interval into the same set to obtain multiple second training sets; wherein, the number of second training sets is the same as the number of preset random frame skipping interval sets.
[0067] In the specific implementation scenario described above, three dynamic frame rate sequence sets D are extracted from the image sequences corresponding to the three pedestrians. Each dynamic frame rate sequence set D includes four random interval sampled image sequences with different random frame skipping intervals (sampled using four random frame skipping interval sets T1, T2, T3, and T4, respectively), and the number of frames in each random interval sampled image sequence is H, i.e., $D = \{d_1, d_2, d_3, d_4\}$, where $d_1$ represents the random interval sampled image sequence obtained by sampling using T1, $d_2$ represents the random interval sampled image sequence obtained by sampling using T2, $d_3$ represents the random interval sampled image sequence obtained by sampling using T3, and $d_4$ represents the random interval sampled image sequence obtained by sampling using T4.
[0068] Based on the four random interval sampled image sequences in each dynamic frame rate sequence set D, four corresponding random interval sampling gait sequences are obtained. The set of random interval sampling gait sequences can be represented as $D_B = \{d_{B1}, d_{B2}, d_{B3}, d_{B4}\}$, where $d_{B1}$ represents the random interval sampling gait sequence obtained by sampling using T1, $d_{B2}$ represents the random interval sampling gait sequence obtained by sampling using T2, $d_{B3}$ represents the random interval sampling gait sequence obtained by sampling using T3, and $d_{B4}$ represents the random interval sampling gait sequence obtained by sampling using T4.
[0069] Furthermore, the random interval sampling gait sequence $d_{B1}$ obtained by sampling using T1 is extracted from each of the three sets $D_B$, and the 3 $d_{B1}$ are divided into the same set to obtain one second training set. The above steps are repeated until the 3 $d_{B2}$, 3 $d_{B3}$, and 3 $d_{B4}$ have each been divided into the same set, resulting in a total of 4 second training sets.
[0070] Understandably, the number of randomly spaced gait sequences included in the four second training sets obtained in the above manner is the same (3 for each), and the number of randomly spaced gait sequences obtained in each second training set based on the image sequence corresponding to each pedestrian is also the same (1 for each), and the length of the randomly spaced gait sequences in each second training set is also the same (H frames for each).
[0071] Understandably, the number of second training sets is equal to the number of preset random frame skipping interval sets.
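The grouping performed in S115 and S116 can be sketched as follows for the scenario with three pedestrians. This is only an illustration: the function name `group_by_sampling_key`, the pedestrian identifiers "A", "B", "C", and the placeholder strings standing in for gait sequences are all assumptions.

```python
from collections import defaultdict


def group_by_sampling_key(per_pedestrian):
    """Group sequences that share a sampling key into one training set.

    per_pedestrian: {pedestrian_id: {sampling_key: gait_sequence}}
    returns: {sampling_key: [(pedestrian_id, gait_sequence), ...]}
    """
    training_sets = defaultdict(list)
    for pedestrian_id, sequences in per_pedestrian.items():
        for key, sequence in sequences.items():
            training_sets[key].append((pedestrian_id, sequence))
    return dict(training_sets)


pedestrians = ("A", "B", "C")
# Placeholder strings stand in for gait sequences of H frames each.
static = {pid: {r: f"{pid}-r{r}" for r in range(5)} for pid in pedestrians}
dynamic = {
    pid: {t: f"{pid}-{t}" for t in ("T1", "T2", "T3", "T4")}
    for pid in pedestrians
}

# Keyed by sampling interval: one first training set per interval.
first_training_sets = group_by_sampling_key(static)
# Keyed by skip-interval set: one second training set per set T.
second_training_sets = group_by_sampling_key(dynamic)
```

Each resulting set holds exactly one sequence per pedestrian, which matches the counts given in [0064] and [0070].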
[0072] S12: Use one of the first training sets to train the main model for gait recognition, and use the remaining multiple first training sets and multiple second training sets to train multiple auxiliary models for gait recognition; wherein, the input sequence of the main model and multiple auxiliary models is a gait sequence obtained based on the same labeled image sequence.
[0073] In this embodiment, multiple deep learning models with the same network structure are first obtained, and different initialization functions are set for each deep learning model to generate different initialization parameters, thereby obtaining multiple recognition models with different initialization parameters. Further, one of the recognition models is used as the main model, and the remaining multiple recognition models are used as multiple auxiliary models.
[0074] Taking the specific implementation scenario described above as an example, in response to obtaining 5 first training sets and 4 second training sets based on the image sequences corresponding to 3 pedestrians, 9 deep learning models with the same network structure are obtained. Different initialization functions are set for each deep learning model to generate different initialization parameters, resulting in 9 recognition models (N1 to N9) with different initialization parameters. Further, one of the recognition models, N1, is used as the main model, and the remaining 8 recognition models (N2 to N9) are used as auxiliary models.
[0075] In this embodiment, an equally spaced gait sequence from one of the first training sets is input into the main model to obtain the first gait feature. Simultaneously, equally spaced and randomly spaced gait sequences obtained from the same labeled image sequence from the remaining multiple first training sets and multiple second training sets are input into multiple auxiliary models to obtain multiple second gait features.
[0076] Specifically, the first gait feature is used as the main feature information, and the multiple second gait features are used as auxiliary feature information.
[0077] In this embodiment, the input sequence input to the main model and multiple auxiliary models in each round is a gait sequence obtained based on the same labeled image sequence. That is, the training data input to the main model and auxiliary models in each round is a gait sequence obtained by sampling the same scene and the same subject in the same labeled image sequence at different frame rates.
[0078] Taking the specific implementation scenario described above as an example, in the first round, the equally spaced sampling gait sequence $s_{B1}$ with an interval of 0 frames from one of the first training sets is input into the main model N1 for training. Meanwhile, the equally spaced sampling gait sequences obtained from the same labeled image sequence in the remaining first training sets, namely $s_{B2}$ (interval of 1 frame), $s_{B3}$ (interval of 2 frames), $s_{B4}$ (interval of 3 frames), and $s_{B5}$ (interval of 4 frames), are input into auxiliary models N2 to N5, respectively, for training. The random interval sampling gait sequences obtained from the same labeled image sequence in the multiple second training sets, namely $d_{B1}$ (sampled using T1), $d_{B2}$ (sampled using T2), $d_{B3}$ (sampled using T3), and $d_{B4}$ (sampled using T4), are input into auxiliary models N6 to N9, respectively, for training.
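The per-round pairing of sequences with models described in [0078] can be sketched as follows. The function name `assign_inputs` and the string placeholders for gait sequences are hypothetical; the sketch only shows the bookkeeping, not any model training.

```python
def assign_inputs(static_sequences, dynamic_sequences, main_index=0):
    """Pair one pedestrian's gait sequences with model names N1, N2, ...

    static_sequences: equal-interval gait sequences (s_B1 ... s_Bk).
    dynamic_sequences: random-interval gait sequences (d_B1 ... d_Bm).
    The sequence at main_index feeds the main model N1; every other
    sequence feeds one auxiliary model.
    """
    main = static_sequences[main_index]
    auxiliary = [s for i, s in enumerate(static_sequences) if i != main_index]
    auxiliary += list(dynamic_sequences)
    ordered = [main] + auxiliary
    return {f"N{n}": seq for n, seq in enumerate(ordered, start=1)}


# Placeholder names stand in for sequences from the same labeled image sequence.
inputs = assign_inputs(
    ["sB1", "sB2", "sB3", "sB4", "sB5"],
    ["dB1", "dB2", "dB3", "dB4"],
)
```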
[0079] S13: Use the gait loss function generated during each training iteration of the main model to update the model parameters of the main model through backpropagation; and use the updated model parameters of the main model and the similarity loss function between the main model and each auxiliary model to update the model parameters of each auxiliary model.
[0080] Specifically, please refer to Figure 3, which is a flowchart illustrating a specific implementation of S13. In this implementation, the steps of updating the model parameters of the main model through backpropagation using the gait loss function generated during each training iteration of the main model, and updating the model parameters of each auxiliary model using the updated model parameters of the main model and the similarity loss function between the main model and each auxiliary model, specifically include:
[0081] S131: Calculate the gait loss function between the first gait features and the labeled information.
[0082] In this embodiment, the gait loss function between the first gait features and the corresponding annotation information is calculated using the triplet loss function and the cross-entropy loss function (CE loss).
[0083] Among them, the triplet loss function is mainly calculated using the metric features of the main feature information, while the cross-entropy loss function is mainly calculated using the classification features of the main feature information.
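As an illustration of how the two terms might be combined, the following minimal sketch computes a triplet loss on metric features and a cross-entropy loss on classification logits, then adds them. The margin value, the unweighted sum, and all function names are assumptions made for the example and are not specified in this application.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on metric features (margin is an assumed value)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def cross_entropy(logits, label):
    """Cross-entropy of classification logits against an integer identity label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def gait_loss(anchor, positive, negative, logits, label, margin=0.2):
    """Assumed combination: plain sum of the triplet and cross-entropy terms."""
    return triplet_loss(anchor, positive, negative, margin) + cross_entropy(logits, label)
```

In practice the anchor and positive features would come from the same identity and the negative from a different one, and the terms could also be weighted rather than summed equally.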
[0084] S132: Using the backpropagation algorithm, the main model parameters of the main model are updated based on the gait loss function to obtain the updated main model parameters.
[0085] In this implementation, the main model parameters are updated after optimizing the sum of all losses using the backpropagation algorithm.
[0086] S133: Calculate the similarity loss function between the first gait feature and each second gait feature.
[0087] In this embodiment, cosine similarity is used to calculate the similarity loss function between the first gait feature and each second gait feature, so as to obtain a set of similarity loss functions.
[0088] Specifically, the set of similarity loss functions is represented as $L = \{L_2, L_3, \ldots, L_{k+m}\}$, where $k$ represents the number of first training sets, $m$ represents the number of second training sets, $L_2$ represents the similarity loss function between the first gait feature and the first second gait feature, $L_3$ represents the similarity loss function between the first gait feature and the second second gait feature, and $L_{k+m}$ represents the similarity loss function between the first gait feature and the $((k+m)-1)$-th (i.e., the last) second gait feature.
[0089] In other embodiments, any one of the following similarity measures can be used to calculate the similarity loss function between the first gait feature and each second gait feature: Pearson correlation coefficient, Kullback-Leibler divergence, Jaccard coefficient, Tanimoto coefficient (generalized Jaccard coefficient), or mutual information. This application does not limit the measure used.
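A cosine-based similarity loss of the kind described in S133 might look like the sketch below. The text only states that cosine similarity is used; defining the loss as one minus the cosine similarity is an assumption, as are the function names.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_loss(main_feature, aux_feature):
    """Assumed form: 1 - cosine similarity, so identical directions give 0."""
    return 1.0 - cosine_similarity(main_feature, aux_feature)

def similarity_losses(main_feature, aux_features):
    """One loss per auxiliary model, in auxiliary-model order (N2, N3, ...)."""
    return [similarity_loss(main_feature, f) for f in aux_features]
```

With this form the loss lies in the range 0 to 2 and shrinks as an auxiliary feature aligns with the main feature.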
[0090] S134: Update the auxiliary model parameters corresponding to each auxiliary model using the updated main model parameters and each similarity loss function to obtain multiple updated auxiliary model parameters.
[0091] In this implementation, after updating the main model parameters, it is necessary to update the auxiliary model parameters corresponding to each auxiliary model.
[0092] Taking the specific implementation scenario above as an example, let the main model parameter of the main model N1 be ω1, and the auxiliary model parameter of the auxiliary model N2 be ω2. Then, the formula for updating the auxiliary model parameter using the main model parameter is as follows:
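[0093] $\omega_2^{*} = \dfrac{\alpha \times \omega_1 + \beta \times \omega_2}{\alpha + \beta}$
[0094] $\alpha = \gamma \times \mathrm{softmax}(L_2)$
[0095] $\beta = (1-\gamma) \times (1 - \mathrm{softmax}(L_2))$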
[0096]
[0097] Where $\omega_2^{*}$ represents the updated auxiliary model parameter of the auxiliary model N2, $\alpha$ represents the weight coefficient of the main model N1, $\beta$ represents the weight coefficient of the auxiliary model N2, and $\gamma$ is a control factor used to control the ratio of $\alpha$ to $\beta$, with a value in the range (0.8, 1). $L_2$ represents the similarity loss function between the first gait features output by the main model N1 and the second gait features output by the auxiliary model N2, $L_i$ represents the similarity loss function between the first gait features output by the main model N1 and the second gait features output by the auxiliary model $N_i$, and $\mathrm{softmax}(L_2)$ is the normalization function of the auxiliary model N2.
[0098] The remaining auxiliary models N3 to $N_i$ are updated in the same way as the auxiliary model N2.
[0099] In this embodiment, the smaller the difference between the parameters of a given auxiliary model and the parameters of the main model, the more similar the second gait feature output by that auxiliary model is to the first gait feature output by the main model, and the smaller the similarity loss function between the two. Under the formulas above, a smaller similarity loss function yields a smaller α and a larger β, so the updated parameters stay closer to the auxiliary model's own parameters and the update magnitude of the auxiliary model is smaller. Conversely, the greater the difference between the parameters of a given auxiliary model and the parameters of the main model, the larger the update magnitude of the auxiliary model.
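The update rule can be tried numerically with the sketch below. It assumes that softmax normalizes each similarity loss over the whole set of similarity losses, which is a reading of the text rather than a confirmed definition, and it uses γ = 0.9 as an arbitrary value inside the stated range (0.8, 1). Parameters are flattened into plain lists for simplicity.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def update_auxiliary(main_params, aux_params, sim_losses, index, gamma=0.9):
    """Blend main and auxiliary parameters for the auxiliary model at `index`.

    alpha = gamma * softmax(L)[index]
    beta  = (1 - gamma) * (1 - softmax(L)[index])
    new   = (alpha * w_main + beta * w_aux) / (alpha + beta)
    """
    s = softmax(sim_losses)[index]
    alpha = gamma * s
    beta = (1.0 - gamma) * (1.0 - s)
    return [(alpha * w1 + beta * w2) / (alpha + beta)
            for w1, w2 in zip(main_params, aux_params)]

# Two auxiliary models: the first is close to the main model (small loss),
# the second is far from it (large loss).
losses = [0.1, 2.0]
u_small = update_auxiliary([1.0], [0.0], losses, 0)[0]
u_large = update_auxiliary([1.0], [0.0], losses, 1)[0]
```

Under these assumptions the auxiliary model with the larger similarity loss is pulled further toward the main model's parameter than the one with the smaller loss, which is the adaptive behaviour the paragraph describes.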
[0100] In the existing field of contrastive learning, fixed weight coefficients are usually used to update the network parameters corresponding to the auxiliary model, but this approach cannot adapt well to changes during the training process.
[0101] Unlike existing technologies, this implementation method can dynamically adjust the network parameters corresponding to the auxiliary model through the above formula, which can better adapt to changes in the training process. Moreover, regardless of the number of auxiliary models, this formula can update the network weight parameters well.
[0102] S14: Repeat the above input and backpropagation update steps until the set number of iterations is reached, and use the trained main model as the gait recognition model.
[0103] In this embodiment, gait sequences sampled at equal intervals from the same first training set are used as input to the main model, and sequences from the remaining multiple first training sets and multiple second training sets are used as input to multiple auxiliary models. The process is iterated until the training results of the main model and auxiliary models meet the preset convergence conditions, thus obtaining the trained main model, which is then used as the gait recognition model.
[0104] It can be understood that this implementation uses gait sequences at different frame rates corresponding to the same image sequence as the input sequences for the main model and the auxiliary models, and obtains the similarity loss function between the auxiliary feature information output by each auxiliary model and the main feature information output by the main model. The multiple similarity loss functions and the gait loss function of the main model together serve as the total loss function for updating the main model parameters, and the updated main model parameters are then used to update the auxiliary model parameters. Using contrastive learning in this way can improve the stability of the main model.
[0105] Furthermore, the training process in this embodiment is end-to-end, and the multiple auxiliary models used in training do not participate in actual gait recognition. Only the trained main model needs to be deployed on the hardware as the gait recognition model, so the hardware requirements are low and the method can be applied to more monitoring devices or playback devices.
[0106] Unlike existing technologies, this implementation uses gait sequences with combined frame rates to train multiple recognition models. This allows for training with continuous frame sequences even when the selected gait sequence is short, better preserving the information of the original gait sequence and thus enhancing the model's recognition accuracy. Furthermore, with limited hardware resources, compared to training solely with continuous frame sequences, this implementation uses interval-based static frame rate sequences for training, acquiring gait information over a wider time range and improving the model's recognition efficiency. Moreover, when a pedestrian's walking speed fluctuates, training with dynamic frame-skipping sequences within a preset range can better fit scenarios of speed changes, thereby enhancing the model's robustness.
[0107] Understandably, this implementation can combine the computational advantages of multiple gait recognition algorithms, which can solve the problem of decreased algorithm recognition efficiency caused by training with only continuous frame gait sequences, and the problem of decreased algorithm recognition accuracy caused by training with only equally spaced sampling sequences, thus making it more suitable for real-world scenarios.
[0108] Please refer to Figure 4, which is a flowchart illustrating one embodiment of the gait recognition method of this application. In this embodiment, the gait recognition method is implemented using the aforementioned gait recognition model, and includes:
[0109] S41: Obtain the gait sequence of at least one target object.
[0110] Specifically, please refer to Figure 5, which is a flowchart illustrating a specific implementation of S41. In this implementation, the step of obtaining the gait sequence of at least one target object specifically includes:
[0111] S411: Acquire video images based on the monitored area.
[0112] In this embodiment, the monitoring area can be an indoor area or an outdoor area. For example, the indoor area can be a part of the office space near the entrance, and the outdoor area can be the entrance of a residential community or a school.
[0113] The monitoring equipment can be a network camera (IPC) or other video surveillance cameras.
[0114] S412: Detect video images to obtain an image sequence of at least one target object.
[0115] In this embodiment, firstly, multiple consecutive frames of images are obtained based on video images, and then the multiple consecutive frames of images are detected to obtain at least one human body detection box of the target object.
[0116] Specifically, after detecting at least one target object from a video image using a pedestrian detection algorithm, a human body segmentation model is used to add human body detection boxes to the target object in multiple consecutive frames of images.
[0117] In a specific implementation scenario, pedestrian detection algorithms can adopt motion detection-based target tracking algorithms. That is, when the camera is stationary, background modeling algorithms are used to extract moving foreground targets, and then a classifier is used to classify the moving targets and determine whether they include pedestrians. Examples include Gaussian mixture model algorithms, frame difference algorithms, or sample consistency modeling algorithms.
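Of the background-modeling options listed, frame differencing is the simplest to illustrate. The sketch below marks pixels whose intensity changes by more than a threshold between two frames and returns the bounding box of the changed region. The threshold value and function names are assumptions, frames are plain nested lists of grayscale values, and the classifier stage that decides whether the region is a pedestrian is not shown.

```python
def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Return a 0/1 mask of pixels that changed by more than `threshold`."""
    return [[1 if abs(c - p) > threshold else 0
             for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev_frame, curr_frame)]

def bounding_box(mask):
    """Bounding box (x1, y1, x2, y2) of all foreground pixels, or None if empty.
    x2 and y2 are exclusive."""
    coords = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)

# A 4x4 static background and a frame in which a 2x2 region has changed.
background = [[0] * 4 for _ in range(4)]
current = [row[:] for row in background]
for y in (1, 2):
    for x in (1, 2):
        current[y][x] = 200
moving_region = bounding_box(frame_difference_mask(background, current))
```

The resulting region would then be passed to a classifier to decide whether it contains a pedestrian, as the paragraph describes.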
[0118] In another specific implementation scenario, a machine learning-based pedestrian detection algorithm can be used. This algorithm utilizes the appearance features of the human body (such as color, edge, and texture features) to train a classifier and distinguish pedestrians from the background. Specifically, algorithms based on HOG (Histogram of Oriented Gradient) + SVM (Support Vector Machine), HOG + AdaBoost (Adaptive Boosting), and DPM (Deformable Parts Model) + LatentSVM can be used.
[0119] In another specific implementation scenario, a pedestrian detection algorithm based on deep learning can be used. This algorithm trains a classifier based on human features learned through deep learning to distinguish pedestrians from the background, and has strong robustness. Examples include algorithms based on Cascade CNN and algorithms based on JointDeep, etc. This application does not limit the specific algorithms used.
[0120] Furthermore, the human detection box corresponding to each target object is tracked in real time to establish tracking identity (ID) information for each target object, and the image sequence of each target object in consecutive multi-frame images is determined based on the tracking identity information.
[0121] Specifically, when there are multiple target objects in a video image, the corresponding target objects are tracked in consecutive frames based on different tracking identity information, so as to obtain the image sequence of each target object in the continuous tracking process.
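The application does not name a tracking algorithm, so the following sketch substitutes a simple greedy intersection-over-union (IoU) matcher purely to show how tracking IDs can turn per-frame detection boxes into one image sequence per target. The class name, the IoU threshold, and the box format `(x1, y1, x2, y2)` are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

class GreedyIouTracker:
    """Assigns a persistent ID to each box by matching it to the previous frame."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.last_boxes = {}   # track ID -> most recent box
        self.sequences = {}    # track ID -> list of (frame index, box)
        self.next_id = 1

    def update(self, frame_idx, boxes):
        free_ids = set(self.last_boxes)  # tracks not yet matched in this frame
        assigned = {}
        for box in boxes:
            best_id, best_score = None, self.iou_threshold
            for track_id in sorted(free_ids):
                score = iou(box, self.last_boxes[track_id])
                if score >= best_score:
                    best_id, best_score = track_id, score
            if best_id is None:          # no overlap: start a new track
                best_id = self.next_id
                self.next_id += 1
                self.sequences[best_id] = []
            else:
                free_ids.discard(best_id)
            self.last_boxes[best_id] = box
            self.sequences[best_id].append((frame_idx, box))
            assigned[best_id] = box
        return assigned

tracker = GreedyIouTracker()
tracker.update(0, [(0, 0, 10, 20), (50, 0, 60, 20)])
tracker.update(1, [(51, 0, 61, 20), (1, 0, 11, 20)])
```

After the two updates, `tracker.sequences` holds one box sequence per target even though the detections arrived in a different order in the second frame.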
[0122] S413: Obtain the human-shaped region of the target object based on the image sequence, and use the human-shaped region to obtain the gait sequence of the target object.
[0123] In this embodiment, a human body segmentation model is used to segment the human body detection box in the image sequence to obtain the background region and the human body region. Then, the human body region mask of the target object is obtained using the background region and the human body region. The human body region mask is binarized to obtain the gait sequence of the target object.
[0124] The human detection bounding box includes a large background area, which can affect the detection of the human area. Therefore, a mask is needed to separate the human area from the background area to obtain an effective human area mask.
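The binarization step in S413 can be sketched as below, assuming the segmentation model outputs a per-pixel foreground probability for each cropped detection box. The 0.5 threshold, the 0/255 encoding, and the function names are illustrative assumptions.

```python
def binarize_mask(prob_mask, threshold=0.5):
    """Turn per-pixel foreground probabilities into a 0/255 silhouette."""
    return [[255 if p >= threshold else 0 for p in row] for row in prob_mask]

def build_gait_sequence(prob_masks, threshold=0.5):
    """Binarize every frame's mask, preserving chronological order."""
    return [binarize_mask(mask, threshold) for mask in prob_masks]

# Two tiny 2x2 "frames" of foreground probabilities.
silhouettes = build_gait_sequence([
    [[0.1, 0.9], [0.5, 0.4]],
    [[0.8, 0.2], [0.7, 0.3]],
])
```

Each element of `silhouettes` is one binary gait profile, and the list as a whole plays the role of the gait sequence passed to the model.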
[0125] S42: Input the gait sequence into the gait recognition model and use the gait recognition model to extract the gait features of the target object from the gait sequence; wherein, the gait recognition model is obtained by comparative training using equally spaced sampled gait sequences and randomly spaced sampled gait sequences; wherein, the equally spaced sampled gait sequences include continuous frame gait sequences.
[0126] In this embodiment, the gait recognition model is the main model trained using contrastive learning. It has good recognition accuracy, recognition efficiency, and robustness, has low hardware requirements, and can be deployed directly at the front end or back end of the monitoring equipment.
[0127] In this embodiment, gait features include static features and dynamic features. Static features refer to the physiological characteristics of the target object, such as height, body shape, leg bones, joints, and muscles, obtained based on the human body detection box. Dynamic features refer to the activity characteristics of the target object, such as arm swing, head swaying, body swaying, and step frequency, which reflect the walking habits of the target object during the foot landing, foot lifting, and support swing phases.
[0128] Understandably, since different pedestrians have different physiological characteristics and walking habits, gait recognition models can be used to extract and identify gait features to obtain the identity features of the target object.
[0129] S43: Use a gait recognition model to match and identify gait features with stored user gait features, and output the recognition results.
[0130] In this embodiment, the extracted gait features are compared with the user's gait features stored in the feature library using a gait recognition model. By determining whether the two reach the set similarity threshold, it is determined whether the target object is a user.
[0131] In a specific implementation scenario, if the matching degree between the extracted gait features and the gait features of a certain user in the feature library reaches the set similarity threshold, it indicates that the gait features of the target object and the gait features of that user are the same person's gait features, and the gait recognition model outputs the recognition result that the target object is the user.
[0132] In another specific implementation scenario, if the matching degree between the extracted gait features and the gait features of any user stored in the feature library does not reach the set similarity threshold, it indicates that there are no user gait features in the feature library that match the gait features of the target object, and the gait recognition model outputs the recognition result that the target object is not a user.
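The two outcomes described for S43 can be sketched as a lookup against a feature library. The application does not state which matching measure is used, so cosine similarity is assumed here, and the threshold of 0.8, the user IDs, and the feature values are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def match_identity(query_feature, feature_library, threshold=0.8):
    """Return (user_id, score) for the best match at or above `threshold`,
    or (None, best_score) if no stored user reaches it."""
    best_user, best_score = None, -1.0
    for user_id, stored_feature in feature_library.items():
        score = cosine_similarity(query_feature, stored_feature)
        if score > best_score:
            best_user, best_score = user_id, score
    if best_score >= threshold:
        return best_user, best_score
    return None, best_score

# Hypothetical feature library with two enrolled users.
library = {"user_a": [1.0, 0.0], "user_b": [0.0, 1.0]}
matched_user, _ = match_identity([0.9, 0.1], library)
unmatched_user, _ = match_identity([0.7, 0.7], library)
```

The first query clears the threshold for one stored user and is reported as that user, while the second does not clear it for any stored user and is reported as not being a user.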
[0133] In this embodiment, if the monitored area is an office space, the gait features of users stored in the feature database can be based on gait features extracted from employees. By using a gait recognition model to identify the gait features of pedestrians appearing in the monitored area, it can be determined whether the pedestrians are employees. If the monitored area is a residential community entrance, the gait features of users stored in the feature database can be based on gait features extracted from residents. By using a gait recognition model to identify the gait features of pedestrians appearing in the monitored area, it can be determined whether the pedestrians are residents. If the monitored area is a school entrance, the gait features of users stored in the feature database can be based on gait features extracted from students or teachers. By using a gait recognition model to identify the gait features of pedestrians appearing in the monitored area, it can be determined whether the pedestrians are students or teachers.
[0134] Unlike existing technologies, this implementation method extracts and identifies gait features from the acquired gait sequence of the target object using a gait recognition model. The gait recognition model is obtained by comparative training using equally spaced sampled gait sequences (including continuous frame gait sequences) and randomly spaced sampled gait sequences. This combines the computational advantages of multiple gait recognition algorithms, solving both the problem of decreased algorithm recognition efficiency caused by training with only continuous frame gait sequences and the problem of decreased algorithm recognition accuracy caused by training with only equally spaced sampled sequences. This improves the gait recognition effect of the gait recognition model in real-world scenarios, thereby meeting the need for accurate gait recognition.
[0135] Correspondingly, this application provides a gait recognition device.
[0136] Please refer to Figure 6, which is a schematic diagram of one embodiment of the gait recognition device of this application. As shown in Figure 6, the gait recognition device 60 includes a gait sequence acquisition module 61, a gait feature extraction module 62, and a gait recognition module 63.
[0137] Gait sequence acquisition module 61 is used to acquire the gait sequence of at least one target object.
[0138] The gait feature extraction module 62 is used to input the gait sequence into the gait recognition model and use the gait recognition model to extract the gait features of the target object from the gait sequence; wherein, the gait recognition model is obtained by comparative training using equally spaced sampled gait sequences and randomly spaced sampled gait sequences; wherein, the equally spaced sampled gait sequences include continuous frame gait sequences.
[0139] The gait recognition module 63 is used to match and recognize gait features with stored user gait features using a gait recognition model, and output the recognition results.
[0140] For details of the process, please refer to the relevant textual descriptions in S11~S13, S111~S116, S131~S134, S41~S43 and S411~S413, which will not be repeated here.
[0141] Unlike existing technologies, this embodiment extracts and recognizes gait features from the acquired target object's gait sequence using a gait feature extraction module 62 and a gait recognition module 63. The gait recognition model used is obtained by comparing and training gait sequences sampled at equal intervals (including continuous frame gait sequences) and gait sequences sampled at random intervals. This combines the computational advantages of multiple gait recognition algorithms, solving both the problem of decreased algorithm recognition efficiency caused by training with only continuous frame gait sequences and the problem of decreased algorithm recognition accuracy caused by training with only equal interval sample sequences. This improves the gait recognition effect of the gait recognition model in real-world scenarios, thereby meeting the need for accurate gait recognition.
[0142] Correspondingly, this application provides an electronic device.
[0143] Please refer to Figure 7, which is a schematic diagram of one embodiment of the electronic device of this application. As shown in Figure 7, the electronic device 70 includes a memory 71 and a processor 72.
[0144] In this embodiment, the memory 71 is used to store program data, and when the program data is executed, it implements the steps in the gait recognition method described above; the processor 72 is used to execute the program instructions stored in the memory 71 to implement the steps in the gait recognition method described above.
[0145] Specifically, processor 72 controls itself and memory 71 to implement the steps in the gait recognition method described above. Processor 72 can also be called a CPU (Central Processing Unit). Processor 72 may be an integrated circuit chip with signal processing capabilities. Processor 72 can also be a general-purpose processor, digital signal processor (DSP), application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component. A general-purpose processor can be a microprocessor or any conventional processor. Furthermore, processor 72 can be implemented using multiple integrated circuit chips.
[0146] Unlike existing technologies, this embodiment uses processor 72 to extract and recognize gait features from the acquired gait sequence of the target object. The gait recognition model used by processor 72 is obtained by comparative training using equally spaced sampled gait sequences (including continuous frame gait sequences) and randomly spaced sampled gait sequences. This combines the computational advantages of multiple gait recognition algorithms, solving both the problem of decreased algorithm recognition efficiency caused by training with only continuous frame gait sequences and the problem of decreased algorithm recognition accuracy caused by training with only equally spaced sampled sequences. This improves the gait recognition effect of the gait recognition model in real-world scenarios, thereby meeting the need for accurate gait recognition.
[0147] Correspondingly, this application provides a computer-readable storage medium.
[0148] Please refer to Figure 8, which is a schematic diagram of a computer-readable storage medium according to an embodiment of this application.
[0149] The computer-readable storage medium 80 includes a computer program 801 stored on it. When executed by the processor, the computer program 801 implements the steps of the gait recognition method described above. Specifically, if the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium 80. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a computer-readable storage medium 80 and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute all or part of the steps of the methods of various embodiments of this application. The aforementioned computer-readable storage medium 80 includes various media capable of storing program code, such as a USB flash drive, a portable hard drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
[0150] In the several embodiments provided in this application, it should be understood that the disclosed methods and apparatus can be implemented in other ways. For example, the apparatus implementations described above are merely illustrative. For instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or other forms.
[0151] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment, depending on actual needs.
[0152] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0153] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute all or part of the steps of the methods of various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0154] The above description is merely an embodiment of this application and does not limit the patent scope of this application. Any equivalent structural or procedural transformations made using the content of this application's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of this application.
[0155] If the technical solution of this application involves personal information, the product using this technical solution has clearly informed the user of the personal information processing rules and obtained the user's voluntary consent before processing the personal information. If the technical solution of this application involves sensitive personal information, the product using this technical solution has obtained the user's separate consent before processing the sensitive personal information, and also meets the requirement of "express consent". For example, at personal information collection devices such as cameras, clear and prominent signs are set up to inform users that they have entered the scope of personal information collection and that personal information will be collected. If an individual voluntarily enters the collection scope, it is deemed that they have agreed to the collection of their personal information; or on the personal information processing device, with clear signs / information informing users of the personal information processing rules, authorization is obtained from the individual through pop-up information or by asking the individual to upload their personal information; wherein, the personal information processing rules may include information such as the personal information processor, the purpose of personal information processing, the processing method, and the types of personal information processed.
Claims
1. A gait recognition method, characterized in that it includes: obtaining the gait sequence of at least one target object; inputting the gait sequence into a gait recognition model, and using the gait recognition model to extract the gait features of the target object from the gait sequence; wherein the gait recognition model is obtained by comparative training using equally spaced sampled gait sequences and randomly spaced sampled gait sequences; wherein the equally spaced sampled gait sequences include continuous frame gait sequences; using the gait recognition model to match and identify the gait features against the stored user gait features, and outputting the recognition results; wherein the training of the gait recognition model includes: training the main model for gait recognition using a first training set, and training the auxiliary model for gait recognition using the remaining first training sets and the second training sets; wherein each first training set includes multiple equally spaced sampled gait sequences with the same sampling interval, and each second training set includes multiple randomly spaced sampled gait sequences with the same random frame skipping interval; updating the model parameters of the main model by backpropagation using the gait loss function generated in each training iteration of the main model; and updating the model parameters of each auxiliary model using the updated model parameters of the main model and the similarity loss function between the main model and each of the auxiliary models; and using the trained main model as the gait recognition model.
2. The gait recognition method according to claim 1, characterized in that, The step of obtaining the gait sequence of at least one target object includes: Acquire video images based on the monitored area; The video images are detected to obtain an image sequence of at least one target object; The human-shaped region of the target object is obtained based on the image sequence, and the gait sequence of the target object is obtained using the human-shaped region.
3. The gait recognition method according to claim 2, characterized in that, The step of detecting the video image to obtain an image sequence of at least one target object includes: Based on the video image, multiple consecutive frames of images are obtained, and the multiple consecutive frames of images are detected to obtain at least one human body detection box of the target object; The human body detection box corresponding to each target object is tracked in real time to establish tracking identity information for each target object. Based on the tracking identity information, determine the image sequence of each target object in the consecutive multi-frame images.
4. The gait recognition method according to claim 2, characterized in that, The step of obtaining the human-shaped region of the target object based on the image sequence, and obtaining the gait sequence of the target object using the human-shaped region, includes: The human detection bounding box in the image sequence is segmented to obtain the background region and the human shape region; The human-shaped region mask of the target object is obtained using the background region and the human-shaped region. The human-shaped region mask is binarized to obtain the gait sequence of the target object.
5. The gait recognition method according to any one of claims 1 to 4, characterized in that, The gait recognition model is trained in the following manner: Multiple first training sets and multiple second training sets are obtained based on multiple labeled image sequences; wherein each first training set includes multiple equally spaced sampling gait sequences with the same sampling interval, and each second training set includes multiple randomly spaced sampling gait sequences with the same random frame skipping interval; wherein the sampling intervals corresponding to the multiple first training sets are different, and the random frame skipping intervals corresponding to the multiple second training sets are different; The main model is trained for gait recognition using one of the first training sets, while multiple auxiliary models are trained for gait recognition using the remaining multiple first training sets and the multiple second training sets; wherein the input sequences of the main model and the multiple auxiliary models are gait sequences obtained based on the same labeled image sequence; The model parameters of the main model are updated by backpropagation using the gait loss function generated in each training iteration of the main model; and the model parameters of each auxiliary model are updated using the updated model parameters of the main model and the similarity loss function between the main model and each of the auxiliary models; The above input and backpropagation update steps are repeated until the set number of iterations is reached, and the trained main model is used as the gait recognition model.
6. The gait recognition method according to claim 5, characterized in that, The step of obtaining multiple first training sets and multiple second training sets based on multiple labeled image sequences includes: Image sequences corresponding to multiple pedestrians are obtained from selected video images, along with annotation information for each image sequence; Each image sequence is sampled at equal intervals using multiple preset sampling intervals to obtain multiple equally spaced sampled image sequences with different sampling intervals based on each image sequence; wherein the sampling intervals include 0; and, Each image sequence is sampled at random intervals using multiple preset sets of random frame skipping intervals, so as to obtain multiple random interval sampled image sequences with different random frame skipping intervals based on each image sequence; The corresponding equally spaced sampling gait sequence is obtained based on each equally spaced sampled image sequence, and the corresponding random interval sampling gait sequence is obtained based on each random interval sampled image sequence; From the multiple equally spaced sampling gait sequences corresponding to each image sequence, the equally spaced sampling gait sequences with the same sampling interval are extracted and grouped into the same set to obtain multiple first training sets; wherein the number of the first training sets is the same as the number of preset sampling intervals; and, From the multiple random interval sampling gait sequences corresponding to each image sequence, the random interval sampling gait sequences with the same random frame skipping interval are extracted and grouped into the same set to obtain multiple second training sets; wherein the number of the second training sets is the same as the number of the preset sets of random frame skipping intervals.
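As a non-limiting illustration of the grouping described in claim 6, the sketch below collects gait sequences that share the same sampling key into one training set. The data layout (a dictionary from a sequence identifier to its sampled variants) and the name `group_by_sampling_key` are assumptions made for this example. The key may be a sampling interval for the first training sets, or an identifier of one set of random frame skipping intervals for the second training sets.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

GaitSeq = list  # placeholder type for one sampled gait sequence


def group_by_sampling_key(
    per_sequence: Dict[str, Dict[Hashable, GaitSeq]]
) -> Dict[Hashable, List[Tuple[str, GaitSeq]]]:
    """Group sampled gait sequences so that each training set shares one key.

    per_sequence maps a labeled image-sequence id to {sampling key: gait sequence}.
    The result maps each sampling key to the list of (sequence id, gait sequence).
    """
    groups: Dict[Hashable, List[Tuple[str, GaitSeq]]] = defaultdict(list)
    for seq_id, variants in per_sequence.items():
        for key, gait_seq in variants.items():
            groups[key].append((seq_id, gait_seq))
    return dict(groups)
```

With this layout the number of resulting training sets equals the number of distinct sampling keys, which matches the count condition stated in the claim.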
7. The gait recognition method according to claim 6, characterized in that, Before the step of performing equally spaced sampling on each of the image sequences using multiple preset sampling intervals to obtain multiple equally spaced sampled image sequences with different sampling intervals based on each of the image sequences, the method includes: Multiple different values are taken from a preset set of integers to obtain multiple sampling intervals with different values; wherein the maximum number of sampling intervals is equal to the number of integers included in the preset set of integers; wherein the preset set of integers includes 0; The step of sampling each image sequence at random intervals using multiple preset sets of random frame skipping intervals to obtain multiple random interval sampled image sequences with different random frame skipping intervals based on each image sequence includes: In response to the random interval sampled image sequence having a preset number of image frames, a preset number of random values are taken from the preset set of integers to obtain one set of random frame skipping intervals having the preset number of random values; The above step of taking random values is repeated to obtain multiple sets of random frame skipping intervals; Each image sequence is sampled at random intervals using the acquired multiple sets of random frame skipping intervals, so as to obtain multiple random interval sampled image sequences with different random frame skipping intervals based on each image sequence.
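As a non-limiting illustration of the two sampling schemes in claim 7, the sketch below computes the frame indices to keep. It assumes that an interval of k means that k frames are skipped between two consecutive sampled frames, so that an interval of 0 selects consecutive frames, and that one random value is drawn per pair of consecutive sampled frames. The claims do not fix either convention, and the function names are hypothetical.

```python
import random
from typing import List, Sequence


def equal_interval_indices(num_frames: int, interval: int, start: int = 0) -> List[int]:
    """Indices of num_frames frames with `interval` frames skipped between picks."""
    return [start + i * (interval + 1) for i in range(num_frames)]


def random_skip_intervals(
    num_frames: int, int_set: Sequence[int], rng: random.Random
) -> List[int]:
    """Draw one set of random frame skipping intervals from the preset integer set.

    Assumption: one gap per consecutive pair of frames, i.e. num_frames - 1 values.
    """
    choices = list(int_set)
    return [rng.choice(choices) for _ in range(num_frames - 1)]


def random_interval_indices(skips: Sequence[int], start: int = 0) -> List[int]:
    """Turn a set of skipping intervals into the frame indices to keep."""
    indices = [start]
    for gap in skips:
        indices.append(indices[-1] + gap + 1)
    return indices
```

Repeating `random_skip_intervals` with the same preset integer set yields the multiple sets of random frame skipping intervals, each of which can be applied to every image sequence.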
8. The gait recognition method according to claim 6, characterized in that, The step of training the main model for gait recognition using one of the first training sets, and simultaneously training multiple auxiliary models for gait recognition using the remaining first training sets and the multiple second training sets, includes: One of the equally spaced sampling gait sequences from the first training set is input into the main model to obtain the first gait feature; and, Simultaneously, the gait sequences from the remaining first training sets and the multiple second training sets that are based on the same labeled image sequence are input into the multiple auxiliary models to obtain multiple second gait features; The step of updating the model parameters of the main model by backpropagation using the gait loss function generated during each iteration of the main model training includes: The gait loss function between the first gait feature and the annotation information is calculated; Using the backpropagation algorithm, the main model parameters of the main model are updated based on the gait loss function to obtain the updated main model parameters; The step of updating the model parameters of each auxiliary model using the updated model parameters of the main model and the similarity loss function between the main model and each auxiliary model includes: The similarity loss function between the first gait feature and each of the second gait features is calculated; The updated main model parameters and each of the similarity loss functions are used to update the auxiliary model parameters corresponding to each auxiliary model, resulting in multiple updated auxiliary model parameters.
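As a non-limiting illustration of the update order in claim 8, the sketch below uses a toy linear unit in place of a deep model. The squared-error gait loss, the squared-difference similarity loss, and in particular the rule that combines the updated main parameters with the similarity loss (a gradient step followed by blending toward the main parameters) are all assumptions made for this example. The claim only states that both quantities are used to update each auxiliary model.

```python
from typing import List

Vector = List[float]


def feature(w: Vector, x: Vector) -> float:
    """Toy stand-in for a gait feature extractor: a single linear unit."""
    return sum(wi * xi for wi, xi in zip(w, x))


def update_main(w: Vector, x: Vector, label: float, lr: float = 0.1) -> Vector:
    """One gradient step on the assumed gait loss (feature - label)^2."""
    err = feature(w, x) - label
    return [wi - lr * 2.0 * err * xi for wi, xi in zip(w, x)]


def update_auxiliary(
    w_aux: Vector,
    w_main_new: Vector,
    x_aux: Vector,
    main_feat: float,
    lr: float = 0.1,
    blend: float = 0.5,
) -> Vector:
    """Assumed rule: step on the similarity loss, then blend toward the main model."""
    diff = feature(w_aux, x_aux) - main_feat  # similarity loss is diff^2
    stepped = [wi - lr * 2.0 * diff * xi for wi, xi in zip(w_aux, x_aux)]
    return [blend * wm + (1.0 - blend) * ws for wm, ws in zip(w_main_new, stepped)]
```

In one iteration, `feature` is first evaluated for the main model to obtain the first gait feature, `update_main` is then applied, and `update_auxiliary` is called once per auxiliary model with the updated main parameters and the first gait feature.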
9. The gait recognition method according to claim 5, characterized in that, Before the step of training the main model for gait recognition using one of the first training sets, and simultaneously training multiple auxiliary models for gait recognition using the remaining first training sets and the multiple second training sets, the following steps are included: Multiple deep learning models with the same network structure are obtained; A different initialization function is set for each of the deep learning models; Based on the initialization functions, different initialization parameters are generated to obtain multiple recognition models with different initialization parameters; One of the recognition models is used as the main model, and the remaining recognition models are used as the multiple auxiliary models.
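As a non-limiting illustration of claim 9, the sketch below creates several parameter vectors of identical size using different initialization functions and designates the first as the main model. The specific distributions, their ranges, and the name `make_models` are assumptions made for this example. A flat list of floats stands in for the parameters of a deep learning model.

```python
import random
from typing import Callable, List, Tuple

Params = List[float]


def make_models(
    num_models: int, num_params: int, seed: int = 0
) -> Tuple[Params, List[Params]]:
    """Create same-shaped models with different initializations.

    Returns (main model parameters, list of auxiliary model parameters).
    """
    rng = random.Random(seed)
    # Hypothetical initialization functions; cycled if num_models exceeds their count.
    init_fns: List[Callable[[], float]] = [
        lambda: rng.uniform(-0.1, 0.1),
        lambda: rng.gauss(0.0, 0.05),
        lambda: rng.uniform(-0.5, 0.5),
    ]
    models = [
        [init_fns[i % len(init_fns)]() for _ in range(num_params)]
        for i in range(num_models)
    ]
    return models[0], models[1:]
```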
10. An electronic device, characterized in that, it includes: A memory for storing program data which, when executed, implements the steps in the gait recognition method as described in any one of claims 1 to 9; and A processor configured to execute the program data stored in the memory to implement the steps in the gait recognition method as described in any one of claims 1 to 9.
11. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of the gait recognition method as described in any one of claims 1 to 9.