Gait feature extraction method, gait recognition method and device
By employing 3D spatial alignment and dense sampling techniques in gait recognition, combined with Gaussian processing and feature fusion, the problem of low robustness in existing gait recognition methods is solved, achieving more accurate gait feature extraction and recognition.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG DAHUA TECH CO LTD
- Filing Date
- 2022-12-20
- Publication Date
- 2026-06-12
AI Technical Summary
Existing gait recognition methods based on target key points are not robust, have large recognition errors, and cannot accurately identify gait features under changing external conditions.
By identifying the target body's 3D image in the target gait image sequence, alignment operations are performed in 3D space to determine the intersection region, and dense sampling is performed to obtain the target key point image sequence. Gait features are then extracted by combining Gaussian processing and feature fusion techniques.
It improves the accuracy and robustness of gait recognition, reduces recognition errors, and can accurately extract and recognize gait features under changing external conditions.
Description
Technical Field
[0001] This application relates to the field of gait recognition technology, and in particular to a gait feature extraction method, a gait recognition method and apparatus.
Background Technology
[0002] In gait recognition research, the mainstream methods are mainly appearance-based and model-based.
[0003] Most appearance-based methods start with the target outline. However, this method is greatly affected by changes in appearance such as clothing, lighting, and occlusion.
[0004] Therefore, more and more researchers are introducing model-based methods to ensure robustness to changes in external conditions. Most existing model-based gait recognition research starts from the target posture and target key points. However, existing gait recognition methods based on target key points are not very robust and have large recognition errors.
Summary of the Invention
[0005] This application provides a gait feature extraction method, a gait recognition method, and an apparatus, which can improve the robustness of gait recognition methods based on target key points and reduce recognition errors.
[0006] To achieve the above objectives, this application provides a gait feature extraction method, which includes:
[0007] Determine the three-dimensional image of the target body in each gait image in the target gait image sequence;
[0008] Based on the target body's 3D image from each gait image, perform alignment operations on each part of the target body in 3D space to determine the intersection area of each part in at least some gait images;
[0009] The intersection areas of each part are sampled to obtain the sampling points of the intersection areas in three-dimensional space;
[0010] Determine the key points corresponding to the sampling points of each part in the three-dimensional space on each gait image to obtain the target key point image sequence;
[0011] Based on the target key point image sequence, the gait features of the target are extracted.
[0012] In one embodiment, sampling is performed on the intersection area of the various parts, including:
[0013] Dense sampling is performed on the intersection areas of each part to obtain dense sampling points on each part.
[0014] In one embodiment, determining the target body 3D image for each gait image in the target gait image sequence includes:
[0015] Extract two-dimensional images of the target body from each gait image;
[0016] The target body's two-dimensional image is mapped into three dimensions to obtain the target body's three-dimensional image for each gait image.
[0017] In one embodiment, extracting a two-dimensional image of the target body from each gait image includes:
[0018] The target body pose estimation method is used to process each gait image to obtain a 3D surface map of each gait image;
[0019] The target body's two-dimensional image is mapped to three dimensions to obtain the target body's three-dimensional image for each gait image, including:
[0020] The target body's two-dimensional image is mapped to three dimensions using a three-dimensional surface map to obtain the target body's three-dimensional image for each gait image.
[0021] In one embodiment, before the gait features of the target are extracted based on the target key point image sequence, the method further includes:
[0022] Contour information is extracted from each gait image to obtain a sequence of target contour images;
[0023] Based on the target keypoint image sequence, gait features of the target are extracted, including:
[0024] Gait features are extracted using target contour image sequences and target key point image sequences.
[0025] In one embodiment, gait feature extraction is performed using a target contour image sequence and a target key point image sequence, including:
[0026] Gaussian processing is performed on the keypoint images to obtain keypoint heatmaps for each gait image, thus obtaining a target keypoint heatmap sequence;
[0027] Feature extraction is performed on the target key point heatmap sequence and the target contour image sequence to obtain heatmap features and contour features respectively;
[0028] Heatmap features and contour features are fused to obtain gait features.
[0029] In one embodiment, heatmap features and contour features are fused to obtain gait features, including:
[0030] A shallow fused feature is obtained by fusing heatmap features and contour features using a self-attention mechanism.
[0031] The feature extraction and fusion module is used to fuse heat map features and contour features, so as to fuse heat map features and contour features during the feature extraction process to obtain deep fused features;
[0032] By fusing deep fusion features and shallow fusion features, gait features are obtained.
[0033] In one embodiment, a feature extraction and fusion module is used to fuse heatmap features and contour features, so as to fuse heatmap features and contour features during the feature extraction process to obtain deep fused features, including:
[0034] The feature extraction and fusion module is used to fuse heat map features and contour features to obtain fused spatiotemporal features;
[0035] By performing horizontal pyramid mapping on the fused spatiotemporal features, deep fused features are obtained.
[0036] In one embodiment, deep fusion features and shallow fusion features are fused to obtain gait features, which includes: extracting features from the target key point image sequence to obtain temporal features;
[0037] The gait features are obtained by fusing deep fusion features and shallow fusion features, including: fusing temporal features, deep fusion features and shallow fusion features to obtain gait features.
[0038] In one embodiment, Gaussian processing is performed on the keypoint images to obtain keypoint heatmaps for each gait image, thereby obtaining a target keypoint heatmap sequence, including:
[0039] Gaussian processing is performed on the key point images of each gait image to obtain the initial heatmap of each gait image;
[0040] Based on the weights of each local region of the body, the pixels of each local region of the body in the initial heatmap of each gait image are weighted to obtain the key point heatmap of each gait image.
[0041] To achieve the above objectives, this application provides a gait recognition method, which includes:
[0042] Using the gait feature extraction method described above, feature extraction is performed on the target gait image sequence to obtain the target's gait features;
[0043] Gait recognition is performed based on the gait features.
[0044] To achieve the above objectives, this application also provides an electronic device including a processor; the processor is configured to execute instructions to implement the steps of the above method.
[0045] To achieve the above objectives, this application also provides a computer-readable storage medium for storing instruction / program data that can be executed to implement the above methods.
[0046] This application determines the target body's three-dimensional image for each gait image in the target gait image sequence and, based on these images, performs alignment operations on each part of the target body in three-dimensional space. This aligns the target body parts across at least some of the gait images so that the intersection region of each part can be determined. The intersection regions can then be sampled to obtain a target key point image sequence for target gait feature extraction. By combining alignment in three-dimensional space, sampling of the aligned three-dimensional images, and mapping of the sampled points in three-dimensional space back to each gait image, this application ensures that the key points used for target gait feature extraction have a one-to-one correspondence across the images in the sequence. Therefore, the dynamic information of each key point in the sequence can be determined from changes in its position, which allows the dynamic information of the target body to be determined accurately and relatively accurate target gait features to be extracted. Gait recognition using the gait features extracted by this feature extraction method is more robust and has lower recognition errors.
Attached Figure Description
[0047] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:
[0048] Figure 1 is a flowchart illustrating one embodiment of the gait feature extraction method of this application;
[0049] Figure 2 is a data processing diagram in one embodiment of the gait feature extraction method of this application;
[0050] Figure 3 is a schematic diagram of the feature extraction and fusion module in one embodiment of the gait feature extraction method of this application;
[0051] Figure 4 is a flowchart illustrating one embodiment of the gait recognition method of this application;
[0052] Figure 5 is a schematic diagram of the structure of one embodiment of the electronic device of this application;
[0053] Figure 6 is a schematic diagram of one embodiment of the computer-readable storage medium of this application.
Detailed Implementation
[0054] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of the embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application. In addition, unless otherwise specified (e.g., "or additionally" or "or in alternatives"), the term "or" as used herein refers to a non-exclusive "or" (i.e., "and / or"). Furthermore, the various embodiments described herein are not necessarily mutually exclusive, as some embodiments can be combined with one or more other embodiments to form new embodiments.
[0055] One method for extracting gait features based on target keypoints involves extracting keypoint images from each gait image in a sequence to obtain a keypoint image sequence; and then directly extracting target gait features based on this keypoint image sequence. However, because the target is moving, the distance between the target and the camera changes, thus changing the size of the target in the image captured by the camera. Furthermore, the number of keypoints detected on the target in the image varies with the proportion of pixels occupied by the target. When the target is close to the camera, it occupies a larger area in the image captured by the camera, resulting in a higher number of keypoints identified. Conversely, when the target is far from the camera, it occupies a smaller area, resulting in a lower number of keypoints identified. Consequently, the keypoints detected in multiple images within a sequence cannot have a one-to-one correspondence, making it impossible to determine the dynamic information of each keypoint based on the keypoint image sequence. This makes it difficult to accurately determine the dynamic information of the target's body, leading to low accuracy and poor robustness in the extracted gait features. For example, the key points of a sequence obtained by the pose estimation method (densepose) itself cannot be matched one-to-one.
[0056] Based on this, this application proposes a gait feature extraction method. This method aligns pixels in the three-dimensional target body coordinates and then maps them back to the two-dimensional target body sequence coordinates. This yields a fixed number of one-to-one corresponding key points on the target within the video sequence, enabling the generation of relatively accurate gait features based on the key point image sequence. This improves the accuracy of gait recognition using gait features and reduces gait recognition errors. It is understood that the target mentioned in this application is not limited, and can include, for example, a human, cat, or dog.
[0057] As shown in Figure 1, this application provides a gait feature extraction method according to a first embodiment, which includes the following steps. It should be noted that the step numbers are for simplification only and are not intended to limit the execution order of the steps. The execution order of the steps in this embodiment can be changed arbitrarily without departing from the technical concept of this application.
[0058] S101: Determine the target body's three-dimensional image for each gait image in the target gait image sequence.
[0059] In the gait feature extraction method of this application, the target body three-dimensional image of each gait image in the target gait image sequence can be determined so that the target body parts can be aligned in three-dimensional space based on the target body three-dimensional image of each gait image in the target gait image sequence. This is to determine the intersection area of each part in at least some gait images, which facilitates the sampling of the intersection area of each part to obtain the target key point image sequence for target gait feature extraction. Thus, the combination of technical features of aligning images in three-dimensional space, sampling the aligned three-dimensional image, and mapping the sampling points in three-dimensional space to each gait image in this application determines that the key points for target gait feature extraction have a one-to-one correspondence in multiple images in the sequence. For example, the point in the middle of the arm always corresponds to the middle of the arm in the two-dimensional key point image sequence and will not change due to changes in movement. In this way, these points can represent the target body movement information to a certain extent. Thus, the dynamic information of each key point in the sequence can be determined based on the changes in the position information of each key point in the target key point image sequence, thereby accurately determining the dynamic information of the target body and extracting relatively accurate target gait features.
[0060] In one feasible approach, methods such as Densepose or D2-Net can be used to determine the target body 3D image for each gait image.
[0061] In another feasible approach, the target body 2D image of each gait image can be determined using methods such as densepose or ASM; then, the target body 2D image of each gait image is 3D mapped to obtain the target body 3D image of each gait image.
[0062] For example, the IUV map (i.e., the 3D surface map of the target body) of each gait image is obtained through the pose estimation method (densepose). The IUV map of the target body contains the 2D image information of the target body in the gait images. Based on the 2D images of the target body in each gait image, the coordinates of at least some 2D pixels are mapped to the 3D target body using the IUV map of each gait image, thus obtaining the 3D coordinates of each point of the target body in each gait image. Here, I represents a part of the target body, and the number of parts of the target body is unlimited. For example, the target body can be divided into 24 parts or 12 parts. UV represents texture map coordinates. In this way, the 2D pixels of the target body can be mapped to the 3D coordinate points of the target body through the UV coordinates of each pixel in the IUV map. Thus, the 2D to 3D coordinate mapping method given in Densepose can be used to map the 2D coordinates of the target body to 3D space using UV texture mapping.
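The per-pixel lookup described in [0062] can be sketched as follows. This is a minimal illustration only, assuming the IUV map is stored as an (H, W, 3) array whose first channel is the part index I (0 for background) and whose other two channels are U and V scaled to [0, 1]. The function name `iuv_to_surface_points` and this array layout are assumptions made for the example, not something specified in this application. The final conversion of a (part, U, V) triple to a 3D coordinate depends on the body surface model and is not shown.

```python
import numpy as np

def iuv_to_surface_points(iuv):
    """Collect pixel position, part index and UV coordinates of every body pixel.

    iuv: array of shape (H, W, 3). Channel 0 is the part index I
    (0 = background, an assumption of this sketch); channels 1 and 2
    are the U and V texture coordinates.
    """
    part = iuv[..., 0].astype(int)
    rows, cols = np.nonzero(part > 0)          # foreground (body) pixels only
    return {
        "pixel": np.stack([rows, cols], axis=1),   # 2D pixel coordinates
        "part": part[rows, cols],                  # body part of each pixel
        "uv": iuv[rows, cols, 1:3].astype(float),  # surface coordinates within the part
    }

# Toy 4x4 IUV map with two body pixels belonging to part 3.
iuv = np.zeros((4, 4, 3))
iuv[1, 2] = (3, 0.25, 0.75)
iuv[2, 2] = (3, 0.50, 0.50)
points = iuv_to_surface_points(iuv)
```

Each record keeps the original pixel index next to its surface coordinate, which is what later allows a sampled surface point to be traced back to a 2D pixel.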
[0063] S102: Based on the target body's three-dimensional image from each gait image, perform alignment operations on the target's various parts in three-dimensional space to determine the intersection areas of each part in at least some gait images.
[0064] After the target body 3D image of each gait image in the target gait image sequence has been determined, each part of the target body can be aligned in 3D space based on the 3D images of that part in at least some of the gait images in the sequence. This determines the intersection area of each part across at least some of the gait images, so that key point sampling can later be performed within the intersection area. This avoids a problem that arises when key points are sampled directly on each 3D image: because the visible area and / or region of a body part differs between frames as the viewpoint changes, key points sampled that way cannot correspond one-to-one across the sequence.
[0065] The aforementioned alignment operations of different parts of the target body in three-dimensional space based on various gait images include: aligning the target hand, legs, abdomen, neck, head, and / or feet in three-dimensional space based on at least some of the gait images. Specifically, aligning the target hand in three-dimensional space means aligning the target hand's three-dimensional image from at least some of the gait images. The alignment operations for other parts of the target body are similar and will not be elaborated upon here.
[0066] For example, assuming the target gait image sequence includes 30 gait images, and the part to be aligned is the hand, the 3D images of the hand in the 30 gait images in the sequence can be aligned, and the intersection area of the hand in all gait images can be determined so that the key points of the hand can be sampled directly using the intersection area of the hand as the boundary.
[0067] Alternatively, different parts of the target body can be aligned using the intersection of all two-dimensional target body coordinates mapped to three-dimensional target body coordinates in a sequence as the boundary.
[0068] Furthermore, if a region is occluded in a small number of frames in the sequence, during the alignment operation of that region in three-dimensional space, the occluded images can be removed by setting a threshold, so that the intersection region of that region in all gait image frames where the region is not occluded can be determined based on step S102. The term "small number" in "small number of frames" means that the proportion of occluded images in all images in the sequence is less than a preset proportion, for example, less than 20% or 10%.
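As a rough illustration of [0066] to [0068], the sketch below computes the intersection region of one part across a sequence and discards occluded frames when they make up only a small share of the sequence. It assumes the alignment has already been done, so that each frame's coverage of the part is a boolean mask on a common grid. The function name, the `min_coverage` rule for deciding that a frame is occluded, and the choice to keep all frames when too many are occluded are hypothetical details introduced for the example.

```python
import numpy as np

def part_intersection(masks, min_coverage=0.5, max_occluded_fraction=0.2):
    """Intersection of one part's visible region over a sequence.

    masks: bool array (T, G, G); masks[t] marks where the part is visible
    in frame t on a grid shared by all frames (already aligned).
    Returns (intersection mask, indices of the frames that were kept).
    """
    masks = np.asarray(masks, dtype=bool)
    coverage = masks.reshape(len(masks), -1).mean(axis=1)
    # Assumed rule: a frame is occluded if it shows far less of the part
    # than the median frame does.
    occluded = coverage < min_coverage * np.median(coverage)
    if occluded.mean() >= max_occluded_fraction:
        # Not a "small number" of frames, so nothing is discarded here.
        occluded[:] = False
    kept = np.nonzero(~occluded)[0]
    return masks[kept].all(axis=0), kept

masks = np.zeros((10, 8, 8), dtype=bool)
masks[:, 2:6, 2:6] = True   # part visible in the same area in most frames
masks[0, 1:6, 2:6] = True   # frame 0 shows slightly more of the part
masks[4] = False
masks[4, 2, 2] = True       # frame 4 is almost fully occluded
region, kept = part_intersection(masks)
```

In this toy sequence, frame 4 is dropped because only 10% of the frames are occluded, and the intersection is the 4x4 area visible in the remaining nine frames.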
[0069] Optionally, the alignment method is not limited, and can be, for example, an alignment method based on the homography matrix or a SIFT feature point matching alignment method.
[0070] In addition, in step S102, or before step S102, the regions of each part of the target body in the three-dimensional image of each gait image can be determined so as to perform alignment operations of each part of the target in three-dimensional space.
[0071] In one feasible approach, image segmentation can be performed on the target body 3D image of each gait image to determine the different body parts in the target body 3D image of each gait image.
[0072] In another possible implementation, in step S101, the body parts in the target body 3D image of each gait image have been determined.
[0073] For example, the IUV map of the target body in each gait image is obtained by the pose estimation method (densepose). The IUV map of the target body contains two-dimensional information of each part of the target body in the gait image. Based on the two-dimensional image of the target body in each gait image, at least some two-dimensional pixels are mapped to the three-dimensional target body using the IUV map of each gait image to obtain the three-dimensional coordinates of each point of the target body in each gait image. In this process, the position of each part of the body in the three-dimensional image of the target body in each gait image is also determined.
[0074] S103: Sample the intersection area of each part to obtain the sampling points of the intersection area of each part in three-dimensional space.
[0075] After determining the intersection region of each part of the target in at least some gait images based on step S102, the intersection region of each part can be sampled to obtain the sampling points of the intersection region of each part in three-dimensional space.
[0076] In one feasible approach, by aligning the various parts in three-dimensional space, the correspondence between pixel coordinates in the target body three-dimensional image of each gait image can be determined. After sampling the intersection region of each part, the coordinates of each sampling point in the intersection region in the target body three-dimensional image of each gait image in the sequence can be determined, thereby determining the sampling points of each part in the target body three-dimensional image of each gait image, that is, obtaining the sampling points of the intersection region of each part in three-dimensional space. This ensures that the total number of sampling points in the intersection region of each part of all target body three-dimensional images is the same, and that the sampling points in the intersection region of each part of all target body three-dimensional images can correspond one-to-one.
[0077] For each part, the aforementioned three-dimensional images of the target body can further refer to all three-dimensional images of the target body that are not occluded for each part. That is, for a part that is occluded in some frames, all three-dimensional images of the target body corresponding to that part exclude the three-dimensional images of the target body that are occluded for that part.
[0078] In another feasible approach, after determining the intersection region of each part in all gait images through alignment operations in three-dimensional space, the same sampling method can be used to sample the intersection region of each part in the target body three-dimensional image of all gait images, so that the total number of sampling points in the intersection region of each part of all target body three-dimensional images is the same, and the sampling points in the intersection region of each part of all target body three-dimensional images can correspond one-to-one.
[0079] In another feasible approach, by aligning the various parts in three-dimensional space, the image of the intersection region of each part in the target body three-dimensional image of all gait images can be mapped into the same image. The intersection region of each part in the mapped image can be sampled to obtain the sampling points of the intersection region of each part in the mapped image. Then, based on the correspondence between the pixels in the mapped image and the pixels of the target body three-dimensional image of each gait image, the coordinates of each sampling point in the mapped image in the target body three-dimensional image of each gait image in the sequence are determined, thereby determining the sampling points of each part in the target body three-dimensional image of each gait image, that is, obtaining the sampling points of the intersection region of each part in three-dimensional space.
[0080] Optionally, dense sampling can be performed on the intersection areas of various parts to obtain dense sampling points on various parts of the target body. By collecting more dense key points of the target body through dense sampling, the temporal and dynamic information carried by the key points can be increased while increasing the number of key points. This allows for better extraction of dynamic features using the dense key points of the target body and also enhances the robustness of the key points. Specifically, more key points provide a certain tolerance for the accuracy of these key points. Even if some key points cannot be perfectly accurate, it will not significantly affect subsequent use. That is, the error of individual key points will not affect the overall temporal feature extraction. Furthermore, since these key points also have a one-to-one correspondence in the sequence, the dynamic features of gait can be extracted by focusing on the dynamic changes of each point. This helps to extract target gait features from key point image sequences, thus solving the problem of too few human key points and limited information.
[0081] Alternatively, uniform sampling can be used to sample the intersection areas of different body parts, resulting in evenly distributed sampling points across the target body. This ensures the sampling points reflect the distribution of the target body, facilitating the extraction of spatial features from the gait images. Of course, in other embodiments, non-uniform sampling can also be used to sample the intersection areas of different body parts.
[0082] Furthermore, a uniform sampling method can be used to perform dense sampling on the intersection areas of each part, so as to obtain densely distributed sampling points on each part of the target body.
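The uniform dense sampling of [0080] to [0082] can be sketched as a regular lattice restricted to the intersection region. The grid representation of the region and the `step` parameter are assumptions of this example. Because the same region and lattice are used for every frame, the number and order of sampling points are identical across the sequence.

```python
import numpy as np

def uniform_dense_samples(region, step=1):
    """Return (N, 2) grid coordinates taken every `step` cells inside `region`."""
    region = np.asarray(region, dtype=bool)
    lattice = np.zeros_like(region)
    lattice[::step, ::step] = True             # regular, evenly spaced lattice
    rows, cols = np.nonzero(region & lattice)  # keep lattice points inside the region
    return np.stack([rows, cols], axis=1)

region = np.zeros((8, 8), dtype=bool)
region[2:6, 2:6] = True
dense = uniform_dense_samples(region, step=1)   # every cell of the region
sparse = uniform_dense_samples(region, step=2)  # every second cell
```

A smaller `step` gives denser key points, which is the trade-off [0080] describes: more points carry more temporal information and make the result less sensitive to errors in individual points.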
[0083] S104: Determine the key points corresponding to the sampling points of each part in the three-dimensional space on each gait image to obtain the target key point image sequence.
[0084] After sampling the intersection areas of each part, the key points corresponding to the sampling points of each part in the three-dimensional space on each gait image can be determined to obtain the target key point image sequence. That is, the points sampled in the three-dimensional space can be mapped back to each frame of the two-dimensional sequence to obtain the target key point image sequence, so that the gait features of the target can be extracted based on the target key point image sequence.
[0085] Optionally, after determining the sampling points of each part of the target body in the three-dimensional image of each gait image based on step S103, the three-dimensional image of the target body in each gait image can be mapped to the two-dimensional target body surface to determine the key points corresponding to the sampling points of each part in the three-dimensional space of each gait image. Specifically, the key points corresponding to the sampling points of each part in the three-dimensional space of each gait image can be determined based on the correspondence between the pixel coordinates of each gait image and the target body three-dimensional image of each gait image, thus obtaining the key point image of each gait image. This results in a target key point image sequence. Furthermore, because the alignment operation ensures that the sampling points in the intersection area of each part of all target body three-dimensional images correspond one-to-one, the key points corresponding to the sampling points of each part in the three-dimensional space of each gait image determined based on the three-dimensional to two-dimensional mapping also correspond one-to-one. Therefore, the key points determined by the method of this application have a one-to-one correspondence in the target key point image sequence. Thus, the dynamic information of each key point can be determined based on the key point image sequence of this method, thereby accurately determining the dynamic information of the target body. Therefore, the gait features extracted by this method have high accuracy and robustness.
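To make the benefit of one-to-one correspondence concrete, the sketch below derives simple dynamic information from a key point sequence by differencing positions between consecutive frames. It assumes the sequence is stored as a (T, K, 2) array in which index k refers to the same body point in every frame. The function name and the choice of displacement and speed as the dynamic quantities are illustrative assumptions.

```python
import numpy as np

def keypoint_motion(sequence):
    """Per-frame motion of corresponding key points.

    sequence: (T, K, 2) 2D coordinates; key point k is the same body
    point in every frame.
    Returns displacement (T-1, K, 2) and speed (T-1, K).
    """
    sequence = np.asarray(sequence, dtype=float)
    displacement = np.diff(sequence, axis=0)       # position change between frames
    speed = np.linalg.norm(displacement, axis=-1)  # magnitude of that change
    return displacement, speed

seq = np.array([
    [[0, 0], [10, 10]],
    [[3, 4], [10, 10]],
    [[6, 8], [10, 12]],
])
disp, speed = keypoint_motion(seq)
```

This differencing is only meaningful because the k-th entry tracks one physical point. With a varying number of uncorrespondent key points per frame, as described in [0055], the subtraction would compare unrelated points.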
[0086] Preferably, when the target body 2D images of each gait image are mapped into 3D in step S101 to obtain the target body 3D images, the mapping back to 2D depends on whether a sampling point coincides with a mapped point. If a sampling point in the target body 3D image is one of the points that were mapped from 2D to 3D, the pixel index can be used directly to return to the 2D target body surface; that is, the coordinates of the key point corresponding to the sampling point on the gait image are given by the pixel index. If a sampling point in the target body 3D image is not one of the points mapped from 2D to 3D, the key point coordinates corresponding to that sampling point on the gait image can be calculated by interpolation from at least two adjacent 2D-to-3D mapped points. For example, the few mapped points closest to the 3D sampling point can be found (these points were mapped directly from 2D target body points to the 3D target body surface and preferably belong to the same part as the sampling point), and the 2D key point coordinates corresponding to the 3D sampling point can be obtained by bilinear interpolation of the 2D coordinates of these points.
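The two cases in [0086] can be sketched as below. Note that this example substitutes inverse-distance weighting over the k nearest mapped points for the bilinear interpolation named in the text, because it needs no assumption about how the mapped points are arranged. The function name, the parameter `k` and the tolerance `eps` are hypothetical.

```python
import numpy as np

def sample_to_pixel(sample_xyz, mapped_xyz, mapped_px, k=3, eps=1e-9):
    """Estimate the 2D pixel coordinates of a 3D sampling point.

    mapped_xyz: (N, 3) 3D points obtained from 2D pixels of the same part.
    mapped_px:  (N, 2) the 2D pixel each of those points came from.
    """
    mapped_xyz = np.asarray(mapped_xyz, dtype=float)
    mapped_px = np.asarray(mapped_px, dtype=float)
    dist = np.linalg.norm(mapped_xyz - np.asarray(sample_xyz, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]
    if dist[nearest[0]] < eps:
        # The sample coincides with a mapped point: use its pixel index directly.
        return mapped_px[nearest[0]]
    # Otherwise interpolate between neighbours (inverse-distance weights,
    # used here as a stand-in for bilinear interpolation).
    weights = 1.0 / dist[nearest]
    return (weights[:, None] * mapped_px[nearest]).sum(axis=0) / weights.sum()

mapped_xyz = [[0, 0, 0], [2, 0, 0], [0, 2, 0], [10, 10, 10]]
mapped_px = [[10, 10], [10, 30], [30, 10], [99, 99]]
exact = sample_to_pixel([0, 0, 0], mapped_xyz, mapped_px)        # direct lookup
between = sample_to_pixel([1, 0, 0], mapped_xyz, mapped_px, k=2)  # interpolated
```

Restricting `mapped_xyz` to points of the same part as the sampling point, as the text prefers, keeps the interpolation from blending pixels of adjacent body parts.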
[0087] S105: Extract the gait features of the target based on the target key point image sequence.
[0088] After extracting target key points from the target gait image sequence based on the above steps to obtain target key point images, the gait features of the target can be extracted based on the target key point image sequence to obtain relatively accurate target gait features.
[0089] In the first implementation, features can be extracted directly from the target keypoint image sequence to obtain the target's gait features. This can be achieved using feature extraction modules such as GCN, LSTM, or GRU.
[0090] In the second implementation, Gaussian processing can be applied to each keypoint image in the target keypoint image sequence to obtain keypoint heatmaps for each gait image, thus generating a target keypoint heatmap sequence. Feature extraction can then be performed on the target keypoint heatmap sequence to obtain the target's gait features. In this implementation, processing keypoint images into keypoint heatmaps ensures that the extracted target gait features possess both temporal and spatial information, resulting in highly discriminative spatiotemporal features. Specifically, 2D convolutional modules such as VGG or ResNet can be used to extract features from the keypoint heatmap sequence.
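The Gaussian processing in [0090] can be illustrated by rendering each key point as a 2D Gaussian blob. The sketch also accepts an optional weight per key point as a simple way to approximate the weighting of body regions described in [0040]. The function name, the value of `sigma`, the per-key-point form of the weights and the use of a pixel-wise maximum to combine blobs are all assumptions of this example.

```python
import numpy as np

def keypoint_heatmap(keypoints, shape, sigma=2.0, weights=None):
    """Render key points as a heatmap of Gaussian blobs.

    keypoints: (K, 2) coordinates as (row, col).
    weights:   optional (K,) weight of the body region each key point
               belongs to (assumed form of the region weighting).
    """
    keypoints = np.asarray(keypoints, dtype=float)
    if weights is None:
        weights = np.ones(len(keypoints))
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    heatmap = np.zeros(shape, dtype=float)
    for (r, c), w in zip(keypoints, weights):
        blob = w * np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        heatmap = np.maximum(heatmap, blob)   # overlapping blobs do not add up
    return heatmap

heat = keypoint_heatmap([(8, 8), (20, 24)], shape=(32, 32),
                        sigma=2.0, weights=[1.0, 0.5])
```

Applying this to every key point image in the sequence yields the key point heatmap sequence, which a 2D convolutional network can then process like an ordinary image sequence.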
[0091] In the third implementation, before step S105, contour information can be extracted from each gait image to obtain the target body contour map for each gait image, thus obtaining the target contour image sequence. In step S105, the target contour image sequence and the target keypoint image sequence can be used to extract the spatiotemporal features of the target's gait. Specifically, any target detection and segmentation algorithm can be used to extract contour information from each gait image to obtain the target contour image sequence. Furthermore, 2D convolutional modules such as VGG or ResNet can be used to extract features from the target contour image sequence.
[0092] In a specific example, feature extraction can be performed on the target keypoint image sequence to obtain the target's temporal features; feature extraction can be performed on the target contour image sequence to obtain the target's contour features; and the target's temporal features and contour features can be fused to obtain the target's gait features. The temporal features are mostly temporal characteristics of the target's body, while the contour features are mostly spatial characteristics of the target's body. Therefore, the gait features obtained by fusing the target's temporal and contour features can possess highly discriminative spatiotemporal gait characteristics, thereby improving the accuracy and robustness of gait recognition using target gait features.
[0093] In another specific example, as shown in Figure 2, Gaussian processing can be applied to each keypoint image in the target keypoint image sequence to obtain keypoint heatmaps for each gait image, thus obtaining the target keypoint heatmap sequence. Feature extraction can be performed on the target keypoint heatmap sequence to obtain the target's heatmap features. Feature extraction can also be performed on the target contour image sequence to obtain the target's contour features. Finally, the target's heatmap features and contour features are fused to obtain the target's gait features. Here, heatmap features incorporate both temporal and spatial information, while contour features are mostly spatial features of the target body. Therefore, the gait features obtained by fusing the target's heatmap features and contour features possess rich spatiotemporal gait characteristics, thereby improving the accuracy and robustness of gait recognition using target gait features.
[0094] Optionally, fusing the target's heatmap features and contour features can include: fusing the heatmap features and contour features through a self-attention mechanism to obtain shallow fused features; fusing the heatmap features and contour features through a feature extraction and fusion module, so that they are fused during feature extraction, to obtain deep fused features; and fusing the deep fused features and shallow fused features to obtain the target's gait features. Fusing the heatmap features and contour features in these two ways yields both shallow and deep fused features, that is, fusion results of the keypoint heatmaps and contour maps at different depths. This captures as many features as possible and makes full use of the features from both methods, which improves the accuracy of the target's gait features and ensures rich, recognizable spatiotemporal gait features.
[0095] Optionally, the step of fusing heatmap features and contour features using a self-attention mechanism may include: performing pooling processing on the stitched feature map formed by concatenating heatmap features and contour features to obtain an intermediate feature map, thus obtaining shallow salient features; and processing the intermediate feature map using a self-attention mechanism to fuse the heatmap features and contour features to obtain the shallow fused features. Specifically, a shallow feature fusion sub-branch can be used to fuse heatmap features and contour features using a self-attention mechanism.
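The shallow fusion sub-branch described above (concatenate, pool, then apply self-attention) can be sketched as follows. This is a simplified illustration under stated assumptions: features are represented as one vector per frame, the pooling is a max pooling over non-overlapping pairs of frames, and the self-attention uses identity query/key/value projections, whereas a trained network would use learned projections. All function names are hypothetical.

```python
import math

def concat_channels(a, b):
    """Stitch two per-frame feature sequences along the channel axis."""
    return [row_a + row_b for row_a, row_b in zip(a, b)]

def max_pool_tokens(tokens, window=2):
    """Max-pool non-overlapping groups of frames (assumed pooling choice)."""
    pooled = []
    for i in range(0, len(tokens) - window + 1, window):
        group = tokens[i:i + window]
        pooled.append([max(col) for col in zip(*group)])
    return pooled

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens))
                    for c in range(dim)])
    return out

def shallow_fuse(heatmap_feat, contour_feat, window=2):
    stitched = concat_channels(heatmap_feat, contour_feat)  # stitched feature map
    intermediate = max_pool_tokens(stitched, window)        # intermediate feature map
    return self_attention(intermediate)                     # shallow fused features
```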
[0096] In addition, the step of fusing heatmap features and contour features through the feature extraction and fusion module may include: processing the spliced features with the feature extraction and fusion module to extract deep features, where the features extracted from the keypoint heatmaps and contour maps are fused during extraction, and this feature-level fusion yields better, fused spatiotemporal features; and expressing the fused spatiotemporal features through a horizontal pyramid structure to obtain the deep fused features, which gives a gait feature expression more conducive to classification.
[0097] The structure of the feature extraction and fusion module can be as shown in Figure 3. The feature extraction and fusion module shown in Figure 3 processes the spliced features to obtain the first output feature and the second output feature. When expressing the fused spatiotemporal features through a horizontal pyramid structure, the first output feature and the second output feature can be mapped to a horizontal pyramid to obtain the horizontal pyramid structure expression of the first output feature and the horizontal pyramid structure expression of the second output feature, respectively. Then, the horizontal pyramid structure expression of the first output feature and that of the second output feature are spliced or weighted and fused to obtain the deep fused feature.
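A horizontal pyramid expression of the two output features, followed by splicing or weighted fusion, might look like the sketch below. The scales `(1, 2, 4)`, the use of max plus average pooling per strip, and the function names are assumptions borrowed from common horizontal pyramid designs; the source does not specify them. The sketch also assumes the feature map height is divisible by the largest scale.

```python
def horizontal_pyramid(feature_map, scales=(1, 2, 4)):
    """Map a [C][H][W] feature map to a list of per-strip channel vectors."""
    height = len(feature_map[0])
    strips = []
    for n in scales:
        strip_h = height // n  # height of each horizontal strip at this scale
        for s in range(n):
            rows = range(s * strip_h, (s + 1) * strip_h)
            vec = []
            for channel in feature_map:
                values = [v for r in rows for v in channel[r]]
                # Max + average pooling over the strip (assumed choice).
                vec.append(max(values) + sum(values) / len(values))
            strips.append(vec)
    return strips

def deep_fuse(first_output, second_output, weight=None):
    """Splice (weight=None) or weighted-fuse the two pyramid expressions."""
    p1 = horizontal_pyramid(first_output)
    p2 = horizontal_pyramid(second_output)
    if weight is None:
        return [a + b for a, b in zip(p1, p2)]  # concatenate per strip
    return [[weight * x + (1.0 - weight) * y for x, y in zip(a, b)]
            for a, b in zip(p1, p2)]
```

Each strip vector describes one horizontal band of the body, so the fused result keeps coarse whole-body information alongside finer, part-level information.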
[0098] In other embodiments, fusing the target's heatmap features and the target's contour features can be achieved by directly performing a weighted fusion of the target's heatmap features and the target's contour features. Alternatively, fusing the target's heatmap features and the target's contour features can be achieved by fusing the target's heatmap features and the target's contour features using at least three fusion methods, with at least two fusion methods having different feature fusion depths. This results in multi-level fusion of the target body key point heatmap sequence and the target body contour sequence during the feature extraction process.
[0099] Additionally, adaptive weights can be used in the keypoint heatmap to distinguish different local regions of the body. This allows for the extraction of spatiotemporal features from certain local regions when fusing keypoint heatmap and contour map features, thus facilitating gait recognition. In one embodiment, different weights are assigned to different local regions in the keypoint heatmap based on their importance, resulting in pixel weighting. This allows for the extraction of spatiotemporal features from highly important local regions when fusing keypoint heatmap and contour map features. In another embodiment, the gait feature extraction method of this application can be executed using a gait feature extraction network. Training the gait feature extraction network determines the weights of each local region of the body. Before extracting features from the keypoint heatmap sequence, the pixels corresponding to each local region of the body in the keypoint heatmap sequence can be weighted based on the weights of the local regions determined during training. This allows for the extraction of spatiotemporal features from local regions that are beneficial for gait recognition when fusing keypoint heatmap and contour map features.
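The pixel weighting by local body region can be illustrated with the short sketch below. It assumes a per-pixel region label map (`region_mask`) and a dictionary of region weights, whether set by importance or determined during training; both the layout and the names are hypothetical.

```python
def weight_heatmap(heatmap, region_mask, region_weights, default=1.0):
    """Scale each heatmap pixel by the weight of the body region it lies in.

    heatmap: [H][W] floats; region_mask: [H][W] region labels;
    region_weights: dict mapping region label -> weight.
    Pixels whose region has no entry keep their value (default weight 1.0).
    """
    return [
        [value * region_weights.get(region, default)
         for value, region in zip(h_row, m_row)]
        for h_row, m_row in zip(heatmap, region_mask)
    ]
```

For example, giving the legs a weight above 1 and the torso a weight below 1 would emphasize leg motion in the subsequent feature extraction.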
[0100] In another specific example, feature extraction can be performed on the target keypoint image sequence to obtain the target's temporal features; Gaussian processing can be applied to each keypoint image in the target keypoint image sequence to obtain keypoint heatmaps for each gait image, thus obtaining the target keypoint heatmap sequence; feature extraction can be performed on the target keypoint heatmap sequence to obtain the target's heatmap features; feature extraction can be performed on the target contour image sequence to obtain the target's contour features; the target's heatmap features and target contour features are fused to obtain the target's fused features (which may include shallow fused features and / or deep fused features); the target's fused features are fused with the temporal features to obtain the target's gait features. Thus, the target body keypoints, target body keypoint heatmaps, and target body contours are used as input, and features of multimodal and multi-level gait sequences are extracted. Among them, temporal information is extracted from the target body key points through a GCN, the target body keypoint heatmaps integrate temporal and spatial information, and the target body contour maps contain more spatial information and less temporal information. In this way, this multimodal and multi-scale feature extraction method enriches the spatiotemporal features of the gait sequence as much as possible, providing a certain degree of robustness while ensuring accuracy.
[0101] In this embodiment, a three-dimensional image of the target body is determined for each gait image in the target gait image sequence. Based on the three-dimensional image of the target body in each gait image in the target gait image sequence, an alignment operation is performed on each part of the target in three-dimensional space to determine the intersection area of each part in at least some gait images. This facilitates subsequent sampling of the intersection area of each part, thereby obtaining a target key point image sequence for target gait feature extraction. Thus, by combining the technical features of aligning images in three-dimensional space, sampling the aligned three-dimensional images, and mapping the sampling points in three-dimensional space to each gait image, the key points determined for target gait feature extraction have a one-to-one correspondence in multiple images of the sequence. In this way, the dynamic information of each key point in the sequence can be determined based on the changes in the position information of each key point in the target key point image sequence, thereby accurately determining the dynamic information of the target body and extracting relatively accurate target gait features.
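Because each key point has a one-to-one correspondence across the frames of the sequence, its dynamic information can be read off from frame-to-frame changes in position. A minimal sketch, assuming each frame is stored as an ordered list of `(x, y)` coordinates with a fixed key point order (the function name is hypothetical):

```python
def keypoint_displacements(keypoint_sequence):
    """Per-key-point displacement between consecutive frames.

    keypoint_sequence[t][i] is the (x, y) position of key point i in frame t;
    index i must refer to the same body-surface point in every frame.
    """
    motion = []
    for prev, curr in zip(keypoint_sequence, keypoint_sequence[1:]):
        motion.append([(cx - px, cy - py)
                       for (px, py), (cx, cy) in zip(prev, curr)])
    return motion
```

Without the one-to-one correspondence, index `i` could refer to different body points in different frames and these differences would not describe real motion.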
[0102] Please refer to Figure 4, which is a flowchart of one embodiment of the gait recognition method of this application. It should be noted that, provided substantially the same result is obtained, this embodiment is not limited to the process sequence shown in Figure 4. In this embodiment, the gait recognition method includes the following steps:
[0103] S201: Using the above gait feature extraction method, feature extraction is performed on the target gait image sequence to obtain the target's gait features.
[0104] S202: Perform gait recognition based on the gait features.
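The source does not specify how recognition is performed on the extracted gait features. One common approach, shown here purely as an assumed illustration, is to compare the probe feature with enrolled gallery features by cosine similarity and accept the best match above a threshold. The function names, the gallery layout, and the threshold value are hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # treat an all-zero feature as dissimilar to everything
    return dot / (na * nb)

def recognize(probe, gallery, threshold=0.5):
    """Return the best-matching identity, or None if no match is close enough.

    gallery: dict mapping identity -> enrolled gait feature vector.
    """
    best_id, best_score = None, -1.0
    for identity, feature in gallery.items():
        score = cosine_similarity(probe, feature)
        if score > best_score:
            best_id, best_score = identity, score
    return best_id if best_score >= threshold else None
```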
[0105] Please refer to Figure 5, which is a schematic diagram of one embodiment of the electronic device of this application. The electronic device 10 includes a processor 12, which executes instructions to implement the above-described gait feature extraction method and gait recognition method. For detailed implementation processes, please refer to the description of the above embodiments, which will not be repeated here.
[0106] Processor 12 can also be referred to as a CPU (Central Processing Unit). Processor 12 may be an integrated circuit chip with signal processing capabilities. Processor 12 can also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. A general-purpose processor can be a microprocessor, or processor 12 can be any conventional processor.
[0107] The electronic device 10 may further include a memory 11 for storing instructions and data required for the processor 12 to run.
[0108] The processor 12 is used to execute instructions to implement the methods provided by any embodiment of the gait feature extraction method and gait recognition method of this application and any non-conflicting combination thereof.
[0109] Please refer to Figure 6, which is a schematic diagram of the structure of a computer-readable storage medium in an embodiment of this application. The computer-readable storage medium 30 in this embodiment stores instruction / program data 31. When executed, this instruction / program data 31 implements the methods provided by any embodiment of the gait feature extraction method and gait recognition method, as well as any non-conflicting combination thereof. In one embodiment, the instruction / program data 31 can be formed into a program file and stored in the storage medium 30 in the form of a software product, so that a computer device (which may be a personal computer, server, or network device, etc.) or processor can execute all or part of the steps of the methods in various embodiments of this application. The aforementioned storage medium 30 includes various media capable of storing program code, such as a USB flash drive, portable hard drive, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk, or terminal devices such as computers, servers, mobile phones, and tablets.
[0110] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
[0111] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0112] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.
[0113] The above are merely embodiments of this application and do not limit the scope of this patent application. Any equivalent structural or procedural changes made using the content of this application's specification and drawings, or any direct or indirect applications in other similar systems, are similarly included within the scope of patent protection of this application.
Claims
1. A gait feature extraction method, characterized in that the method includes: determining the target body three-dimensional image in each gait image of the target's gait image sequence; based on the target body three-dimensional image of each gait image, performing an alignment operation on each part of the target body in three-dimensional space to determine the intersection region of each part in at least some of the gait images; sampling the intersection region of each part to obtain sampling points of the intersection region of each part in the three-dimensional space; determining, on each gait image, the key points corresponding to the sampling points of each part in the three-dimensional space to obtain a target key point image sequence; and extracting the gait features of the target based on the target key point image sequence.
2. The gait feature extraction method according to claim 1, characterized in that sampling the intersection region of each part includes: performing dense sampling on the intersection region of each part to obtain dense sampling points on each part.
3. The gait feature extraction method according to claim 1, characterized in that determining the target body three-dimensional image in each gait image of the target's gait image sequence includes: extracting a target body two-dimensional image from each gait image; and performing three-dimensional mapping on the target body two-dimensional image to obtain the target body three-dimensional image of each gait image.
4. The gait feature extraction method according to claim 3, characterized in that extracting the target body two-dimensional image from each gait image includes: processing each gait image using a target body pose estimation method to obtain a three-dimensional surface map of each gait image; and performing three-dimensional mapping on the target body two-dimensional image to obtain the target body three-dimensional image of each gait image includes: performing three-dimensional mapping on the target body two-dimensional image using the three-dimensional surface map to obtain the target body three-dimensional image of each gait image.
5. The gait feature extraction method according to claim 1, characterized in that, before extracting the gait features of the target based on the target key point image sequence, the method includes: extracting contour information from each gait image to obtain a target contour image sequence; and extracting the gait features of the target based on the target key point image sequence includes: extracting the gait features using the target contour image sequence and the target key point image sequence.
6. The gait feature extraction method according to claim 5, characterized in that extracting the gait features using the target contour image sequence and the target key point image sequence includes: performing Gaussian processing on the key point images to obtain a key point heatmap of each gait image, thereby obtaining a target key point heatmap sequence; performing feature extraction on the target key point heatmap sequence and the target contour image sequence to obtain heatmap features and contour features, respectively; and fusing the heatmap features and the contour features to obtain the gait features.
7. The gait feature extraction method according to claim 6, characterized in that fusing the heatmap features and the contour features to obtain the gait features includes: fusing the heatmap features and the contour features through a self-attention mechanism to obtain shallow fused features; fusing the heatmap features and the contour features through a feature extraction and fusion module, so that the heatmap features and the contour features are fused during feature extraction, to obtain deep fused features; and fusing the deep fused features and the shallow fused features to obtain the gait features.
8. The gait feature extraction method according to claim 7, characterized in that fusing the heatmap features and the contour features through the feature extraction and fusion module to obtain deep fused features includes: fusing the heatmap features and the contour features through the feature extraction and fusion module to obtain fused spatiotemporal features; and performing horizontal pyramid mapping on the fused spatiotemporal features to obtain the deep fused features.
9. The gait feature extraction method according to claim 7, characterized in that, before fusing the deep fused features and the shallow fused features to obtain the gait features, the method includes: performing feature extraction on the target key point image sequence to obtain temporal features; and fusing the deep fused features and the shallow fused features to obtain the gait features includes: fusing the temporal features, the deep fused features, and the shallow fused features to obtain the gait features.
10. The gait feature extraction method according to claim 6, characterized in that performing Gaussian processing on the key point images to obtain a key point heatmap of each gait image, thereby obtaining a target key point heatmap sequence, includes: performing Gaussian processing on the key point image of each gait image to obtain an initial heatmap of each gait image; and, based on the weights of each local region of the body, weighting the pixels of each local region of the body in the initial heatmap of each gait image to obtain the key point heatmap of each gait image.
11. A gait recognition method, characterized in that the method includes: performing feature extraction on the target gait image sequence using the gait feature extraction method according to any one of claims 1-10 to obtain the gait features of the target; and performing gait recognition based on the gait features.
12. An electronic device, characterized in that the electronic device includes a processor, and the processor is configured to execute instructions to implement the steps of the method according to any one of claims 1-11.
13. A computer-readable storage medium storing instruction / program data thereon, characterized in that, when the instruction / program data is executed, the steps of the method according to any one of claims 1-11 are implemented.