Human pose classification method, device, and storage medium

A filled image is generated by filling in the secondary human body regions of the human body image to be classified, and the filled image is then used for human pose classification. This addresses the loss of classification accuracy that occurs when a human target detection box contains multiple human bodies.

CN115761890B · Active · Publication Date: 2026-06-23 · ZHEJIANG DAHUA TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ZHEJIANG DAHUA TECH CO LTD
Filing Date
2022-11-22
Publication Date
2026-06-23

Abstract

The application discloses a human posture classification method, equipment and a storage medium. The human posture classification method comprises the following steps: dividing a plurality of human bodies contained in a human image to be classified into several kinds of human bodies, wherein the several kinds of human bodies comprise secondary human bodies; filling a secondary human body region corresponding to the secondary human bodies in the human image to be classified to obtain a filled image; and performing human posture classification on the filled image to obtain a target posture classification result. Since the secondary human body region in the filled image is filled, when the filled image is subjected to human posture classification, the secondary human body image can be prevented from interfering with the human posture classification, so that the accuracy of the human posture classification can be improved.

Description

Technical Field

[0001] This invention relates to the field of computer vision technology, and in particular to a method, device and storage medium for human posture classification.

Background Technology

[0002] Human posture includes human movements or behaviors, such as standing, falling, squatting, sitting, sitting in a chair, leaning on a table, turning the head, etc. Human posture classification refers to the process of classifying human movements or behaviors, and is widely used in scenarios such as health monitoring, human-computer interaction, and motion analysis.

[0003] Currently, human pose classification is typically achieved by detecting human objects in human images and using deep learning methods. However, when the image within the detection bounding box contains multiple human figures, these figures can interfere with each other, affecting the accuracy of pose classification.

Summary of the Invention

[0004] The main technical problem solved by this invention is to provide a method, device and storage medium for human posture classification, which can improve the accuracy of human posture recognition.

[0005] To solve the above-mentioned technical problems, one technical solution adopted in this application is: to provide a human pose classification method, the method comprising: dividing multiple human bodies contained in a human image to be classified into several types of human bodies, the several types of human bodies including secondary human bodies; filling the secondary human body regions corresponding to the secondary human bodies in the human image to be classified to obtain a filled image; and using the filled image to perform human pose classification to obtain a target pose classification result.

[0006] The step of filling the secondary human body region corresponding to the secondary human body in the human body image to be classified to obtain a filled image includes: obtaining the positional representation information of the secondary human body from the human body instance segmentation result of the human body image to be classified; determining the secondary human body region in the human body image to be classified using the positional representation information of the secondary human body; and filling the secondary human body region in the human body image to be classified with pixel values according to a preset pixel value filling method to obtain the filled image.

[0007] The location representation information includes at least one of human body detection box location information and human body segmentation mask. The step of determining the secondary human body region in the human body image to be classified using the secondary human body location representation information includes any one of the following steps: using the secondary human body detection box region in the human body image to be classified as the secondary human body region, where the secondary human body detection box region is the region corresponding to the human body detection box location information of the secondary human body; using the secondary human body segmentation mask region in the human body image to be classified as the secondary human body region, where the secondary human body segmentation mask region is the region corresponding to the human body segmentation mask of the secondary human body; and using the human body intersection region in the human body image to be classified as the secondary human body region, where the human body intersection region is the intersection region between the secondary human body detection box region and the secondary human body segmentation mask region of the secondary human body.

[0008] The plurality of human bodies include a main human body. The step of classifying human poses using the filled image to obtain a target pose classification result includes: obtaining the positional representation information of the main human body from the human instance segmentation result of the human body image to be classified; using the positional representation information of the main human body to remove interference from the main human body region corresponding to the main human body in the filled image to obtain a main human body image; and classifying human poses using the main human body image to obtain a target pose classification result.

[0009] The location representation information includes a human body segmentation mask. The step of using the location representation information of the main human body to remove interference from the main human body region corresponding to the main human body in the filled image to obtain a main human body image includes: fusing the filled image with the human body segmentation mask of the main human body to obtain the main human body image; and / or, the step of performing human pose classification on the main human body image to obtain a target pose classification result includes: performing pose classification on the main human body image using a pose classification network to obtain a target pose classification result.

[0010] The step of classifying multiple human bodies in the human body image to be classified into several types of human bodies includes: classifying the multiple human bodies into several types of human bodies based on the human body key point detection results of the human body image to be classified, wherein the human body key point detection results include key point information of the human body to be classified, wherein the human body to be classified is the human body in the human body image to be classified whose key points can be detected, and the key point information of the human body to be classified includes at least one of the number of key points of the human body to be classified and the confidence level of each key point of the human body to be classified.

[0011] The step of classifying multiple human bodies into several types based on the human keypoint detection results of the human body image to be classified includes: identifying the human bodies to be classified whose number of keypoints meets a first requirement as primary human bodies; or, using the confidence level of each keypoint of each human body to be classified to obtain a comprehensive confidence level of each human body to be classified, identifying the human bodies to be classified whose comprehensive confidence level meets a second requirement as primary human bodies; or, obtaining a weighted result of the number of keypoints and comprehensive confidence level of each human body to be classified, identifying the human bodies to be classified whose weighted result meets a third requirement as primary human bodies; and identifying other human bodies in the human body image to be classified besides the primary human bodies as secondary human bodies.

[0012] Before classifying the multiple human bodies contained in the human body image to be classified into several types of human bodies, the method further includes: determining the number of human bodies contained in the human body image to be classified based on the human body key point detection results or human body instance segmentation results of the human body image to be classified; in response to the number of human bodies being one, using the initial pose classification result in the human body instance segmentation results as the target pose classification result; and in response to the number of human bodies being at least two, performing the step of classifying the multiple human bodies contained in the human body image to be classified into several types of human bodies and the subsequent steps.

[0013] The aforementioned types of human bodies are divided based on the human body keypoint detection results of the human body image to be classified, and the secondary human body regions are determined based on the human body instance segmentation results of the human body image to be classified. Before classifying the multiple human bodies contained in the human body image to be classified into several types of human bodies, the method further includes: extracting features from the human body image to be classified using a shared feature extraction network of a human body image processing model to obtain a feature image; performing keypoint detection on the feature image using a keypoint detection branch of the human body image processing model to obtain the human body keypoint detection results; and performing instance segmentation on the feature image using an instance segmentation branch of the human body image processing model to obtain the human body instance segmentation results.

[0014] The instance segmentation branch includes a classification sub-branch and a segmentation sub-branch. The method further includes: extracting features from the sample image using the shared feature extraction network to obtain a sample feature image; performing keypoint detection on the sample feature image using the keypoint detection branch to obtain a sample keypoint detection result; performing human segmentation on the sample feature image using the segmentation sub-branch to obtain a sample human segmentation mask for the sample image; performing classification processing on the sample feature image using the classification sub-branch to obtain a sample classification result, the sample classification result including an initial pose classification result and sample human detection bounding box information for each human in the sample image; obtaining a first difference between the sample keypoint detection result and the actual keypoint result of the sample image, a second difference between the sample human segmentation mask and the actual human segmentation mask of the sample image, and a third difference between the sample classification result and the actual classification result of the sample image; adjusting the parameters of the shared feature extraction network based on the first difference, adjusting the parameters of the keypoint detection branch based on the first difference, and adjusting the parameters of the classification sub-branch and the segmentation sub-branch in the instance segmentation branch based on the second and third differences.

[0015] Before classifying the multiple human bodies contained in the human body image to be classified into several types of human bodies, the method further includes: preprocessing the original image to obtain a preprocessed image; using a human body detection module to perform human target detection on the preprocessed image to obtain initial detection boxes for each human target in the preprocessed image; and cropping the image portion of the preprocessed image corresponding to the initial detection box of one of the human targets to obtain the human body image to be classified.

[0016] To solve the above-mentioned technical problems, another technical solution adopted in this application is: to provide a processing device, including a memory and a processor coupled to each other, wherein the memory stores program instructions; and the processor is used to execute the program instructions stored in the memory to implement the above-mentioned human posture classification method.

[0017] To solve the above-mentioned technical problems, another technical solution adopted in this application is to provide a computer-readable storage medium for storing program instructions that can be executed to implement the above-mentioned human posture classification method.

[0018] In the above scheme, when the human image to be classified contains multiple human figures, the secondary human figure regions corresponding to the secondary human figures in the image to be classified are first filled to obtain a filled image. Then, the filled image is used for human pose classification to obtain the target pose classification result. Because the secondary human figure regions in the filled image are filled, interference from secondary human figures can be avoided when classifying human pose using the filled image, thereby improving the accuracy of human pose classification.

Attached Figure Description

[0019] Figure 1 is a flowchart illustrating an embodiment of the human posture classification method provided in this application;

[0020] Figure 2 is a schematic diagram of the framework of an embodiment of the human image processing model provided in this application;

[0021] Figure 3 is a flowchart illustrating another embodiment of the human posture classification method provided in this application;

[0022] Figure 4 is a schematic diagram of the framework of an embodiment of the human posture classification device provided in this application;

[0023] Figure 5 is a schematic diagram of the framework of an embodiment of the processing device provided in this application;

[0024] Figure 6 is a schematic diagram of the framework of an embodiment of the computer-readable storage medium provided in this application.

Detailed Implementation

[0025] To make the purpose, technical solution and effects of this application clearer and more explicit, the following describes this application in further detail with reference to the accompanying drawings and embodiments.

[0026] It should be noted that the term "and / or" herein merely describes an association between related objects and indicates that three relationships can exist. For example, A and / or B can represent: A alone, both A and B, and B alone. Additionally, the character " / " herein generally indicates an "or" relationship between the preceding and following objects. Furthermore, "multiple" herein means two or more. Moreover, the term "at least one" herein means any one of a plurality of elements or any combination of at least two of them. For example, including at least one of A, B, and C can mean including any one or more elements selected from the set consisting of A, B, and C.

[0027] Furthermore, if the embodiments of this application involve descriptions such as "first" or "second," these descriptions are for descriptive purposes only and should not be construed as indicating or implying their relative importance or implicitly specifying the number of technical features indicated. Therefore, features defined with "first" or "second" may explicitly or implicitly include at least one of those features. Additionally, the technical solutions of various embodiments can be combined with each other, but this must be based on the ability of those skilled in the art to implement them. If the combination of technical solutions is contradictory or impossible to implement, it should be considered that such a combination of technical solutions does not exist and is not within the scope of protection claimed in this application.

[0028] Please see Figure 1, which is a flowchart illustrating an embodiment of the human posture classification method provided in this application. It should be noted that, provided substantially the same result is obtained, the method of this application is not limited to the process sequence shown in Figure 1. As shown in Figure 1, the method includes the following steps:

[0029] S101: Divide the multiple human bodies contained in the human body image to be classified into several types of human bodies.

[0030] In this embodiment, the human image to be classified can be obtained by preprocessing the original image and detecting human targets. For example, a human detection module is used to detect human targets in the preprocessed image to obtain initial detection boxes for each human target in the preprocessed image; the image within the initial detection box corresponding to one of the human targets is taken as the human image to be classified.

[0031] The dense concentration of human figures or the viewing angle of the image detection device may result in multiple human figures being included in the image to be classified. These multiple human figures may include other human figures besides the target human figure. In one embodiment, when the image to be classified contains multiple human figures, the various types of human figures are categorized into primary human figures and secondary human figures. The primary human figure refers to the aforementioned target human figure; the secondary human figure refers to other human figures in the image to be classified besides the primary human figure. When classifying the image to be classified, the images of secondary human figures can affect the human pose classification result of the primary human figures.

[0032] In one embodiment, based on the human keypoint detection results of the human image to be classified, multiple human figures contained in the image can be divided into primary human figures and secondary human figures. The human keypoint detection results include keypoint information of the human figures to be classified, where the human figures to be classified are those in the image to be classified whose keypoints can be detected. The keypoint information of the human figures to be classified includes at least one of the number of keypoints of the human figures to be classified and the confidence level of each keypoint. For example, the human figure to be classified with the largest number of keypoints can be designated as the primary human figure; or, the human figure to be classified with the highest sum of keypoint confidence levels can be designated as the primary human figure; or, the number of keypoints and the sum of keypoint confidence levels of each human figure to be classified can be weighted, and the human figure to be classified with the highest weighted result can be designated as the primary human figure. Furthermore, other human figures in the image to be classified besides the primary human figures are designated as secondary human figures.
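The three example criteria in paragraph [0032] can be illustrated with a minimal sketch. The names `DetectedBody` and `split_primary_secondary`, and the default weights of 0.5, are hypothetical and chosen only for illustration; the source does not specify the weights or a data layout.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class DetectedBody:
    """Hypothetical container for one body's keypoint detection result."""
    body_id: int
    keypoint_confidences: Sequence[float]  # one confidence per detected keypoint


def split_primary_secondary(
    bodies: List[DetectedBody],
    strategy: str = "weighted",
    count_weight: float = 0.5,       # assumed weight, not given in the source
    confidence_weight: float = 0.5,  # assumed weight, not given in the source
) -> Tuple[DetectedBody, List[DetectedBody]]:
    """Return (primary, secondaries) using one of the three example criteria."""
    if not bodies:
        raise ValueError("at least one body with detected keypoints is required")

    def score(body: DetectedBody) -> float:
        count = len(body.keypoint_confidences)
        confidence_sum = sum(body.keypoint_confidences)
        if strategy == "count":        # largest number of keypoints
            return float(count)
        if strategy == "confidence":   # highest sum of keypoint confidences
            return confidence_sum
        if strategy == "weighted":     # weighted combination of both
            return count_weight * count + confidence_weight * confidence_sum
        raise ValueError(f"unknown strategy: {strategy}")

    primary = max(bodies, key=score)
    # Every body other than the primary one is treated as secondary.
    secondaries = [body for body in bodies if body is not primary]
    return primary, secondaries
```

If two bodies tie, this sketch keeps the first one in the list as the primary body; the source does not say how ties are resolved.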

[0033] It is understood that in other embodiments, other methods can be used to divide the multiple human bodies contained in the human body image to be classified. For example, the division can be based on the image area of each human body in the human body image to be classified. The human body with the largest image area in the human body image to be classified can be regarded as the primary human body, and the other human bodies in the human body image to be classified besides the primary human body can be regarded as secondary human bodies. This embodiment does not limit this.
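The area-based alternative can be sketched in a similar way. This example assumes that the image area of a body is measured as the number of nonzero pixels in a binary mask for that body; the source does not state how the area is measured, and the names `mask_area` and `split_by_area` are hypothetical.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]  # 1 inside the body, 0 elsewhere


def mask_area(mask: Mask) -> int:
    """Count the pixels that belong to the body."""
    return sum(1 for row in mask for value in row if value)


def split_by_area(masks: Dict[int, Mask]) -> Tuple[int, List[int]]:
    """Return (primary_id, secondary_ids), choosing the body with the largest area."""
    if not masks:
        raise ValueError("at least one body mask is required")
    primary_id = max(masks, key=lambda body_id: mask_area(masks[body_id]))
    secondary_ids = [body_id for body_id in masks if body_id != primary_id]
    return primary_id, secondary_ids
```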

[0034] S102: Fill the secondary human body region corresponding to the secondary human body in the human body image to be classified to obtain the filled image.

[0035] In this embodiment, the secondary human body region corresponding to the secondary human body can first be determined in the human body image to be classified based on the location representation information of the secondary human body. Then, the pixel values of the secondary human body region in the human body image to be classified are filled to obtain a filled image.

[0036] In one example, the location representation information of the secondary human body includes at least one of the human body detection box location information and the human body segmentation mask. The human body detection box location information and the human body segmentation mask are obtained by performing human instance segmentation on the human image to be classified. In other examples, the location representation information of the secondary human body can also be other location information, such as the boundary coordinate information corresponding to the secondary human body image. This embodiment does not limit this.

[0037] It should be noted that, in this embodiment, the filled secondary human body region can be the entire image of the secondary human body in the human body image to be classified. Alternatively, the filled secondary human body region can be a portion of the image of the secondary human body in the human body image to be classified, with the remaining unfilled portion of the secondary human body image having little impact on the human pose classification result of the primary human body.
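Step S102 can be sketched as follows, covering the three ways of determining the secondary human body region: detection box, segmentation mask, or their intersection. This is a dependency-free illustration on a grayscale image stored as nested lists. The constant `fill_value` of 0 is an assumption, since the source only refers to a preset pixel value filling method, and the function names and the box convention are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Mask = List[List[int]]           # 1 inside the body, 0 elsewhere
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max exclusive


def secondary_region(height: int, width: int,
                     box: Optional[Box] = None,
                     mask: Optional[Mask] = None) -> Mask:
    """Build the region to fill from a box, a mask, or their intersection."""
    if box is None and mask is None:
        raise ValueError("need a detection box, a segmentation mask, or both")
    region = [[1] * width for _ in range(height)]
    if box is not None:
        x0, y0, x1, y1 = box
        region = [[1 if (x0 <= x < x1 and y0 <= y < y1) else 0
                   for x in range(width)] for y in range(height)]
    if mask is not None:
        # With a box this yields the intersection; without one, the mask itself.
        region = [[region[y][x] & (1 if mask[y][x] else 0)
                   for x in range(width)] for y in range(height)]
    return region


def fill_region(image: Image, region: Mask, fill_value: int = 0) -> Image:
    """Return a copy of the image with every pixel inside the region filled."""
    return [[fill_value if region[y][x] else pixel
             for x, pixel in enumerate(row)]
            for y, row in enumerate(image)]
```

Passing only `box` fills the whole detection box, passing only `mask` fills the segmented silhouette, and passing both fills their intersection.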

[0038] S103: Perform human pose classification using the filled image to obtain the target pose classification result.

[0039] In one embodiment, the filled image can be directly input into a pose classification network, and the pose classification network can be used to classify the human pose of the filled image to obtain the target pose classification result.

[0040] In one embodiment, the positional representation information of the main human figure can be used to remove interference from the main human figure region in the filled image, resulting in a main human figure image. Then, the main human figure image is input into a pose classification network for human pose classification to obtain the target pose classification result. For example, the positional representation information of the main human figure can be a human segmentation mask or the boundary coordinate information corresponding to the main human figure image. The human segmentation mask or the boundary coordinate information corresponding to the main human figure image can provide the pose classification network with the contour information of the main human figure, allowing the pose classification network to focus more on features within the contour when classifying human poses from the main human figure image, reducing interference from information outside the contour.
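One possible reading of this interference removal is a pixel-wise fusion of the filled image with the segmentation mask of the main human figure. The sketch below assumes that fusion means keeping pixels inside the mask and replacing all others with a background value; the source does not define the fusion operation, and `fuse_with_primary_mask` and `background` are hypothetical names.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values
Mask = List[List[int]]   # 1 inside the main human figure, 0 elsewhere


def fuse_with_primary_mask(filled_image: Image, primary_mask: Mask,
                           background: int = 0) -> Image:
    """Keep pixels inside the main figure's mask; set all others to background."""
    if len(filled_image) != len(primary_mask) or any(
            len(row) != len(mask_row)
            for row, mask_row in zip(filled_image, primary_mask)):
        raise ValueError("image and mask must have the same shape")
    return [[pixel if keep else background
             for pixel, keep in zip(row, mask_row)]
            for row, mask_row in zip(filled_image, primary_mask)]
```

The resulting main human figure image would then be passed to the pose classification network.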

[0041] In this embodiment, when the human image to be classified includes multiple human figures, the secondary human figure regions corresponding to the secondary human figures in the human image to be classified are first filled to obtain a filled image. Then, the filled image is used for human pose classification to obtain the target pose classification result. Since the secondary human figure regions in the filled image are filled, interference from secondary human figure images can be avoided when performing human pose classification on the filled image, thereby improving the accuracy of human pose classification.

[0042] Please see Figure 2, which is a schematic diagram of the framework of an embodiment of the human image processing model provided in this application. As shown in Figure 2, the human image processing model includes a shared feature extraction network, a key point detection branch, and an instance segmentation branch. The key point detection branch and the instance segmentation branch share the shared feature extraction network.

[0043] The shared feature extraction network is used to extract features from the human image to be classified, resulting in a feature image. In one example, the shared feature extraction network is a ResNet series network, such as ResNet18, ResNet34, or ResNet50. In other examples, the shared feature extraction network can be other types of feature extraction networks, which are not specifically limited in this embodiment. The shared feature extraction network includes multiple downsampling operations. For example, the shared feature extraction network includes 5 downsampling operations. The resolution of the human image to be classified input to the shared feature extraction network is 256*256, and after 5 downsampling operations, a feature image with a resolution of 8*8 is obtained.

[0044] The keypoint detection branch is used to detect keypoints in the feature image to obtain human keypoint detection results. These results include the keypoints of the human body, their confidence levels, and the connectivity relationships between them. In one example, the keypoint detection branch uses a bottom-up approach to detect keypoints in the feature image. For instance, it uses PIF (Part Intensity Field) to locate keypoints and PAF (Part Association Field) to associate them, forming a complete human pose. The detection efficiency of the keypoint detection branch is related to the resolution of the input feature image; for example, it achieves the highest efficiency when the resolution of the input feature image is 32*32. Optionally, in this embodiment, a deconvolution module is also included between the shared feature extraction network and the keypoint detection branch. This module expands the low-resolution feature image output by the shared feature extraction network to a set resolution before outputting it to the keypoint detection branch. For example, the shared feature extraction network outputs a feature image with a resolution of 8*8, which is then processed by the deconvolution module to obtain an image with a resolution of 32*32, which is then input to the key point detection branch.
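The resolution figures in paragraphs [0043] and [0044] can be checked with a short calculation. The sketch assumes each downsampling operation halves the resolution, which is consistent with 256*256 becoming 8*8 after 5 operations; the helper names are hypothetical.

```python
def downsampled_size(size: int, num_downsamples: int, factor: int = 2) -> int:
    """Resolution along one axis after repeated downsampling by `factor`."""
    for _ in range(num_downsamples):
        if size % factor:
            raise ValueError("size is not divisible by the downsampling factor")
        size //= factor
    return size


def upsample_factor(source_size: int, target_size: int) -> int:
    """Scale the deconvolution module must apply to reach the target resolution."""
    if target_size % source_size:
        raise ValueError("target size must be a multiple of the source size")
    return target_size // source_size


feature_size = downsampled_size(256, 5)       # 256 / 2**5
scale = upsample_factor(feature_size, 32)     # 32 / feature_size
```

Under this assumption the deconvolution module has to enlarge the feature image by a factor of 4 along each axis.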

[0045] The instance segmentation branch is used to perform human instance segmentation on the feature image. It includes a segmentation sub-branch and a classification sub-branch. The human instance segmentation result includes a human segmentation mask obtained through the segmentation sub-branch, and initial pose classification results and human detection box location information obtained through the classification sub-branch. In one example, the instance segmentation branch includes a Region Proposal Network (RPN), a region of interest (ROI) layer, and three head branches: a Mask branch, a Bbox Regression branch, and a Classification branch. The RPN generates candidate region boxes. The ROI layer resizes the feature map regions corresponding to each candidate box to the same size and then outputs them to the three head branches. The Mask branch performs pixel-level classification of the targets in the candidate regions, the Bbox Regression branch performs bounding box regression on the candidate regions, and the Classification branch classifies the candidate regions.

[0046] The aforementioned human image processing model is trained on sample images. In this embodiment, multi-task joint training is used to obtain the model. In one embodiment, the multi-task joint training includes the following steps: using the shared feature extraction network to extract features from a sample image to obtain a sample feature image; using the keypoint detection branch to perform keypoint detection on the sample feature image to obtain a sample keypoint detection result; using the segmentation sub-branch to perform human segmentation on the sample feature image to obtain a sample human segmentation mask; using the classification sub-branch to perform classification processing on the sample feature image to obtain a sample classification result, the sample classification result including the initial pose classification result of the sample and the sample human detection box information of each human in the sample image; obtaining a first difference between the sample keypoint detection result and the actual keypoint result of the sample image, a second difference between the sample human segmentation mask and the actual human segmentation mask of the sample image, and a third difference between the sample classification result and the actual classification result of the sample image; and adjusting the parameters of the shared feature extraction network based on the first difference, the second difference, and the third difference, adjusting the parameters of the keypoint detection branch based on the first difference, and adjusting the parameters of the classification sub-branch and the segmentation sub-branch in the instance segmentation branch based on the second difference and the third difference.

[0047] Specifically, the human image processing model obtained when the sum of the first difference, the second difference, and the third difference is less than the difference threshold is used as the aforementioned human image processing model. For example, the first difference, the second difference, and the third difference can each be implemented using a corresponding loss function.
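The way the three differences drive the parameter updates, and the stopping condition in paragraph [0047], can be summarized in a small sketch. It only models the bookkeeping, not an actual network or optimizer. The dictionary keys, the function names, and the plain unweighted sum are assumptions made for illustration.

```python
from typing import Dict, List

# Which differences (losses) drive the update of each parameter group,
# following paragraph [0046]. The key names are hypothetical.
LOSS_ROUTING: Dict[str, List[str]] = {
    "shared_feature_extractor": ["keypoint", "mask", "classification"],
    "keypoint_branch": ["keypoint"],
    "segmentation_sub_branch": ["mask", "classification"],
    "classification_sub_branch": ["mask", "classification"],
}


def group_objectives(losses: Dict[str, float]) -> Dict[str, float]:
    """Sum, per parameter group, the losses that the group is adjusted on."""
    return {group: sum(losses[name] for name in names)
            for group, names in LOSS_ROUTING.items()}


def training_converged(losses: Dict[str, float], threshold: float) -> bool:
    """Stop when the sum of the three differences falls below the threshold."""
    return sum(losses.values()) < threshold
```

In an automatic differentiation framework the same routing usually falls out of the computation graph: the shared network receives gradients from every loss, while each branch receives gradients only from the losses computed on its own outputs.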

[0048] In this embodiment, a multi-task joint training approach is used to obtain the human image processing model. The keypoint detection branch and the instance segmentation branch (including segmentation sub-branch and classification sub-branch) share a common feature extraction network. During model training, the keypoint detection branch and the instance segmentation branch can provide some auxiliary information to the common feature extraction network. For example, the keypoint detection branch can provide keypoint information (e.g., keypoints of the human body and the connections between them) to the common feature extraction network, and the instance segmentation branch can provide human contour information. This auxiliary information makes it easier for the common feature extraction network to learn key information and human contour information, making the features extracted by the common feature extraction network more conducive to human pose classification. This leads to more accurate and robust human pose classification, improving the overall performance of human pose classification.

[0049] Please see Figure 3, which is a flowchart illustrating another embodiment of the human posture classification method provided in this application. The method is implemented based on the human image processing model shown in Figure 2. As shown in Figure 3, the method includes the following steps:

[0050] S301: Obtain the human body image to be classified.

[0051] In this embodiment, the human image to be classified is obtained by preprocessing the original image and detecting the human target. Step S301 includes the following sub-steps:

[0052] The first sub-step involves preprocessing the original image to obtain a preprocessed image.

[0053] For example, the original image can be obtained through an image detection device.

[0054] For example, preprocessing includes at least one of translation, rotation, and scaling. Translation of the original image refers to moving all pixels in the original image horizontally or vertically by a given translation amount. Rotation of the original image refers to rotating the original image by a certain angle around a certain point. Scaling of the original image refers to adjusting the image size of the original image. By preprocessing the original image, some irrelevant information can be eliminated, enhancing the detectability of relevant information.

[0055] Sub-step two involves using the human detection module to detect human targets in the preprocessed image, obtaining the initial detection bounding boxes for each human target in the preprocessed image.

[0056] In this embodiment, the preprocessed image includes multiple human targets. After performing human target detection on the preprocessed image, an initial detection box can be obtained for each human target. In addition to the image of the corresponding human target, each initial detection box may also include images of other minor human targets.

[0057] Sub-step three involves cropping the portion of the preprocessed image that corresponds to the initial detection bounding box of one of the human targets, to obtain the human image to be classified.
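The cropping in sub-step three can be sketched as follows, assuming the image is a nested list of rows and the initial detection box is given as `(x1, y1, x2, y2)` with an inclusive top-left corner and an exclusive bottom-right corner. The name `crop_to_box`, the box convention, and the clamping to image bounds are illustrative assumptions.

```python
def crop_to_box(image, box):
    """Return the part of `image` covered by `box` = (x1, y1, x2, y2).

    The box is clamped to the image bounds so that a detection box that
    slightly overshoots the image still yields a valid crop.
    """
    x1, y1, x2, y2 = box
    height, width = len(image), len(image[0])
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(width, x2), min(height, y2)
    return [row[x1:x2] for row in image[y1:y2]]
```

Because the crop is rectangular, any other person who overlaps the box is carried into the human image to be classified, which is the situation the later filling steps address.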

[0058] S302: Use the shared feature extraction network of the human image processing model to extract features from the human image to be classified, and obtain the feature image.

[0059] For the shared feature extraction network and the related feature extraction, please refer to the embodiment shown in Figure 2; the details are not repeated here.

[0060] S303: Use the key point detection branch of the human image processing model to perform key point detection on the feature image and obtain the human key point detection results.

[0061] In this embodiment, the human body keypoint detection result includes keypoint information of the human body to be classified. The human body to be classified is the human body in the human body image to be classified in which keypoints can be detected. The keypoint information of the human body to be classified includes at least one of the number of keypoints of the human body to be classified and the confidence level of each keypoint of the human body to be classified.

[0062] For the keypoint detection branch and related content, please refer to the embodiment shown in Figure 2; the details are not repeated here.

[0063] S304: Use the instance segmentation branch of the human image processing model to perform instance segmentation on the feature image to obtain the human instance segmentation result.

[0064] In this embodiment, the human instance segmentation result includes an initial pose classification result and positional representation information for each human body. The positional representation information of a secondary human body indicates its image position within the human image to be classified. Specifically, the positional representation information for each human body includes at least one of human body detection box position information and a human segmentation mask. For example, the positional representation information may include only the human body detection box position information, only the human segmentation mask, or both. In one example, the human body detection box position information for each human body includes the coordinates of the corresponding human body detection box, for example, the coordinates of two diagonally opposite corners of the box. The human segmentation mask for each human body is binary data.

[0065] For details on the instance segmentation branch, please refer to the embodiment shown in Figure 2; the details are not repeated here.

[0066] It should be noted that in this embodiment, the order of steps S303 and S304 is not specifically limited. Step S303 can be executed first and then step S304, or step S304 can be executed first and then step S303, or steps S303 and S304 can be executed simultaneously.

[0067] S305: Determine the number of human bodies contained in the human body image to be classified.

[0068] In one embodiment, the number of human bodies contained in the human body image to be classified is determined based on the human body key point detection results or the human body instance segmentation results of the human body image to be classified.

[0069] Specifically, in the human keypoint detection results of the human image to be classified, each human body corresponds to a set of keypoint information. The number of sets of keypoint information for each human body in the human image to be classified is taken as the number of human bodies contained in the human image to be classified. Alternatively, in the human instance segmentation results of the human image to be classified, each human body corresponds to a set of positional representation information. The number of sets of positional representation information for each human body in the human image to be classified is taken as the number of human bodies contained in the human image to be classified.

[0070] S306: Determine whether the human body image to be classified contains a single human body. If it contains a single human body, proceed to step S307; if it contains at least two human bodies, proceed to step S308.

[0071] When the human image to be classified contains only a single human body, the initial pose classification result for that human body is relatively accurate because the human image processing model is obtained through multi-task joint training. Therefore, the initial pose classification result in the human instance segmentation result can be used directly as the target pose classification result. When the human image to be classified contains at least two human bodies, these human bodies interfere with each other during human pose classification, resulting in an inaccurate initial pose classification result. Therefore, in this embodiment, when the human image to be classified contains multiple human bodies, interference factors in the image are first eliminated before human pose classification is performed, to improve the accuracy of the human pose classification result.

[0072] S307: Use the initial pose classification result in the human instance segmentation result as the target pose classification result.

[0073] The target pose classification result is the pose classification result of a single human body in the human body image to be classified.
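The branching in steps S305 to S307 can be sketched as follows. This is a minimal illustration, assuming each detected body contributes one entry to a list of keypoint sets and one dictionary to a list of instance segmentation results; the names `count_bodies` and `choose_branch`, the `"initial_pose"` key, and the handling of an image with no detected body are all hypothetical.

```python
def count_bodies(keypoint_sets=None, instance_results=None):
    """Count bodies as the number of keypoint sets, or else the number of
    sets of positional representation information (instance results)."""
    if keypoint_sets is not None:
        return len(keypoint_sets)
    if instance_results is not None:
        return len(instance_results)
    raise ValueError("need keypoint sets or instance results")


def choose_branch(keypoint_sets, instance_results):
    """Return (next_step, target_pose).

    With a single body the initial pose is used directly (S307);
    with two or more bodies the division step follows (S308).
    """
    n = count_bodies(keypoint_sets=keypoint_sets)
    if n == 1:
        return "S307", instance_results[0]["initial_pose"]
    if n >= 2:
        return "S308", None
    # Assumption: the text does not describe the case of zero bodies.
    return "none", None
```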

[0074] S308: Based on the human key point detection results of the human image to be classified, the multiple human bodies contained in the human image to be classified are divided into several types of human bodies.

[0075] In this embodiment, the human body keypoint detection result includes keypoint information of the human body to be classified. Based on the keypoint information of the human body to be classified, the multiple human bodies contained in the human body image to be classified are divided into primary human bodies and secondary human bodies. The primary human body refers to the human target in the initial detection box, and the secondary human body refers to other human bodies in the initial detection box besides the human target. When classifying the human pose of the human body image to be classified, the secondary human body image can interfere with the human pose classification of the primary human body, leading to inaccurate human pose classification of the primary human body. Therefore, in this embodiment, the primary and secondary human bodies in the human body image to be classified are first divided, and then the secondary human body region image corresponding to the secondary human body in the human body image to be classified is further filled.

[0076] In one embodiment, the key point information of the human body to be segmented includes the number of key points of the human body to be segmented. Since the main human body in the human body image to be classified has a large number of key points and the secondary human body has a small number of key points, this embodiment can segment the multiple human bodies contained in the human body image to be classified based on the number of key points of the human body to be segmented. Specifically, the human body to be segmented that meets the first requirement in terms of the number of key points is determined as the main human body. And, the other human bodies in the human body image to be classified, excluding the main human body, are determined as secondary human bodies. The first requirement is that the number of key points is the largest.
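The first requirement (largest number of key points) can be sketched as below, assuming each body is represented by a list of its detected key points. The function name `split_by_keypoint_count` and the tie-breaking rule (the first body with the largest count wins) are assumptions; the application does not say how ties are resolved.

```python
def split_by_keypoint_count(bodies):
    """Return (primary_index, secondary_indices).

    `bodies` is a list in which each element is the list of key points
    detected for one body. The body with the most key points is primary.
    """
    if not bodies:
        raise ValueError("no bodies to divide")
    # max() returns the first maximal index, which is the assumed tie-break.
    primary = max(range(len(bodies)), key=lambda i: len(bodies[i]))
    secondary = [i for i in range(len(bodies)) if i != primary]
    return primary, secondary
```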

[0077] In one embodiment, the key point information of the human body to be segmented includes the confidence level of each key point of the human body to be segmented. Since the image of the primary human body in the image to be classified is clearer than that of the secondary human bodies, the confidence level of the key points of the primary human body will be higher than that of the secondary human bodies. Therefore, in this embodiment, multiple human bodies contained in the image to be classified are segmented based on the confidence level of each key point of the human body to be segmented. Specifically, the comprehensive confidence level of each human body to be segmented is obtained using the confidence level of each key point of each human body to be segmented, and the human body to be segmented whose comprehensive confidence level meets the second requirement is determined as the primary human body. Furthermore, other human bodies in the image to be classified besides the primary human body are determined as secondary human bodies. The second requirement is that the comprehensive confidence level is the highest. The comprehensive confidence level of each human body to be segmented is the sum of the confidence levels of each key point of each human body to be segmented.

[0078] In one embodiment, the key point information of the human body to be segmented includes the number of key points of the human body to be segmented and the confidence level of each key point. The multiple human bodies contained in the human body image to be classified are segmented by combining the number of key points and the confidence level of each key point. Specifically, a weighted result of the number of key points and the comprehensive confidence level of each human body to be segmented is obtained, and the human body to be segmented whose weighted result meets a third requirement is determined as the primary human body. Furthermore, other human bodies in the human body image to be classified besides the primary human body are determined as secondary human bodies. The third requirement is that the weighted result has the highest value. The weighting coefficient for the number of key points and the weighting coefficient for the comprehensive confidence level sum to a set value, for example, 1. For example, the weighting coefficient for the number of key points is 0.5, and the weighting coefficient for the comprehensive confidence level is 0.5.
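The weighted selection can be sketched as follows, assuming each body is given as a list of per-keypoint confidence values, so that the number of key points is the list length and the comprehensive confidence level is the sum of the list. The names `weighted_score` and `split_by_weighted_score` are hypothetical, and the plain linear combination is an assumed reading of "weighted result".

```python
def weighted_score(confidences, w_count=0.5, w_conf=0.5):
    """Weighted result of key point count and comprehensive confidence.

    The comprehensive confidence is the sum of the per-keypoint confidences.
    The two weights are expected to sum to the set value (1 here).
    """
    return w_count * len(confidences) + w_conf * sum(confidences)


def split_by_weighted_score(bodies, w_count=0.5, w_conf=0.5):
    """Return (primary_index, secondary_indices) using the highest score."""
    if not bodies:
        raise ValueError("no bodies to divide")
    scores = [weighted_score(c, w_count, w_conf) for c in bodies]
    primary = max(range(len(bodies)), key=lambda i: scores[i])
    return primary, [i for i in range(len(bodies)) if i != primary]


# Body 0: 3 confident key points; body 1: 4 less confident key points.
bodies = [[0.9, 0.8, 0.7], [0.5, 0.5, 0.5, 0.5]]
by_weighted = split_by_weighted_score(bodies)
by_confidence_only = split_by_weighted_score(bodies, w_count=0.0, w_conf=1.0)
```

With equal weights the second body is selected because its larger count outweighs its lower summed confidence, while a confidence-only weighting selects the first body. This shows that the choice of coefficients can change which body is treated as primary.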

[0079] Optionally, in this embodiment, to further improve the accuracy of human body segmentation in the human body image to be classified, before segmenting multiple human bodies in the human body image to be classified based on the key point information of the human body to be classified, the method further includes: removing key points with a confidence level lower than a confidence threshold from each key point of the human body to be classified. The confidence threshold is set according to actual needs.
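The optional filtering step can be sketched as follows, assuming each key point is a tuple `(x, y, confidence)`. The tuple layout, the function name `filter_keypoints`, and the example threshold of 0.3 are illustrative assumptions; the application only states that the threshold is set according to actual needs.

```python
def filter_keypoints(keypoints, threshold):
    """Remove key points whose confidence is lower than `threshold`."""
    return [kp for kp in keypoints if kp[2] >= threshold]


def filter_all_bodies(bodies, threshold):
    """Apply the filter to every body before the bodies are divided."""
    return [filter_keypoints(kps, threshold) for kps in bodies]


bodies = [
    [(10, 12, 0.95), (14, 30, 0.80), (9, 55, 0.10)],
    [(40, 15, 0.35), (42, 33, 0.20)],
]
filtered = filter_all_bodies(bodies, threshold=0.3)
```

After filtering, the key point counts and confidence sums used for the division are computed only from key points that were detected with reasonable certainty.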

[0080] S309: Obtain the location representation information of secondary human bodies from the human body instance segmentation results of the human body image to be classified.

[0081] Since the human instance segmentation results of the human image to be classified include the positional representation information of each human body, the positional representation information of secondary human bodies can be directly obtained from the human instance segmentation results of the human image to be classified.

[0082] S310: Use the location representation information of secondary human bodies to determine the secondary human body regions in the human body image to be classified.

[0083] In this embodiment, in order to facilitate the determination of the location of the secondary human body in the human body image to be classified, the location representation information of the secondary human body is used to determine the secondary human body region in the human body image to be classified.

[0084] In one embodiment, the location representation information of the secondary human body includes a human body segmentation mask of the secondary human body, and the secondary human body segmentation mask region in the human body image to be classified is taken as the secondary human body region. The secondary human body segmentation mask region is the region corresponding to the human body segmentation mask of the secondary human body.

[0085] In one embodiment, the location representation information of the secondary human body includes the location information of the human body detection box of the secondary human body, and the region of the secondary human body detection box in the human body image to be classified is taken as the secondary human body region. The secondary human body detection box region is the region corresponding to the location information of the human body detection box of the secondary human body.

[0086] In one embodiment, the location representation information of the secondary human body includes the human body segmentation mask of the secondary human body and the human body detection box location information of the secondary human body. Considering that the human body segmentation mask of the secondary human body predicted based on human instance segmentation may not be accurate enough, directly filling the secondary human body segmentation mask region may fill the image of the primary human body; and directly filling the secondary human body detection box region may also fill the image of the primary human body. Therefore, in order to improve the accuracy of image filling, in this embodiment, the human body intersection region in the human body image to be classified is taken as the secondary human body region, and the human body intersection region is the intersection region between the secondary human body detection box region and the secondary human body segmentation mask region.
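The intersection region described above can be sketched as follows, assuming the segmentation mask is a binary nested list with the same size as the human body image to be classified, and the detection box is `(x1, y1, x2, y2)` with an exclusive bottom-right corner. The name `intersection_region` and the box convention are assumptions.

```python
def intersection_region(mask, box):
    """Return a binary region that is 1 only where the secondary body's
    segmentation mask is set AND the pixel lies inside its detection box."""
    x1, y1, x2, y2 = box
    return [
        [1 if value and x1 <= x < x2 and y1 <= y < y2 else 0
         for x, value in enumerate(row)]
        for y, row in enumerate(mask)
    ]


mask = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
region = intersection_region(mask, (1, 0, 3, 2))
```

Mask pixels that fall outside the box, and box pixels that the mask does not cover, are both excluded, so errors in either source are less likely to cause the primary human body to be filled.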

[0087] Alternatively, in the latter two embodiments described above, only a portion of the human body detection box of the secondary human body can be designated as the secondary human body region. For example, an inner boundary can be set at a set distance from the boundary of the secondary human body detection box, and the area within that inner boundary can be designated as the secondary human body region. The set distance can be adjusted based on the actual human posture classification effect.
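One way to read this alternative is to shrink the detection box inward by the set distance. The sketch below assumes the same distance is applied on all four sides and that the box is `(x1, y1, x2, y2)`; the name `shrink_box` and the choice to return `None` when the box collapses are hypothetical.

```python
def shrink_box(box, distance):
    """Move each side of the box inward by `distance` pixels.

    Returns None when the distance is so large that no area remains.
    """
    x1, y1, x2, y2 = box
    nx1, ny1 = x1 + distance, y1 + distance
    nx2, ny2 = x2 - distance, y2 - distance
    if nx1 >= nx2 or ny1 >= ny2:
        return None
    return (nx1, ny1, nx2, ny2)
```

A larger distance fills less of the secondary human body but lowers the risk of covering the primary human body near the box edges, which is why the distance is tuned against the observed classification effect.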

[0088] S311: Fill the secondary human body regions in the human body image to be classified with pixel values according to the preset pixel value filling method to obtain the filled image.

[0089] In this embodiment, the image corresponding to the secondary human body region is filled by changing the pixel values of the secondary human body region in the human body image to be classified. The pixel values range from 0 to 255.

[0090] In some implementations, the preset pixel value filling method involves filling all pixels in the secondary human body area with the same pixel value, that is, filling the secondary human body area with the same color. Alternatively, the preset pixel value filling method involves randomly filling all pixels in the secondary human body area with different pixel values or partially identical pixel values. This embodiment does not limit the specific filling method of the preset pixel values, as long as it can achieve the effect of blurring the image corresponding to the secondary human body area.
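Both filling methods can be sketched with one function, assuming a single-channel image stored as a nested list and a binary region of the same size marking the secondary human body region. The name `fill_region` and its parameters are hypothetical; passing `value` gives the same-color fill, and omitting it gives the random fill.

```python
import random


def fill_region(image, region, value=None, rng=None):
    """Return a copy of `image` with the marked region filled.

    If `value` is given, every marked pixel receives that value.
    Otherwise each marked pixel receives a random value in 0..255.
    """
    rng = rng or random.Random()
    out = [row[:] for row in image]  # leave the input image untouched
    for y, row in enumerate(region):
        for x, inside in enumerate(row):
            if inside:
                out[y][x] = value if value is not None else rng.randint(0, 255)
    return out


image = [[10, 20, 30], [40, 50, 60]]
region = [[0, 1, 1], [0, 0, 1]]
same_color = fill_region(image, region, value=0)
random_fill = fill_region(image, region, rng=random.Random(0))
```

For a color image the same idea applies per channel; that extension is omitted here.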

[0091] S312: Use the filled image to classify human pose and obtain the target pose classification result.

[0092] In one embodiment, the filled image can be directly input into a pose classification network, and the pose classification network can be used to classify the human pose of the filled image to obtain the target pose classification result.

[0093] In one embodiment, the positional representation information of the main human body can be obtained from the human instance segmentation results of the human body image to be classified. Using this positional representation information, interference removal is performed on the main human body region corresponding to the main human body in the filled image to obtain the main human body image, and the main human body image is then subjected to human pose classification to obtain the target pose classification result. Performing interference removal on the main human body region prevents the image information of non-main human body regions in the filled image from interfering with human pose classification, further improving the accuracy of human pose classification.

[0094] In a specific application, the location representation information of the main human figure includes its segmentation mask. The main human figure image is obtained by fusing the filled image and the main human figure's segmentation mask. A pose classification network is then used to classify the pose of this image, yielding the target pose classification result. In one example, a concat fusion method is used to fuse the filled image and the main human figure's segmentation mask. When the pose classification network performs human pose recognition, the main human figure's segmentation mask provides the network with contour information, allowing it to focus more on features within the main human figure's contour range. This reduces interference from features outside the contour range on the pose classification.
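The concat fusion can be sketched as follows, under the assumption that "concat" means appending the main human figure's segmentation mask to the filled image as one extra channel. The image is represented as rows of pixels, each pixel being a list of channel values; the name `concat_fuse` and this data layout are illustrative.

```python
def concat_fuse(filled_image, mask):
    """Append the binary mask as an additional channel of each pixel.

    filled_image: H x W x C nested list; mask: H x W nested list.
    Returns an H x W x (C + 1) nested list.
    """
    if len(filled_image) != len(mask) or len(filled_image[0]) != len(mask[0]):
        raise ValueError("image and mask must have the same height and width")
    return [
        [list(pixel) + [m] for pixel, m in zip(image_row, mask_row)]
        for image_row, mask_row in zip(filled_image, mask)
    ]


filled = [[[255, 0, 0], [0, 0, 0]], [[12, 34, 56], [0, 0, 0]]]
main_mask = [[1, 0], [1, 0]]
fused = concat_fuse(filled, main_mask)
```

A pose classification network that consumes such an input would need its first layer to accept one more channel than the image alone provides.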

[0095] In the above embodiments, the pose classification network can be the combination of the shared feature extraction network and the classification sub-branch of the instance segmentation branch shown in Figure 2, or it can be another classification network; this embodiment does not impose specific restrictions on it.

[0096] In this embodiment, on the one hand, the human image processing model is obtained through multi-task joint training. When the human image to be classified includes a single human body, the initial pose classification result in the human instance segmentation result corresponding to the human image processing model is used as the target pose classification result, which can improve the accuracy of human pose classification for a single human body. When the human image to be classified includes multiple human bodies, the secondary human body regions corresponding to the secondary human bodies in the human image to be classified are first filled to obtain a filled image; then, the filled image is used for human pose classification to obtain the target pose classification result. Since the secondary human body regions in the filled image are filled, interference from secondary human body images can be avoided when performing human pose classification on the filled image, thereby further improving the accuracy of human pose classification. On the other hand, compared with traditional methods for human pose recognition based on wearable devices, the human pose classification method provided in this embodiment does not require wearable devices, has good real-time performance, and has a more accurate human pose recognition effect.

[0097] Please refer to Figure 4, which is a schematic framework diagram of an embodiment of the human posture classification device provided in this application. In this embodiment, the human posture classification device 40 includes: a human body segmentation module 41, a filling module 42, and a human posture classification module 43.

[0098] The human body segmentation module 41 is used to divide the multiple human bodies contained in the human body image to be classified into several types of human bodies, including secondary human bodies. The filling module 42 is used to fill the secondary human body regions corresponding to the secondary human bodies in the human body image to be classified, to obtain a filled image. The human body pose classification module 43 is used to perform human body pose classification using the filled image to obtain the target pose classification result.

[0099] Optionally, the filling module 42 is used to obtain the positional representation information of the secondary human body from the human body instance segmentation result of the human body image to be classified; use the positional representation information of the secondary human body to determine the secondary human body region in the human body image to be classified; and fill the secondary human body region in the human body image to be classified with pixel values according to a preset pixel value filling method to obtain a filled image.

[0100] Optionally, the location representation information includes at least one of human body detection box location information and human body segmentation mask. The filling module 42 uses the location representation information of the secondary human body to determine the secondary human body region in the human body image to be classified, including any one of the following steps: taking the secondary human body detection box region in the human body image to be classified as the secondary human body region, the secondary human body detection box region being the region corresponding to the human body detection box location information of the secondary human body; taking the secondary human body segmentation mask region in the human body image to be classified as the secondary human body region, the secondary human body segmentation mask region being the region corresponding to the human body segmentation mask of the secondary human body; taking the human body intersection region in the human body image to be classified as the secondary human body region, the human body intersection region being the intersection region between the secondary human body detection box region and the secondary human body segmentation mask region of the secondary human body.

[0101] Optionally, the human pose classification module 43 is used to obtain the positional representation information of the main human body from the human instance segmentation result of the human image to be classified; use the positional representation information of the main human body to remove interference from the main human body region corresponding to the main human body in the filled image to obtain the main human body image; and perform human pose classification on the main human body image to obtain the target pose classification result.

[0102] Optionally, the location representation information includes a human body segmentation mask, and the human body pose classification module 43 is used to fuse the filled image and the human body segmentation mask of the main human body to obtain the main human body image; and / or, the pose classification network is used to perform pose classification on the main human body image to obtain the target pose classification result.

[0103] Optionally, the human body segmentation module 41 is used to classify multiple human bodies into several types of human bodies based on the human body key point detection results of the human body image to be classified. The human body key point detection results include key point information of the human body to be classified. The human body to be classified is the human body in the human body image to be classified whose key points can be detected. The key point information of the human body to be classified includes at least one of the number of key points of the human body to be classified and the confidence level of each key point of the human body to be classified.

[0104] Optionally, the human body segmentation module 41 is used to identify the human body to be segmented that meets the first requirement in terms of the number of key points as the main human body; or, using the confidence of each key point of each human body to be segmented, to obtain the comprehensive confidence of each human body to be segmented, and to identify the human body to be segmented that meets the second requirement in terms of the comprehensive confidence as the main human body; or, to obtain the weighted result of the number of key points and the comprehensive confidence of each human body to be segmented, and to identify the human body to be segmented that meets the third requirement in terms of the weighted result as the main human body; and to identify other human bodies in the human body image to be classified besides the main human body as secondary human bodies.

[0105] Optionally, the human pose classification device 40 further includes a determination module 44, which is used to determine the number of human bodies contained in the human body image to be classified, based on the human keypoint detection results or human instance segmentation results of that image, before the multiple human bodies contained in the image are divided into several types of human bodies. The human pose classification module 43 is used to take the initial pose classification result in the human instance segmentation result as the target pose classification result when the image contains a single human body; and to perform the division of the multiple human bodies contained in the image into several types of human bodies, together with the subsequent steps, when the image contains at least two human bodies.

[0106] Optionally, the several types of human bodies are divided based on the human body keypoint detection results of the human body image to be classified, and the secondary human body regions are determined based on the human body instance segmentation results of the human body image to be classified. The human pose classification device 40 also includes a shared feature extraction module 45, a keypoint detection module 46, and an instance segmentation module 47. Before classifying the multiple human bodies contained in the human body image to be classified into several types of human bodies, the shared feature extraction module 45 is used to extract features from the human body image to be classified to obtain a feature image; the keypoint detection module 46 is used to perform keypoint detection on the feature image to obtain human body keypoint detection results; and the instance segmentation module 47 is used to perform instance segmentation on the feature image to obtain human body instance segmentation results.

[0107] Optionally, the human pose classification device 40 further includes a model training module 48. A shared feature extraction module 45 is used to extract features from the sample image to obtain a sample feature image; a keypoint detection module 46 is used to perform keypoint detection on the sample feature image to obtain sample keypoint detection results; and an instance segmentation module 47 is used to perform human segmentation on the sample feature image to obtain a sample human segmentation mask, and to perform classification processing on the sample feature image to obtain a sample classification result, which includes the initial pose classification result and the sample human detection box information for each human in the sample image; the model training module 48 is used to obtain a first difference between the sample keypoint detection result and the actual keypoint result of the sample image, a second difference between the sample human segmentation mask and the actual human segmentation mask of the sample image, and a third difference between the sample classification result and the actual classification result of the sample image; based on the first difference, the second difference, and the third difference, the parameters of the shared feature extraction network are adjusted, the parameters of the keypoint detection branch are adjusted based on the first difference, and the parameters of the classification sub-branch and the segmentation sub-branch in the instance segmentation branch are adjusted based on the second difference and the third difference.

[0108] Optionally, the human pose classification device 40 further includes an image processing module 49. The image processing module 49 is used to preprocess the original image to obtain a preprocessed image before classifying the multiple human bodies contained in the human body image to be classified into several types of human bodies; to use a human body detection module to perform human target detection on the preprocessed image to obtain the initial detection box of each human target in the preprocessed image; and to crop the image portion of the preprocessed image corresponding to the initial detection box of one of the human targets to obtain the human body image to be classified.

[0109] It should be noted that the apparatus of this embodiment can perform the steps in the above method. For detailed descriptions of the relevant content, please refer to the method section above, which will not be repeated here.

[0110] Please refer to Figure 5, which is a schematic framework diagram of an embodiment of the processing device provided in this application. In this embodiment, the processing device 50 includes a memory 51 and a processor 52.

[0111] The processor 52 can also be referred to as a CPU (Central Processing Unit). The processor 52 may be an integrated circuit chip with signal processing capabilities. The processor 52 can also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another programmable logic device, discrete gate or transistor logic device, or discrete hardware component. A general-purpose processor can be a microprocessor, or the processor 52 can be any conventional processor.

[0112] The memory 51 in the processing device 50 is used to store the program instructions required for the processor 52 to run.

[0113] The processor 52 is used to execute program instructions to implement the human posture classification method in this application.

[0114] Please refer to Figure 6, which is a schematic framework diagram of an embodiment of the computer-readable storage medium provided in this application. The computer-readable storage medium 60 of this embodiment stores program instructions 61, which, when executed, implement the human posture classification method provided in this application. The program instructions 61 can be formed into a program file and stored in the aforementioned computer-readable storage medium 60 in the form of a software product, so that a computer device (which may be a personal computer, server, or network device, etc.) can execute all or part of the steps of the methods of various embodiments of this application. The aforementioned computer-readable storage medium 60 includes various media capable of storing program code, such as a USB flash drive, mobile hard drive, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk, or terminal devices such as computers, servers, mobile phones, and tablets.

[0115] In the above scheme, when the human image to be classified contains multiple human figures, the secondary human figure regions corresponding to the secondary human figures in the image are first filled to obtain a filled image. The filled image is then used for human pose classification to obtain the target pose classification result. Because the secondary human figure regions in the filled image are filled, interference from secondary human figures can be avoided when classifying human pose using the filled image, thereby improving the accuracy of human pose classification.

[0116] In some embodiments, the functions or modules of the apparatus provided in this disclosure can be used to perform the methods described in the above method embodiments. For the specific implementation, refer to the description of the above method embodiments; for brevity, it is not repeated here.

[0117] The description of the various embodiments above tends to emphasize the differences between them. For what the embodiments have in common, they may be referred to one another; for brevity, this is not repeated here.

[0118] In the several embodiments provided in this application, it should be understood that the disclosed methods, apparatuses, and systems can be implemented in other ways. The apparatus implementations described above are merely illustrative. For example, the division into modules or units is only a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of other forms.

[0119] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment, depending on actual needs.

[0120] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0121] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute all or part of the steps of the methods of various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0122] The above description is merely an embodiment of this application and does not limit the patent scope of this application. Any equivalent structural or procedural transformations made using the content of this application's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of this application.

Claims

1. A method for classifying human postures, characterized in that, the method includes:
extracting features from the human image to be classified using the shared feature extraction network of a human image processing model to obtain a feature image, wherein the human image to be classified is the image within the initial detection box corresponding to one of the human targets obtained by performing human target detection on the original image;
performing key point detection on the feature image using the key point detection branch of the human image processing model to obtain human key point detection results, wherein the human key point detection results include at least one of the number of key points of the human body to be classified and the confidence of each key point of the human body to be classified, and the human body to be classified is a human body in the human image to be classified whose key points can be detected;
segmenting the feature image using the instance segmentation branch of the human image processing model to obtain human instance segmentation results, wherein the human instance segmentation results include initial pose classification results and position representation information of each human body, and the position representation information includes a human body segmentation mask;
determining, based on the human key point detection results or the human instance segmentation results, the number of human bodies contained in the human image to be classified;
in response to the number of human bodies being one, using the initial pose classification result in the human instance segmentation results as the target pose classification result;
in response to the number of human bodies being at least two: dividing, based on the human key point detection results, the multiple human bodies contained in the human image to be classified into a primary human body and secondary human bodies; filling the secondary human body regions corresponding to the secondary human bodies in the human image to be classified to obtain a filled image; obtaining the human body segmentation mask of the primary human body from the human instance segmentation results of the human image to be classified; fusing the filled image and the human body segmentation mask of the primary human body to obtain a primary human body image; and performing human pose classification on the primary human body image to obtain the target pose classification result.
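The branching logic of claim 1 can be sketched as follows. This is an illustrative sketch only: the claim does not state how the filled image and the primary mask are fused, so the element-wise masking in `fuse` is an assumption, and the names `fill_region`, `fuse`, `target_pose`, and `pose_classifier` are hypothetical.

```python
from typing import Callable, List

Image = List[List[int]]  # grayscale image as rows of pixel values (assumed layout)
Mask = List[List[int]]   # 1 inside a human body, 0 elsewhere


def fill_region(image: Image, mask: Mask, fill_value: int) -> Image:
    """Return a copy of `image` with the masked pixels set to `fill_value`."""
    return [
        [fill_value if m else p for p, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


def fuse(filled: Image, primary_mask: Mask) -> Image:
    """Assumed fusion: keep pixels under the primary mask and zero the rest."""
    return [
        [p * m for p, m in zip(row, mask_row)]
        for row, mask_row in zip(filled, primary_mask)
    ]


def target_pose(
    image: Image,
    masks: List[Mask],                        # one segmentation mask per detected human body
    initial_pose: str,                        # pose from the instance segmentation branch
    primary_index: int,                       # index of the primary human body in `masks`
    pose_classifier: Callable[[Image], str],  # hypothetical pose classification network
    fill_value: int = 0,
) -> str:
    # Single human body: reuse the initial pose classification result.
    if len(masks) == 1:
        return initial_pose
    # At least two human bodies: fill every secondary region, fuse, then classify.
    filled = image
    for index, mask in enumerate(masks):
        if index != primary_index:
            filled = fill_region(filled, mask, fill_value)
    return pose_classifier(fuse(filled, masks[primary_index]))
```

In this sketch the second classification pass runs only when the crop contains more than one human body, so single-person crops carry no extra cost.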

2. The method according to claim 1, characterized in that, the step of filling the secondary human body region corresponding to the secondary human body in the human image to be classified to obtain a filled image includes: obtaining the position representation information of the secondary human body from the human instance segmentation results of the human image to be classified; determining the secondary human body region in the human image to be classified using the position representation information of the secondary human body; and filling the secondary human body region in the human image to be classified with pixel values according to a preset pixel value filling method to obtain the filled image.

3. The method according to claim 2, characterized in that, the position representation information also includes human body detection box position information, and the step of using the position representation information of the secondary human body to determine the secondary human body region in the human image to be classified includes any one of the following steps: taking the secondary human body detection box region in the human image to be classified as the secondary human body region, wherein the secondary human body detection box region is the region corresponding to the human body detection box position information of the secondary human body; taking the secondary human body segmentation mask region in the human image to be classified as the secondary human body region, wherein the secondary human body segmentation mask region is the region corresponding to the human body segmentation mask of the secondary human body; or taking the human body intersection region in the human image to be classified as the secondary human body region, wherein the human body intersection region is the intersection of the secondary human body detection box region and the secondary human body segmentation mask region of the secondary human body.
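The three alternatives in claim 3 for determining the secondary human body region can be sketched as below. The box convention `(x_min, y_min, x_max, y_max)` with exclusive maxima, the `mode` strings, and the function names are assumptions made for illustration and are not taken from the application.

```python
from typing import List, Tuple

Mask = List[List[bool]]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), maxima exclusive (assumed)


def box_region(box: Box, height: int, width: int) -> Mask:
    """Boolean mask that is True inside the detection box."""
    x0, y0, x1, y1 = box
    return [[x0 <= x < x1 and y0 <= y < y1 for x in range(width)] for y in range(height)]


def secondary_region(box: Box, seg_mask: Mask, mode: str) -> Mask:
    """Select the secondary human body region by one of the three claimed options."""
    height, width = len(seg_mask), len(seg_mask[0])
    if mode == "mask":            # segmentation mask region
        return [row[:] for row in seg_mask]
    in_box = box_region(box, height, width)
    if mode == "box":             # detection box region
        return in_box
    if mode == "intersection":    # overlap of detection box and segmentation mask
        return [
            [b and m for b, m in zip(box_row, mask_row)]
            for box_row, mask_row in zip(in_box, seg_mask)
        ]
    raise ValueError(f"unknown mode: {mode}")
```

The box option fills the largest area and may erase parts of the primary human body that overlap the box, while the mask and intersection options confine the fill to pixels segmented as the secondary human body.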

4. The method according to claim 1, characterized in that, the step of performing human pose classification on the primary human body image to obtain the target pose classification result includes: using a pose classification network to perform pose classification on the primary human body image to obtain the target pose classification result.

5. The method according to claim 1, characterized in that, dividing the multiple human bodies into the primary human body and the secondary human bodies based on the human key point detection results includes: identifying, as the primary human body, the human body to be classified whose number of key points meets a first requirement; or obtaining a comprehensive confidence for each human body to be classified using the confidence of each of its key points, and identifying, as the primary human body, the human body to be classified whose comprehensive confidence meets a second requirement; or obtaining a weighted result of the number of key points and the comprehensive confidence of each human body to be classified, and identifying, as the primary human body, the human body to be classified whose weighted result meets a third requirement; and identifying the human bodies in the human image to be classified other than the primary human body as the secondary human bodies.
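The three selection criteria in claim 5 can be sketched as follows. The claim does not define the first, second, or third requirement, so this sketch assumes each means "highest score". It also assumes that the comprehensive confidence is the mean key point confidence, that the key point count is normalized by a hypothetical maximum of 17, and that both weights are 0.5.

```python
from statistics import mean
from typing import List, Tuple


def score(confidences: List[float], mode: str, max_keypoints: int = 17,
          w_count: float = 0.5, w_conf: float = 0.5) -> float:
    """Score one human body from the confidences of its detected key points."""
    count = len(confidences)
    comprehensive = mean(confidences) if confidences else 0.0  # assumed: mean confidence
    if mode == "count":
        return float(count)
    if mode == "confidence":
        return comprehensive
    if mode == "weighted":
        # Assumed weighting: normalized key point count plus comprehensive confidence.
        return w_count * count / max_keypoints + w_conf * comprehensive
    raise ValueError(f"unknown mode: {mode}")


def split_primary_secondary(keypoint_confidences: List[List[float]],
                            mode: str = "weighted") -> Tuple[int, List[int]]:
    """Return (index of the primary human body, indices of the secondary human bodies)."""
    scores = [score(c, mode) for c in keypoint_confidences]
    primary = max(range(len(scores)), key=scores.__getitem__)
    secondary = [i for i in range(len(scores)) if i != primary]
    return primary, secondary
```

With these assumptions, a human body with many low-confidence key points can win under the count criterion but lose under the confidence criterion, which is why the weighted variant is offered as a middle ground.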

6. The method according to claim 1, characterized in that, the instance segmentation branch includes a classification sub-branch and a segmentation sub-branch, and the method further includes: extracting features from a sample image using the shared feature extraction network to obtain a sample feature image; performing key point detection on the sample feature image using the key point detection branch to obtain a sample key point detection result; performing human body segmentation on the sample feature image using the segmentation sub-branch to obtain a sample human body segmentation mask of the sample image; performing human body segmentation on the sample feature image using the classification sub-branch to obtain a sample classification result, wherein the sample classification result includes the initial pose classification result of the sample and the human body detection box information of each human body in the sample image; obtaining a first difference between the sample key point detection result and the actual key point result of the sample image, a second difference between the sample human body segmentation mask and the actual human body segmentation mask of the sample image, and a third difference between the sample classification result and the actual classification result of the sample image; and adjusting the parameters of the shared feature extraction network based on the first difference, the second difference, and the third difference, adjusting the parameters of the key point detection branch based on the first difference, and adjusting the parameters of the classification sub-branch and the segmentation sub-branch in the instance segmentation branch based on the second difference and the third difference.
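The way claim 6 routes the three differences to the different parts of the model can be sketched as a plain mapping. The unweighted sum and the dictionary keys are assumptions; the claim states only which differences adjust which parameters, not how they are combined.

```python
from typing import Dict


def route_differences(first: float, second: float, third: float) -> Dict[str, float]:
    """Map the three training differences to the component each one adjusts.

    first:  key point detection difference
    second: segmentation mask difference
    third:  classification difference
    """
    return {
        # The shared feature extraction network is adjusted by all three differences.
        "shared_feature_extraction_network": first + second + third,
        # The key point detection branch is adjusted by the first difference only.
        "key_point_detection_branch": first,
        # Both sub-branches of the instance segmentation branch are adjusted
        # by the second and third differences.
        "instance_segmentation_branch": second + third,
    }


objectives = route_differences(0.5, 0.25, 0.125)
```

In an automatic differentiation framework, back-propagating the sum of the three differences typically yields this routing without extra code, because each branch receives gradient only from the differences computed on its own outputs, whereas the shared network receives gradient from all of them.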

7. The method according to claim 1, characterized in that, before classifying the multiple human bodies contained in the human image to be classified into several types of human bodies, the method further includes: preprocessing the original image to obtain a preprocessed image; performing human target detection on the preprocessed image using a human detection module to obtain the initial detection box of each human target in the preprocessed image; and cropping the portion of the preprocessed image corresponding to the initial detection box of one of the human targets to obtain the human image to be classified.
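The cropping step of claim 7 can be sketched as follows. The box convention `(x_min, y_min, x_max, y_max)` with exclusive maxima and the clamping to the image bounds are assumptions. The preprocessing and detection steps are not shown because the claim does not specify them.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values (assumed layout)
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), maxima exclusive (assumed)


def crop_to_box(image: Image, box: Box) -> Image:
    """Crop the part of `image` inside `box`, clamping the box to the image bounds."""
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = box
    x0, y0 = max(0, x0), max(0, y0)
    x1, y1 = min(width, x1), min(height, y1)
    return [row[x0:x1] for row in image[y0:y1]]


preprocessed = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
human_image = crop_to_box(preprocessed, (1, 0, 3, 2))
```

Because a detection box is rectangular, a crop produced this way can include parts of neighbouring people, which is the situation the filling step of claim 1 addresses.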

8. A processing apparatus, characterized in that, it includes a memory and a processor that are interconnected; the memory stores program instructions; and the processor is used to execute the program instructions stored in the memory to implement the method according to any one of claims 1-7.

9. A computer-readable storage medium, characterized in that, The computer-readable storage medium is used to store program instructions that can be executed to implement the method of any one of claims 1-7.