Method, device and equipment for counting video target posture and storage medium

By setting correction conditions in video target pose detection, teacher interference is corrected, the accuracy of pose statistics is improved, the statistical distortion caused by teacher misjudgment is solved, and more accurate classroom interaction evaluation data is provided.

CN115512300B · Active · Publication Date: 2026-06-16 · GUANGZHOU AVA ELECTRONICS TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
GUANGZHOU AVA ELECTRONICS TECH CO LTD
Filing Date
2022-10-12
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing video target pose detection methods are prone to misclassifying teachers as students when collecting statistics on student poses, leading to distorted statistical results and affecting the accuracy of classroom teaching interaction evaluation.

Method used

The method obtains the first target number and the distribution result within a specific area at the current statistical moment of the video, and checks the distribution result against preset correction conditions to determine whether correction is triggered. When it is, the first target number is corrected to obtain a more accurate second target number, which eliminates teacher interference.

Benefits of technology

Without adding other detection methods, the accuracy of the target pose count in the video was improved, providing more accurate data for interactive evaluation in classroom teaching.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a method, device and equipment for counting target postures in a video, and a storage medium. The method comprises the following steps: obtaining a first target number of detected target postures in a current counting moment of a video; obtaining a distribution result of the detected target postures in a specific area in the current counting moment of the video; obtaining a correction condition, wherein the correction condition is a trigger condition related to the distribution of the target postures in the specific area; judging whether the distribution result triggers the correction condition; when the correction condition is triggered, correcting the first target number to obtain a second target number, and taking the second target number as a counting result of the target postures existing in the current counting moment. The application can more accurately count the number of target postures in the video without adding other detection means, and provides more accurate data for classroom teaching interactive evaluation.

Description

Technical Field

[0001] This invention relates to the field of image processing technology, and more specifically, to a method, apparatus, device, and storage medium for statistically analyzing the pose of video targets.

Background Technology

[0002] With the continuous development of communication technology, concepts such as online education and remote work have been further promoted. For example, in educational recording systems or conference recording systems, classroom or conference recordings can be saved for users to study. In order to achieve purposes such as analyzing the classroom performance of teachers and students, certain postures (such as standing) in these classroom or conference video resources can be statistically analyzed to evaluate the classroom teaching interaction.

[0003] Currently, because the available camera angles are limited, the student panoramic shot is commonly captured by a side-mounted camera. As a result, the teacher is also captured in the student panoramic view. Posture statistics usually target students only, so when existing target posture detection methods are used for detection and statistics, the teacher is often included in the count, which distorts the results and hinders the subsequent evaluation of classroom teaching interaction.

Summary of the Invention

[0004] To address the shortcomings of the prior art, this invention provides a method, apparatus, device, and storage medium for statistically analyzing the pose of video targets, thereby eliminating statistical distortion caused by teacher interference during the statistical process. The technical solution adopted by this invention is as follows.

[0005] In a first aspect, the present invention provides a method for statistically analyzing the pose of a target in a video, comprising the steps of:

[0006] Obtain a first target number of detected target poses at the current statistical moment of the video;

[0007] Obtain the distribution of the detected target pose within a specific region at the current statistical moment in the video;

[0008] Obtain correction conditions, wherein the correction conditions are triggering conditions related to the distribution of the target pose within a specific region;

[0009] Determine whether the distribution result triggers the correction condition;

[0010] When the correction condition is triggered, the first number of targets is corrected to obtain the second number of targets, and the second number of targets is used as the statistical result of the existence of the target posture at the current statistical time.

[0011] In one implementation, the process of forming the correction conditions includes the steps of:

[0012] Obtain the total number of detections for the entire video;

[0013] Obtain the total distribution statistics of the target pose detected at each statistical time point within a specific region;

[0014] Based on the total number of detections and the total distribution statistics, correction conditions are formed.

[0015] In one implementation, the specific region includes: a first region;

[0016] The distribution results include: whether a target pose is detected in the first region;

[0017] The total distribution statistics include: the total number of times n1 the target posture was detected in the first region at each statistical time point;

[0018] The process of forming correction conditions based on the total number of detections and the total distribution statistics includes the following steps:

[0019] When the ratio of n1 to the total number of targets is within a first threshold range, the correction condition is set to: target pose is detected in the first region.

[0020] In one implementation, when there are multiple correction conditions, the process of correcting the first target number to obtain the second target number is not affected by the number of triggered correction conditions.

[0021] In one implementation, the specific region includes: a first region and a second region;

[0022] The distribution results include: whether the target pose is detected simultaneously in the first region and the second region;

[0023] The total distribution statistics include: the total number of times n2 the target posture was detected simultaneously in the first region and the second region at each statistical time point;

[0024] The process of forming correction conditions based on the total number of detections and the total distribution statistics includes the following steps:

[0025] When the ratio of n2 to the total number of targets is within the second threshold range, the correction condition is set as follows: target poses are detected simultaneously in the first region and the second region.

[0026] In one implementation, the process of correcting the first target number to obtain a second target number when the correction condition is triggered includes the following steps:

[0027] Obtain the correction parameters;

[0028] Set the number of second targets to the number of first targets minus the correction parameter.

[0029] In one implementation, the target posture is a standing posture.

[0030] In a second aspect, the present invention provides an apparatus for statistically analyzing the pose of a video target, comprising:

[0031] The acquisition module is used to acquire the number of first targets with detected target poses in the current statistical moment of the video, the distribution result of the target poses detected in the current statistical moment of the video in a specific area, and correction conditions, wherein the correction conditions are triggering conditions for the distribution of target poses in a specific area.

[0032] The judgment module is used to determine whether the distribution result triggers the correction condition;

[0033] The processing module is used to correct the first number of targets when the correction condition is triggered, to obtain a second number of targets, and to use the second number of targets as the statistical result of the existence of the target posture at the current statistical time.

[0034] In a third aspect, the present invention provides a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the method of any of the above embodiments.

[0035] In a fourth aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the method of any of the above embodiments.

[0036] In this invention, correction conditions for various scenarios are summarized based on actual conditions. When a correction condition is triggered, the obtained statistical results are corrected to eliminate the influence of interference on the statistical results. This method, without adding other detection methods, more accurately counts the number of target poses in the video, providing more precise data for interactive evaluation in classroom teaching.

Attached Figure Description

[0037] Figure 1 is a flowchart illustrating Embodiment 1 of the present invention.

[0038] Figure 2 is a flowchart illustrating one embodiment of the present invention.

[0039] Figure 3 is a schematic diagram of the overall structure of Embodiment 2 of the present invention.

Detailed Implementation

[0040] Exemplary embodiments of the present disclosure will now be described in more detail with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

[0041] It should be noted that the terms "first, second, ..." used in the embodiments of the present invention merely distinguish similar objects and do not represent a specific order of objects. Where permitted, the objects distinguished by "first, second, ..." can be interchanged, so that the embodiments of the present invention described herein can be implemented in an order other than those illustrated or described herein.

[0042] Embodiment 1

[0043] Referring to Figure 1, which is a flowchart of a method for statistically analyzing the pose of a target in a video according to Embodiment 1 of the present invention, the method includes steps S110, S120, S130, S140, and S150. It should be noted that S110, S120, S130, S140, and S150 are merely reference numerals used to clearly explain the correspondence between this embodiment and Figure 1; they are not intended to limit the order of the steps in this embodiment.

[0044] Step S110: Obtain the number of first targets whose poses are detected in the current statistical moment of the video.

[0045] The target posture in a video can vary, for example standing or raising a hand. For ease of explanation, this embodiment uses standing as an example. Because a video is long, it is impractical to detect standing postures at every instant. Detection is therefore usually run at intervals of a few seconds to obtain the number of people currently standing. Taking one detection per second as an example, each detection moment is a statistical moment, and once detection completes, the number of people standing at the current statistical moment is available. The first target number obtained in step S110 is this number of people standing. Typically, a deep learning-based standing target detection algorithm can be used to detect the number of people standing in the video.
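The sampling scheme in step S110 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `statistical_moments` and `first_target_number`, the `label` field, and the detection record format are hypothetical, and the detector itself is outside the sketch.

```python
from typing import Dict, List


def statistical_moments(duration_s: float, interval_s: float = 1.0) -> List[float]:
    """Return the timestamps (seconds) at which detection runs, one per interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    steps = int(duration_s // interval_s)
    return [i * interval_s for i in range(steps + 1)]


def first_target_number(detections: List[Dict[str, str]],
                        target_pose: str = "standing") -> int:
    """Count the detections at one statistical moment that carry the target pose.

    Each detection is assumed to be a dict with a hypothetical "label" key.
    """
    return sum(1 for d in detections if d.get("label") == target_pose)
```

Here each element of `statistical_moments(...)` is one statistical moment, and `first_target_number` is evaluated once per moment on whatever the detector returned for that moment.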

[0046] Step S120: Obtain the distribution results of the target pose detected in the video at the current statistical moment within a specific region.

[0047] Step S120 obtains statistical data on the distribution of target poses within a specific region at the current statistical time. For example, the specific region can be divided into several smaller regions, and the number of target poses detected in each smaller region can be obtained to obtain the distribution result of the target poses. Typically, standing detection can also detect the coordinate information of the standing person at the same time. Through coordinate information transformation, it can be determined which smaller region within the specific region the standing person is located in.
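As a rough sketch of step S120, the mapping from coordinates to smaller regions might look like the code below. It assumes that the coordinates have already been transformed into a real-space floor plane and that each smaller region is an axis-aligned rectangle; the `Region` type and both function names are hypothetical and are not taken from the patent.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Region(NamedTuple):
    """Axis-aligned rectangle in real-space coordinates (an assumed shape)."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x < self.x_max and self.y_min <= y < self.y_max


def locate(x: float, y: float, regions: List[Region]) -> Optional[str]:
    """Return the name of the first region containing the point, or None."""
    for region in regions:
        if region.contains(x, y):
            return region.name
    return None


def distribution_result(points: Iterable[Tuple[float, float]],
                        regions: List[Region]) -> Counter:
    """Count detected target poses per smaller region at one statistical moment."""
    counts = Counter({region.name: 0 for region in regions})
    for x, y in points:
        name = locate(x, y, regions)
        if name is not None:
            counts[name] += 1
    return counts
```

Points that fall outside every region are simply ignored, which matches the idea that only the distribution within the specific area matters for the correction condition.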

[0048] It should be noted that the specific area in step S120 is a region in actual space, not a specific area in the video frame. For example, if the podium (a platform raised above the ground where a lecturer stands) is taken as the specific area, the position of the podium in the pan-tilt camera's view will change. However, regardless of the position change, step S120 obtains statistical data on the target's pose distribution on the podium.

[0049] Step S130: Obtain correction conditions, wherein the correction conditions are triggering conditions related to the distribution of the target pose within a specific region.

[0050] As mentioned in the background section, simply using the number of the first target as the number of people detected standing at the current statistical moment in the video may include teachers in the count, resulting in a distorted number. Therefore, a mechanism needs to be introduced to correct the result when it is determined that teachers have been included in the count, filtering out interference caused by teachers. In this method, a trigger condition related to the distribution of target postures within a specific area is added, i.e., a correction condition. When the correction condition is triggered, it is assumed that the count includes teachers, and a correction should be made.

[0051] Specifically, teachers typically spend most of their time at the podium during class, so the correction conditions in this method apply only to situations where a target posture is detected within a specific area (such as the podium). Furthermore, since teachers generally have their own habits and tend to linger in certain small areas during class, the correction condition can be applied more precisely to specific small areas: when standing is detected in those small areas, it is assumed that the count includes a teacher.

[0052] It should be noted that the correction conditions are actually some preset scenarios that will include teachers in the statistics. This implementation does not restrict how these scenarios are obtained; they can be summarized through experience, statistics, and other methods.

[0053] Step S140: Determine whether the distribution result triggers the correction condition.

[0054] Step S150: When the correction condition is triggered, the first number of targets is corrected to obtain the second number of targets, and the second number of targets is used as the statistical result of the existence of the target posture in the current statistical time.

[0055] After obtaining the correction conditions, compare them with the distribution results within the specific region to see if the correction conditions have been triggered. If the correction conditions have been triggered, adjust the number of the first target to obtain the actual statistical results.
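The flow of steps S110 to S150 for a single statistical moment can be sketched as below. This is only an illustration under assumptions: each detected pose is represented by the name of the smaller region it falls in (or `None` when it lies outside the specific area), the correction condition is reduced to "a pose is detected in `trigger_region`", and the function name and default correction of 1 are hypothetical.

```python
from typing import List, Optional


def count_target_poses(detected_regions: List[Optional[str]],
                       trigger_region: str,
                       correction: int = 1) -> int:
    """Return the statistical result for one statistical moment."""
    # S110: the first target number is the number of detected target poses.
    first_target_number = len(detected_regions)
    # S120: distribution result, here the set of regions holding a pose.
    occupied = {r for r in detected_regions if r is not None}
    # S130/S140: the correction condition triggers if the region is occupied.
    triggered = trigger_region in occupied
    # S150: correct the first target number only when triggered.
    if triggered:
        return first_target_number - correction
    return first_target_number
```

For example, three standing detections with one of them in the trigger region yield a result of 2, while the same three detections all outside that region yield 3.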

[0056] In a teaching setting, there is usually only one teacher, so when making corrections, the first target number is typically reduced by one. However, when this method is applied to other fields, the number of people causing interference may be more than one, or the factors causing interference may differ depending on the correction conditions. In such cases, different correction methods should be adopted. Therefore, those skilled in the art should formulate appropriate correction strategies based on the actual situation. This implementation does not impose any restrictions on specific correction strategies.

[0057] This method summarizes correction conditions for various scenarios based on actual conditions. When these conditions are triggered, the obtained statistical results are corrected to eliminate the influence of interference on the statistical results. Without adding other detection methods, this method more accurately counts the number of target poses in the video, providing more precise data for interactive evaluation in classroom teaching.

[0058] In one implementation, as shown in Figure 2, the process of generating the correction conditions includes steps S210, S220 and S230.

[0059] Step S210: Obtain the total number of detections for the entire video segment;

[0060] Step S220: Obtain the total distribution statistics of the target pose detected at each statistical time point within a specific region;

[0061] Step S230: Based on the total number of detections and the total distribution statistics, a correction condition is formed.

[0062] This implementation analyzes the video statistically to identify specific scenarios and formulate correction conditions. By examining the relationship (for example, the percentage) between the number of times standing is detected in each sub-area of the specific area and the total number of times, it infers whether the standing detected in that sub-area is the teacher's.

[0063] In one implementation, the specific region includes: a first region;

[0064] The total distribution statistics include: the total number of times n1 the target posture was detected in the first region at each statistical time point;

[0065] The process of step S230 includes: step S231.

[0066] Step S231: When the ratio of n1 to the total number of targets is within the first threshold range, let the correction condition be: a target pose is detected in the first region.

[0067] In this embodiment, the specific area is divided into several regions, including a first region. The total distribution statistics include n1, the total number of times a target pose is detected in the first region. Step S231 compares n1 with the total number of targets to obtain the proportion of standing detections in the first region over the entire video. When this proportion is high enough to fall within the first threshold range, the high proportion is attributed to the teacher's standing being counted, which yields the correction condition: a target pose is detected in the first region. In other words, the correction condition is triggered when the distribution result shows that a target pose is detected in the first region.

[0068] It should be noted that at any given statistical moment, regardless of how many target poses are detected within the first region, they are counted only once in the overall distribution statistics. Furthermore, regardless of whether a target pose is detected in a specific region outside the first region, as long as a target pose is detected within the first region, it is also counted once in the overall distribution statistics.
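Steps S210 to S231, together with the once-per-moment counting rule, can be sketched as follows. Several points are assumptions made for illustration only: the denominator of the ratio is taken to be the number of statistical moments in the video, each moment is given as the list of region names of its detected poses, and the threshold range `(0.6, 1.0)` and both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def count_moments_with_region(moments: Sequence[List[str]], region: str) -> int:
    """n1: number of statistical moments with at least one pose in the region.

    A moment contributes 1 no matter how many poses it holds in that region,
    and no matter whether poses also appear in other regions.
    """
    return sum(1 for detected in moments if region in detected)


def first_region_condition(moments: Sequence[List[str]],
                           region: str,
                           threshold_range: Tuple[float, float] = (0.6, 1.0)) -> bool:
    """Return True if 'a pose is detected in region' becomes a correction condition."""
    total = len(moments)  # assumed denominator: number of statistical moments
    if total == 0:
        return False
    ratio = count_moments_with_region(moments, region) / total
    low, high = threshold_range
    return low <= ratio <= high
```

With moments `[["A", "A"], ["A", "B"], ["A"], ["B"]]`, region `A` has n1 = 3 and a ratio of 0.75, so it becomes a correction condition under the assumed range, while region `B` with a ratio of 0.5 does not.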

[0069] In one implementation, when there are multiple correction conditions, the process of correcting the first target number to obtain the second target number is not affected by the number of triggered correction conditions.

[0070] For a specific region, the entire region can be divided into a first region, or it can be divided into multiple regions, each of which can be considered a first region. Adapting to the above implementation method, for example, if the specific region contains two regions, the above implementation method can be applied to both regions simultaneously. In this case, a correction condition will be generated for each of the two regions. Since there are two correction conditions, there will inevitably be situations where both correction conditions are triggered simultaneously. However, since only one person affects the accuracy of the result, the reasonable approach during correction is to subtract only that person. Therefore, regardless of how many correction conditions are triggered, the correction method should be the same, and the correction should only be performed once, i.e., subtracting one each time.
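The rule that the correction is applied once regardless of how many conditions trigger can be illustrated with a short sketch. The function name and the representation of triggered conditions as a list of booleans are assumptions for illustration.

```python
from typing import Iterable


def corrected_count(first_target_number: int,
                    triggered_flags: Iterable[bool],
                    correction: int = 1) -> int:
    """Subtract the correction once if any correction condition is triggered."""
    # any() collapses multiple triggered conditions into a single correction,
    # so two triggered conditions still remove only one person.
    if any(triggered_flags):
        return first_target_number - correction
    return first_target_number
```

Using `any` rather than summing the flags is the design point: summing would subtract one person per triggered condition and undercount the students.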

[0071] In one implementation, the specific region includes: a first region and a second region;

[0072] The distribution results include: whether the target pose is detected simultaneously in the first region and the second region;

[0073] The total distribution statistics include: the total number of times n2 the target posture was detected simultaneously in the first region and the second region at each statistical time point;

[0074] The process of step S230 includes: step S232.

[0075] Step S232: When the ratio of n2 to the total number of targets is within the second threshold range, let the correction condition be: target pose is detected simultaneously in the first region and the second region.

[0076] This implementation method is similar to the previous one, except that it addresses the scenario where the target posture is detected simultaneously in both the first and second regions. In actual teaching interactions, students may also stand on the podium, resulting in both teachers and students on the podium at the same time. If they are standing in the same region, it is the scenario described in the previous implementation method. However, when they are in the first and second regions respectively, it is the scenario that this implementation method needs to address.
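The two-region variant in step S232 can be sketched in the same style. As before, this is an assumption-laden illustration: each statistical moment is given as the set of occupied region names, the denominator is taken to be the number of statistical moments, and the threshold range `(0.25, 1.0)` and function names are hypothetical.

```python
from typing import Sequence, Set, Tuple


def simultaneous_condition(moments: Sequence[Set[str]],
                           first: str,
                           second: str,
                           threshold_range: Tuple[float, float] = (0.25, 1.0)) -> bool:
    """Decide whether 'poses in both regions at once' becomes a correction condition."""
    total = len(moments)  # assumed denominator: number of statistical moments
    if total == 0:
        return False
    # n2: moments in which both regions hold a target pose simultaneously.
    n2 = sum(1 for occupied in moments if first in occupied and second in occupied)
    low, high = threshold_range
    return low <= n2 / total <= high


def is_triggered(occupied: Set[str], first: str, second: str) -> bool:
    """Check the resulting condition against one moment's distribution result."""
    return first in occupied and second in occupied
```

`simultaneous_condition` runs once over the whole video to form the condition, and `is_triggered` is then evaluated at each statistical moment.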

[0077] In one implementation, the process of correcting the first target number to obtain a second target number when the correction condition is triggered includes the following steps:

[0078] Obtain the correction parameters;

[0079] Let the number of second targets equal the number of first targets minus the correction parameter.

[0080] As mentioned earlier, in classroom teaching scenarios, the statistical results usually include teachers, leading to data distortion. Therefore, in this implementation, the correction parameter (which can be 1 in a classroom scenario) is subtracted from the number of the first target to eliminate interference caused by teachers and obtain reasonable statistical results.

[0081] It should be noted that this method does not restrict how the correction parameters are obtained. They can be determined by manual input, or they can be derived from specific analysis of the problem and the specific correction parameters can be obtained based on the statistical results of the distribution of the entire video.

[0082] In one implementation, the target posture is a standing posture.

[0083] Embodiment 2

[0084] Corresponding to the method in Embodiment 1, as shown in Figure 3, the present invention also provides a device 3 for statistically analyzing the pose of video targets, comprising: an acquisition module 310, a judgment module 320, and a processing module 330.

[0085] The acquisition module 310 is used to acquire the number of first targets with detected target poses in the current statistical moment of the video, the distribution result of the target poses detected in the current statistical moment of the video in a specific area, and correction conditions, wherein the correction conditions are triggering conditions for the distribution of target poses in a specific area.

[0086] The judgment module 320 is used to determine whether the distribution result triggers the correction condition;

[0087] The processing module 330 is used to correct the first number of targets when the correction condition is triggered, to obtain a second number of targets, and to use the second number of targets as the statistical result of the existence of the target posture at the current statistical time.
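One way to picture how modules 310, 320 and 330 cooperate is the sketch below. The class names, method names, and the representation of a correction condition as a callable over the set of occupied regions are all assumptions for illustration; the patent does not prescribe a software structure.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple

Condition = Callable[[Set[str]], bool]


@dataclass
class AcquisitionModule:
    """Module 310: supplies the first target number, distribution, and condition."""
    condition: Condition

    def acquire(self, detected_regions: List[Optional[str]]) -> Tuple[int, Set[str], Condition]:
        occupied = {r for r in detected_regions if r is not None}
        return len(detected_regions), occupied, self.condition


class JudgmentModule:
    """Module 320: decides whether the distribution result triggers the condition."""

    def is_triggered(self, occupied: Set[str], condition: Condition) -> bool:
        return condition(occupied)


@dataclass
class ProcessingModule:
    """Module 330: corrects the first target number when triggered."""
    correction: int = 1

    def process(self, first_target_number: int, triggered: bool) -> int:
        return first_target_number - self.correction if triggered else first_target_number


@dataclass
class PoseCountingDevice:
    acquisition: AcquisitionModule
    judgment: JudgmentModule = field(default_factory=JudgmentModule)
    processing: ProcessingModule = field(default_factory=ProcessingModule)

    def count(self, detected_regions: List[Optional[str]]) -> int:
        first, occupied, condition = self.acquisition.acquire(detected_regions)
        triggered = self.judgment.is_triggered(occupied, condition)
        return self.processing.process(first, triggered)
```

A device built with `AcquisitionModule(lambda occupied: "podium" in occupied)` would subtract one whenever a pose is detected in the hypothetical `podium` region.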

[0088] In one embodiment, the apparatus for statistically analyzing the pose of a video target further includes a condition generation unit, the condition generation unit comprising:

[0089] The data acquisition module is used to obtain the total number of detections in the entire video and the total distribution statistics of the target pose detected at each statistical moment within a specific area;

[0090] The condition formation module is used to form correction conditions based on the total number of detections and the total distribution statistics.

[0091] In one implementation, the specific region includes: a first region;

[0092] The distribution results include: whether a target pose is detected in the first region;

[0093] The total distribution statistics include: the total number of times n1 the target posture was detected in the first region at each statistical time point;

[0094] The process of forming correction conditions based on the total number of detections and the total distribution statistics includes the following steps:

[0095] When the ratio of n1 to the total number of targets is within a first threshold range, the correction condition is set to: target pose is detected in the first region.

[0096] In one implementation, when there are multiple correction conditions, the process of correcting the first target number to obtain the second target number is not affected by the number of triggered correction conditions.

[0097] In one implementation, the specific region includes: a first region and a second region;

[0098] The distribution results include: whether the target pose is detected simultaneously in the first region and the second region;

[0099] The total distribution statistics include: the total number of times n2 the target posture was detected simultaneously in the first region and the second region at each statistical time point;

[0100] The process of forming correction conditions based on the total number of detections and the total distribution statistics includes the following steps:

[0101] When the ratio of n2 to the total number of targets is within the second threshold range, the correction condition is set as follows: target poses are detected simultaneously in the first region and the second region.

[0102] In one implementation, the process of correcting the first target number to obtain a second target number when the correction condition is triggered includes the following steps:

[0103] Obtain the correction parameters;

[0104] Let the number of second targets equal the number of first targets minus the correction parameter.

[0105] In one implementation, the target posture is a standing posture.

[0106] In this device, correction conditions for various scenarios are summarized based on actual conditions. When a correction condition is triggered, the acquired statistical results are corrected to eliminate the influence of interference on the statistical results. This method, without adding other detection methods, more accurately counts the number of target poses in the video, providing more precise data for interactive evaluation in classroom teaching.

[0107] Embodiment 3

[0108] This invention also provides a storage medium storing computer instructions that, when executed by a processor, implement the method for statistically analyzing the pose of video targets of any of the above embodiments.

[0109] Those skilled in the art will understand that all or part of the steps of the above method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium. When the program is executed, it performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as mobile storage devices, random access memory (RAM), read-only memory (ROM), magnetic disks, or optical disks.

[0110] Alternatively, if the integrated units of this invention are implemented as software functional modules and sold or used as independent products, they can also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of this invention, or the parts that contribute to related technologies, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, terminal, or network device, etc.) to execute all or part of the methods of the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as mobile storage devices, RAM, ROM, magnetic disks, or optical disks.

[0111] Corresponding to the computer storage medium described above, one embodiment also provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement any of the methods for statistically analyzing the pose of video targets described in the above embodiments.

[0112] The aforementioned computer equipment, based on actual conditions, summarizes correction conditions for various scenarios. When these conditions are triggered, the acquired statistical results are corrected to eliminate the influence of interference on the statistical results. This method, without adding other detection methods, more accurately counts the number of target poses in the video, providing more precise data for interactive evaluation in classroom teaching.

[0113] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0114] Obviously, the above embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the implementation of the present invention. Those skilled in the art can make other variations or modifications based on the above description. It is neither necessary nor possible to exhaustively describe all embodiments here. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the scope of protection of the claims of the present invention.

Claims

1. A method for statistically analyzing the pose of a target in a video, characterized in that it includes the following steps:
obtaining the number of first targets whose poses are detected in the video at the current statistical moment;
obtaining the distribution of the detected target pose within a specific region at the current statistical moment in the video;
obtaining correction conditions, wherein the correction conditions are triggering conditions related to the distribution of the target pose within a specific region;
determining whether the distribution result triggers the correction condition;
when the correction condition is triggered, correcting the first number of targets to obtain the second number of targets, and using the second number of targets as the statistical result of the existence of the target posture at the current statistical time;
wherein the process of forming the correction conditions includes the following steps:
obtaining the total number of detections for the entire video segment;
obtaining the total distribution statistics of the target pose detected at each statistical time point within a specific region;
forming correction conditions based on the total number of detections and the total distribution statistics.

2. The method for statistically analyzing the pose of a target in a video according to claim 1, characterized in that the specific region includes a first region; the distribution result includes whether a target pose is detected in the first region; the total distribution statistics include the number of times n1, summed over all statistical moments, that a target pose is detected in the first region; and the process of forming the correction condition based on the total number of detections and the total distribution statistics comprises the following step: when the ratio of n1 to the total number of targets is within a first threshold range, setting the correction condition to: a target pose is detected in the first region.
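One way the condition-forming step of claim 2 might look in code is sketched below. The claim does not give the threshold values, so `threshold_range` is a caller-supplied assumption, and treating the denominator as the total number of detections obtained in claim 1 is likewise an assumption. The function name and the boolean-flag input are hypothetical.

```python
from typing import Sequence, Tuple


def form_first_region_condition(
    detected_in_first_region: Sequence[bool],
    total_detections: int,
    threshold_range: Tuple[float, float],
) -> bool:
    """Decide whether 'a target pose is detected in the first region'
    should be adopted as a correction condition.

    detected_in_first_region holds one flag per statistical moment.
    """
    # n1: number of statistical moments with a pose in the first region.
    n1 = sum(1 for flag in detected_in_first_region if flag)
    if total_detections <= 0:
        return False  # assumption: no detections means no condition is formed
    ratio = n1 / total_detections
    low, high = threshold_range  # assumed inclusive bounds
    return low <= ratio <= high
```

With flags `[True, True, False, True]`, 10 total detections and a range of `(0.2, 0.5)`, the ratio is 0.3 and the condition is adopted.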

3. The method for statistically analyzing the pose of a target in a video according to claim 2, characterized in that, when there are multiple correction conditions, the process of correcting the first target number to obtain the second target number is not affected by the number of correction conditions triggered.

4. The method for statistically analyzing the pose of a target in a video according to claim 1, characterized in that the specific region includes a first region and a second region; the distribution result includes whether target poses are detected simultaneously in the first region and the second region; the total distribution statistics include the number of times n2, summed over all statistical moments, that target poses are detected simultaneously in the first region and the second region; and the process of forming the correction condition based on the total number of detections and the total distribution statistics comprises the following step: when the ratio of n2 to the total number of targets is within a second threshold range, setting the correction condition to: target poses are detected simultaneously in the first region and the second region.
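The two-region variant of claim 4 differs only in how the counter is accumulated: a statistical moment contributes to n2 only when both regions contain a target pose at once. The sketch below makes the same assumptions as before (hypothetical function name, caller-supplied `threshold_range`, denominator taken to be the total number of detections).

```python
from typing import Sequence, Tuple


def form_two_region_condition(
    region_flags: Sequence[Tuple[bool, bool]],
    total_detections: int,
    threshold_range: Tuple[float, float],
) -> bool:
    """Decide whether 'target poses are detected simultaneously in the first
    and second regions' should be adopted as a correction condition.

    region_flags holds one (in_first_region, in_second_region) pair per
    statistical moment.
    """
    # n2: moments where both regions contain a target pose at the same time.
    n2 = sum(1 for in_first, in_second in region_flags if in_first and in_second)
    if total_detections <= 0:
        return False  # assumption: no detections means no condition is formed
    ratio = n2 / total_detections
    low, high = threshold_range  # assumed inclusive bounds
    return low <= ratio <= high
```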

5. The method for statistically analyzing the pose of a target in a video according to any one of claims 1-4, characterized in that the process of correcting the first target number to obtain the second target number when the correction condition is triggered comprises the following steps: obtaining a correction parameter; and setting the second target number = the first target number - the correction parameter.
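The subtraction in claim 5, combined with the rule in claim 3 that the correction does not depend on how many conditions fire, can be sketched as a single function. The default correction parameter of 1 and the clamp at zero are assumptions added for illustration; the claims specify neither.

```python
from typing import Sequence


def correct_count(
    first_target_number: int,
    triggered: Sequence[bool],
    correction_parameter: int = 1,  # assumption: the claims give no value
) -> int:
    """Return the second target number for one statistical moment.

    triggered holds one flag per correction condition. The correction is
    applied once if any condition is triggered, regardless of how many.
    """
    if not any(triggered):
        return first_target_number
    # second target number = first target number - correction parameter
    second_target_number = first_target_number - correction_parameter
    # Assumption: a count is never reported as negative.
    return max(second_target_number, 0)
```

Here `correct_count(5, [True, True])` and `correct_count(5, [True, False])` both yield 4, since the number of triggered conditions does not change the result.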

6. The method for statistically analyzing the pose of a target in a video according to any one of claims 1-4, characterized in that the target pose is a standing posture.

7. A device for statistically analyzing the pose of a target in a video, characterized in that it comprises: an acquisition module, used to obtain a first target number of target poses detected at the current statistical moment of the video, a distribution result of the detected target poses within a specific region at the current statistical moment of the video, and a correction condition, wherein the correction condition is a trigger condition related to the distribution of target poses within the specific region; a judgment module, used to determine whether the distribution result triggers the correction condition; and a processing module, used to correct the first target number when the correction condition is triggered to obtain a second target number, and to take the second target number as the statistical result of the target poses present at the current statistical moment; wherein the device further comprises a condition generation unit, which comprises: a data acquisition module, used to obtain the total number of detections for the entire video and the total distribution statistics of the target poses detected within the specific region at each statistical moment; and a condition formation module, used to form the correction condition based on the total number of detections and the total distribution statistics.
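The module decomposition of claim 7 can be pictured as three pluggable components wired together. The sketch below is illustrative only: the class name `PoseCountingDevice`, modelling each module as a callable, and the tuple returned by the acquisition module are assumptions, and the condition generation unit is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class PoseCountingDevice:
    # Acquisition module: statistical moment -> (first target number, distribution result)
    acquire: Callable[[int], Tuple[int, bool]]
    # Judgment module: distribution result -> whether the correction condition is triggered
    judge: Callable[[bool], bool]
    # Processing module: first target number -> second target number
    process: Callable[[int], int]

    def count(self, moment: int) -> int:
        """Return the statistical result for one statistical moment."""
        first_target_number, distribution_result = self.acquire(moment)
        if self.judge(distribution_result):
            return self.process(first_target_number)
        return first_target_number


# Usage with hypothetical pre-computed data for two statistical moments.
frames = {0: (3, True), 1: (2, False)}
device = PoseCountingDevice(
    acquire=frames.__getitem__,
    judge=lambda hit: hit,
    process=lambda n: n - 1,
)
```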

8. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, the method according to any one of claims 1-6 is implemented.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, the method according to any one of claims 1-6 is implemented.