A teacher classroom behavior recognition method, device, equipment and storage medium
By tracking and matching targets in the recognition of teachers' classroom behavior and detecting key points of human posture, the problem of target loss in existing technologies is solved and the recognition accuracy is improved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- GUANGZHOU SHIYUAN ELECTRONICS CO LTD
- Filing Date
- 2021-10-19
- Publication Date
- 2026-06-23
Figure CN116012869B_ABST
Abstract
Description
Technical Field
[0001] The embodiments of the present invention relate to the field of behavior recognition technology, and in particular to a method, device, equipment and storage medium for recognizing teacher classroom behavior.
Background Technology
[0002] Classroom teaching, as a fundamental form of instruction, has always held a central position in education. Classroom research should always focus on the individual elements within the classroom. The behavior of the teacher, who leads the classroom, reflects the teacher's emotions and teaching state and is therefore a crucial basis for classroom teaching research.
[0003] With the development of information technology in education, the use of computers to process images and recognize teachers' behavior plays an important role in classroom teaching research.
[0004] Most current mainstream solutions in the industry perform localization and recognition using either traditional image analysis or deep learning algorithms. These methods generally locate the target (i.e., the teacher) and analyze its behavior from a single video frame, which is prone to target loss and results in low recognition accuracy.
Summary of the Invention
[0005] This invention provides a method, apparatus, device, and storage medium for recognizing teacher classroom behavior, in order to reduce the problem of missing target objects due to missed detections in the target detection stage, thereby improving the accuracy of teacher classroom behavior recognition.
[0006] In a first aspect, embodiments of the present invention provide a method for recognizing teacher classroom behavior, including:
[0007] Detecting the teacher as the target object from the acquired classroom images;
[0008] Performing target tracking processing on the target object to form a tracking queue consisting of target objects in multiple consecutive frames of classroom images;
[0009] Matching the target object in the current frame of the classroom image with the target object in the tracking queue in the previous frame of the classroom image;
[0010] Detecting, based on the matching result, key points representing human posture from the target object in the current frame of the classroom image;
[0011] Determining the behavior of the target object based on the key points.
[0012] In a second aspect, embodiments of the present invention also provide a teacher classroom behavior recognition device, comprising:
[0013] The target object acquisition and detection module is used to detect the teacher as the target object from the acquired classroom images;
[0014] The tracking processing module is used to perform target tracking processing on the target object and form a tracking queue consisting of target objects in multiple consecutive frames of classroom images;
[0015] The matching module is used to match the target object in the current frame of the classroom image with the target object in the tracking queue in the previous frame of the classroom image;
[0016] The key point detection module is used to detect key points representing human posture from target objects in the current frame of the classroom image based on the matching results.
[0017] The behavior determination module is used to determine the behavior of the target object based on the key points.
[0018] In a third aspect, embodiments of the present invention also provide a computer device, comprising:
[0019] One or more processors;
[0020] A storage device for storing one or more programs;
[0021] When the one or more programs are executed by the one or more processors, the one or more processors implement the teacher classroom behavior recognition method provided in the first aspect of the present invention.
[0022] In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the teacher classroom behavior recognition method provided in the first aspect of the present invention.
[0023] The teacher classroom behavior recognition method provided in this invention detects the teacher as a target object from acquired classroom images, performs target tracking processing on the target object, and forms a tracking queue consisting of target objects from multiple consecutive classroom images. The target object in the current frame of the classroom image is matched with the target object in the tracking queue from the previous frame of the classroom image. Based on the matching result, key points representing human posture are detected from the target object in the current frame of the classroom image, and the behavior of the target object is determined based on these key points. By performing target tracking processing on the target object, forming a tracking queue consisting of target objects from multiple consecutive classroom images, and matching the target object in the current frame of the classroom image with the target object in the tracking queue from the previous frame of the classroom image, the problem of missed detections leading to target object loss during the target detection stage can be effectively reduced, thereby improving the accuracy of teacher classroom behavior recognition.
Description of the Drawings
[0024] Figure 1 is a flowchart of a teacher classroom behavior recognition method provided in Embodiment 1 of the present invention;
[0025] Figure 2 is a flowchart of a teacher classroom behavior recognition method provided in Embodiment 2 of the present invention;
[0026] Figure 3 is a schematic diagram of the structure of a teacher classroom behavior recognition device provided in Embodiment 3 of the present invention;
[0027] Figure 4 is a schematic diagram of the structure of a computer device provided in Embodiment 4 of the present invention.
Detailed Description
[0028] The present invention will now be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and not intended to limit it. Furthermore, it should be noted that, for ease of description, the accompanying drawings show only the parts relevant to the present invention, and not all of the structures.
[0029] Embodiment 1
[0030] Figure 1 is a flowchart of a teacher classroom behavior recognition method provided in Embodiment 1 of the present invention. This embodiment is applicable to recognizing teacher classroom behavior, and the method can be executed by the teacher classroom behavior recognition device provided in an embodiment of the present invention. This device can be implemented in software and/or hardware and is usually configured in a computer device. As shown in Figure 1, the method specifically includes the following steps:
[0031] S101. Detect the teacher as the target object from the acquired classroom images.
[0032] For example, classroom images of a teacher giving a lesson are recorded using a camera. These images can be video stream images. In this embodiment of the invention, a real-time processing method can be used, i.e., real-time acquisition of classroom images and real-time recognition of teacher classroom behavior; or a pre-recording method can be used, i.e., pre-recording of classroom images of the teacher giving a lesson and storing them locally or on a server, and retrieving these images from the local storage or server when it is necessary to recognize the teacher's classroom behavior. This embodiment of the invention does not impose any limitations on this method.
[0033] In this embodiment of the invention, an object detection model can be used to perform object detection processing on each frame of the acquired classroom image to detect the target object (i.e., the teacher). The target object is usually indicated by a detection bounding box. The object detection model can be a common one, such as the Fast R-CNN model or the YOLO (You Only Look Once) series of models. This embodiment of the invention is not limited to any particular model.
[0034] S102. Perform target tracking processing on the target object to form a tracking queue consisting of target objects in multiple consecutive frames of classroom images.
[0035] In this embodiment of the invention, target tracking processing can be performed on the target object to identify the same target object in multiple frames of classroom images, thereby forming a tracking queue consisting of target objects from consecutive frames of classroom images. It should be noted that the tracking queue is established before the current frame of the classroom image is acquired, and the target objects in the tracking queue are automatically updated as target objects are detected from the classroom images. This embodiment of the invention does not limit the target tracking algorithm used; for example, the mean shift algorithm, a target tracking algorithm based on Kalman filtering, a target tracking algorithm based on particle filtering, or a method based on modeling moving targets can be used.
[0036] S103. Match the target object in the current frame classroom image with the target object in the tracking queue in the previous frame classroom image.
[0037] In this embodiment of the invention, the target object in the current frame of the classroom image is matched with the target objects in the tracking queue. For example, the target object in the current frame of the classroom image is matched with the target object in the tracking queue of the previous frame of the classroom image to determine the matching result. The matching result includes matching and non-matching. Matching means that the target object in the current frame of the classroom image and the target object in the tracking queue of the previous frame of the classroom image belong to the same teacher; non-matching means that the target object in the current frame of the classroom image and the target object in the tracking queue of the previous frame of the classroom image do not belong to the same teacher. In this embodiment of the invention, the Hungarian matching algorithm, the Kuhn-Munkres algorithm, or a greedy algorithm can be used; this embodiment of the invention does not limit the specific algorithm used.
[0038] S104. Based on the matching results, detect key points representing human posture from the target object in the current frame classroom image.
[0039] In this embodiment of the invention, when the aforementioned matching result is a match, a human keypoint detection model is used to perform human keypoint detection on each target object, and key points representing human posture are detected from the target objects in the current frame classroom image.
[0040] Human keypoint detection, also known as human pose estimation, is a relatively fundamental task in computer vision, serving as a prerequisite for human action recognition, behavior analysis, and human-computer interaction. Generally, human keypoint detection can be subdivided into single / multi-person keypoint detection and 2D / 3D keypoint detection. Furthermore, some algorithms perform keypoint tracking after detection, a process known as human pose tracking.
[0041] For example, in this embodiment of the invention, the human key point detection model can be a CPM (Convolutional Pose Machines) model, a DeeperCut model, an AlphaPose model, or a SimplePose model, and this embodiment of the invention does not limit it.
[0042] S105. Determine the behavior of the target object based on key points.
[0043] In this embodiment of the invention, the behavior of the target object can be directly determined based on the key points of the target object in the current frame of the classroom image. For example, if the x-coordinates of two or more left-side key points (e.g., left shoulder, left elbow, left hip, left knee) are smaller than those of their corresponding right-side key points (e.g., right shoulder, right elbow, right hip, right knee), then the target object is considered to be facing the blackboard. Simultaneously, the relative positions of the target object's hand, elbow, and shoulder are determined; if the height of its hand or elbow is higher than its shoulder position, then the target object is considered to be raising its hand to write on the blackboard.
[0044] The teacher classroom behavior recognition method provided in this invention detects the teacher as a target object from acquired classroom images, performs target tracking processing on the target object, and forms a tracking queue consisting of target objects from multiple consecutive classroom images. The target object in the current frame of the classroom image is matched with the target object in the tracking queue from the previous frame of the classroom image. Based on the matching result, key points representing human posture are detected from the target object in the current frame of the classroom image, and the behavior of the target object is determined based on these key points. By performing target tracking processing on the target object, forming a tracking queue consisting of target objects from multiple consecutive classroom images, and matching the target object in the current frame of the classroom image with the target object in the tracking queue from the previous frame of the classroom image, the problem of missed detections leading to target object loss during the target detection stage can be effectively reduced, thereby improving the accuracy of teacher classroom behavior recognition.
[0045] Embodiment 2
[0046] Figure 2 is a flowchart of a teacher classroom behavior recognition method provided in Embodiment 2 of the present invention. This embodiment refines Embodiment 1 above and describes in detail the process of forming the tracking queue and matching the target object. As shown in Figure 2, the method includes:
[0047] S201. Detect all human objects from classroom images.
[0048] The acquired classroom images include not only teachers but also students. In this embodiment of the invention, a human detection model is used to detect all human objects (usually shown as bounding boxes) in the classroom images, including students and teachers. This embodiment uses a deep learning-based human detection model; specifically, to suit embedded terminals, it employs a lightweight, heavily pruned YOLO V4 detection model. The model is first trained on a large amount of image data with human annotations, and the trained model is then deployed for human detection.
[0049] S202. Select the human body object located in the podium area from all human body objects as the target object representing the teacher.
[0050] Teachers are usually located in the podium area, while students sit outside the podium area. Therefore, all human objects can be filtered to select those located in the podium area as the target objects representing the teacher.
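The filtering in S202 can be sketched as follows. This is a minimal sketch under assumptions the source does not state: the podium area is an axis-aligned rectangle configured in advance, and a person counts as being in the podium area when the center of the detection box lies inside that rectangle. The names `Box`, `in_podium`, and `select_teachers` are hypothetical.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def in_podium(box: Box, podium: Box) -> bool:
    # Assumption: a person is "in the podium area" when the box center lies inside it.
    cx = (box.x1 + box.x2) / 2
    cy = (box.y1 + box.y2) / 2
    return podium.x1 <= cx <= podium.x2 and podium.y1 <= cy <= podium.y2


def select_teachers(people: List[Box], podium: Box) -> List[Box]:
    """Keep only the human objects located in the podium area."""
    return [b for b in people if in_podium(b, podium)]


# Illustrative values only: the podium occupies the top strip of a 1920-pixel-wide image.
podium = Box(0, 0, 1920, 400)
people = [Box(900, 100, 1000, 380), Box(300, 600, 380, 900)]
teachers = select_teachers(people, podium)
```

Other membership tests (for example, using the bottom edge of the box or an overlap ratio) would fit the same structure.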
[0051] S203. Match the target object in the acquired classroom image with the target object in the tracking queue in the previous frame of the classroom image.
[0052] Specifically, the tracking queue is established before acquiring the current frame of the classroom image. The process of establishing the tracking queue is as described in steps S203 and S204. Step S203 includes the following sub-steps:
[0053] 1. Match the target object in the acquired classroom image with the target object in the tracking queue in the previous frame of the classroom image.
[0054] For example, a multi-target tracking algorithm is used to track multiple target objects in a classroom image.
[0055] Specifically, the process of the multi-target tracking algorithm is as follows:
[0056] Typically, target objects are represented by detection boxes in classroom images. However, teaching scenarios often contain numerous occlusions, easily obscuring parts of the target object (e.g., the lower half of the target object is obscured by a desk). This causes frequent abrupt changes in the coordinates of the detection box (i.e., the target object), increasing the difficulty of subsequent processing. In this embodiment of the invention, after detecting the target object, the detection box representing the target object is adaptively cropped to obtain the upper half of the detection box as the target detection box representing the target object. Specifically, after the system starts, the average width of the detection boxes of N target objects in the first frame of the classroom image is calculated, and then the detection boxes are cropped in the height direction according to a certain ratio. In this embodiment of the invention, 1.5 times the average width is used as the target size, and the upper half of the detection box is cropped as the target detection box.
[0057] The advantages of the above adaptive cropping are twofold: firstly, it allows the object detection model to output more stable detection boxes, avoiding frequent abrupt changes in detection boxes caused by occlusions such as lecterns and student desks, thus reducing the difficulty of subsequent processing steps; secondly, using the average width of the object detection box as the cropping scale can effectively adapt to different application scenarios, avoiding adverse effects caused by scale differences due to camera installation distance, height, angle, etc.
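One possible reading of the adaptive cropping described above is sketched below. It assumes boxes are `(x1, y1, x2, y2)` tuples with y increasing downward, that the "target size" of 1.5 times the average width is a height, and that boxes already shorter than this height are left unchanged; the function names are hypothetical.

```python
from typing import List, Tuple

BoxT = Tuple[float, float, float, float]  # (x1, y1, x2, y2), y grows downward


def average_width(boxes: List[BoxT]) -> float:
    """Average width of the detection boxes found in the first frame."""
    return sum(x2 - x1 for x1, _, x2, _ in boxes) / len(boxes)


def crop_upper(box: BoxT, avg_width: float, ratio: float = 1.5) -> BoxT:
    """Keep the upper part of the box, limited to ratio * avg_width in height."""
    x1, y1, x2, y2 = box
    target_h = ratio * avg_width
    # Assumption: a box shorter than the target height is kept as it is.
    return (x1, y1, x2, min(y2, y1 + target_h))


first_frame = [(100, 50, 200, 450), (400, 60, 520, 300)]
w = average_width(first_frame)
cropped = [crop_upper(b, w) for b in first_frame]
```

Because the crop height is derived from the measured box widths rather than a fixed pixel value, the same code adapts to cameras mounted at different distances.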
[0058] Next, the Hungarian matching algorithm is employed, using DIoU (Distance-IoU) as the cost function to match the target detection boxes in the acquired classroom image with the target detection boxes in the tracking queue of the previous frame's classroom image. Traditionally, using IoU as the cost function results in IoU = 0 when the target detection boxes in the acquired classroom image and those in the tracking queue of the previous frame's classroom image have no intersection. This leads to the output being the same for very close and very distant non-intersecting boxes, thus losing gradient direction and hindering optimization. By using DIoU as the cost function, a penalty term is added to minimize the distance between the center points of the two target detection boxes. Even if the two target detection boxes do not overlap, the gradient direction can still be determined based on the distance between their center points.
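The DIoU-based matching can be sketched as follows. Several points here are assumptions rather than the source's implementation: the cost is taken as `1 - DIoU` (so a smaller cost means a better match), the threshold `max_cost` is a hypothetical value, and an exhaustive search over permutations stands in for the Hungarian algorithm to keep the sketch dependency-free. The exhaustive search returns the same minimum-cost assignment but is only practical for a handful of boxes; in practice a Hungarian solver such as SciPy's `scipy.optimize.linear_sum_assignment` would likely be used.

```python
from itertools import permutations
from typing import List, Tuple

BoxT = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def diou(a: BoxT, b: BoxT) -> float:
    """Distance-IoU: IoU minus squared center distance over squared enclosing diagonal."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union if union > 0 else 0.0
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
        + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
        + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - (d2 / c2 if c2 > 0 else 0.0)


def match(tracks: List[BoxT], dets: List[BoxT],
          max_cost: float = 0.7) -> List[Tuple[int, int]]:
    """Return (track_idx, det_idx) pairs minimizing the total cost sum(1 - DIoU)."""
    cost = [[1.0 - diou(t, d) for d in dets] for t in tracks]
    swap = len(tracks) > len(dets)  # always iterate over the shorter side
    rows, cols = (len(dets), len(tracks)) if swap else (len(tracks), len(dets))
    get = (lambda r, c: cost[c][r]) if swap else (lambda r, c: cost[r][c])
    best = min(permutations(range(cols), rows),
               key=lambda p: sum(get(r, c) for r, c in enumerate(p)),
               default=())
    pairs = [(c, r) if swap else (r, c) for r, c in enumerate(best)]
    # Pairs whose cost exceeds the (hypothetical) threshold are treated as unmatched.
    return [(t, d) for t, d in pairs if cost[t][d] <= max_cost]


tracks = [(0, 0, 10, 10), (100, 0, 110, 10)]
dets = [(101, 0, 111, 10), (1, 0, 11, 10)]
pairs = match(tracks, dets)
```

Unlike plain IoU, the center-distance term keeps the cost informative for non-overlapping boxes: a nearby non-overlapping box costs less than a distant one.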
[0059] If DIoU is less than or equal to the preset value, the match is confirmed to be successful, and step S204 is executed.
[0060] If DIoU is greater than a preset value, it is considered that none of the detected target objects can match the target objects in the tracking queue (i.e., there is a possibility of missed detection). In this case, it is determined whether the unmatched target object in the previous frame of the classroom image is located in the center of the classroom image. If so, considering that the target object cannot suddenly disappear from the image and that the area of the target object's activity should be within a certain range, a single-target tracking algorithm is used to track the target object. Specifically, the single-target tracking process is as follows:
[0061] 1) Extract the first image features of the region where the target object that could not be matched was located in the previous frame of the classroom image.
[0062] Specifically, a region twice the size of the target detection box, centered on the target detection box of the unmatched target object in the previous frame of the classroom image, is selected, and the image features of this region are extracted by a feature extraction network as the first image feature.
[0063] 2) Divide the area in the acquired classroom image that corresponds to the area where the target object is located in the previous frame of the classroom image into multiple image blocks.
[0064] Based on the location of the target object in the previous frame of the classroom image, find the region in the acquired classroom image that has the same location, and then divide the corresponding region in the acquired classroom image into multiple image blocks.
[0065] 3) Extract the second image features of each image block.
[0066] Specifically, the image features of each image block are extracted using a feature extraction network as the second image feature.
[0067] 4) Calculate the first similarity between the first image features and each of the second image features.
[0068] Specifically, the first similarity between the first image features and each of the second image features is calculated. In this embodiment of the invention, the similarity can be characterized using cosine distance, Euclidean distance, etc., and this embodiment of the invention is not limited thereto.
[0069] 5) Take as the target object the image block whose second image feature has the largest first similarity, provided that this similarity is greater than or equal to the preset similarity threshold, and execute step S204.
[0070] 6) When the largest first similarity is less than the preset similarity threshold, it is confirmed that the target object is lost, that is, it is confirmed that there is a missed detection in the aforementioned target detection steps.
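Steps 4) to 6) above can be sketched as a similarity search over image blocks. The feature extraction network is not specified in the source, so this sketch assumes the features have already been computed and are passed in as plain vectors; cosine similarity is used as one of the measures the text mentions, and the threshold value and function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def find_target_block(ref_feature: Sequence[float],
                      block_features: List[Sequence[float]],
                      threshold: float = 0.8) -> Optional[int]:
    """Index of the most similar block, or None when the target is considered lost."""
    if not block_features:
        return None
    sims = [cosine_similarity(ref_feature, f) for f in block_features]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best if sims[best] >= threshold else None


# Illustrative feature vectors only; real features would come from the network.
ref = [1.0, 0.0, 1.0]
blocks = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [0.5, 0.5, 0.5]]
best_block = find_target_block(ref, blocks)
```

A return value of `None` corresponds to step 6), where the target is treated as lost.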
[0071] In some embodiments of the present invention, after determining that a target has been lost, the number of times the target object has been lost is recorded. When the number of times the same target object has been lost exceeds a preset threshold, it indicates that the target object has left the classroom, and the target object is removed from the tracking queue to avoid wasting computing resources by continuing to track target objects that have left the classroom.
[0072] In some embodiments of the present invention, after the above matching is completed, it is further determined whether there is a target object in the acquired classroom image that does not match the target object in the tracking queue. If so, it is considered that a new target object has appeared, and the unmatched target object is added to the tracking queue.
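The two bookkeeping rules above (removing targets that have been lost too often, and adding unmatched detections as new targets) can be sketched as follows. The data layout is an assumption: tracks are kept in a dictionary keyed by a hypothetical integer ID, `matches` maps a track ID to a detection index, and `max_lost` is an illustrative threshold. The source does not say whether the lost count resets after a later successful match; this sketch keeps it cumulative.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Tuple

BoxT = Tuple[float, float, float, float]
_next_id = itertools.count()  # monotonically increasing track IDs (hypothetical scheme)


@dataclass
class Track:
    box: BoxT
    lost_count: int = 0


def new_track(tracks: Dict[int, Track], box: BoxT) -> int:
    tid = next(_next_id)
    tracks[tid] = Track(box)
    return tid


def update_tracks(tracks: Dict[int, Track], matches: Dict[int, int],
                  detections: List[BoxT], max_lost: int = 3) -> None:
    for tid in list(tracks):
        if tid in matches:
            tracks[tid].box = detections[matches[tid]]
        else:
            tracks[tid].lost_count += 1
            if tracks[tid].lost_count > max_lost:
                del tracks[tid]  # assumed to have left the classroom
    matched = set(matches.values())
    for i, box in enumerate(detections):
        if i not in matched:
            new_track(tracks, box)  # unmatched detection becomes a new target


tracks: Dict[int, Track] = {}
a = new_track(tracks, (0, 0, 10, 10))
b = new_track(tracks, (50, 0, 60, 10))
tracks[b].lost_count = 3  # already lost three times in earlier frames
update_tracks(tracks, {a: 0}, [(1, 0, 11, 10), (200, 0, 210, 10)])
```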
[0073] S204. Store the target objects in the acquired classroom images into the tracking queue to update the tracking queue.
[0074] Specifically, the target objects in the acquired classroom images are added to the tracking queue. If there are no empty slots in the tracking queue, the target object in the first frame of the tracking queue is deleted to update the target object in the tracking queue. It should be noted that when the tracking queue is empty, the detected target objects are directly added to the tracking queue, that is, a new target object is created in the tracking queue.
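The queue update in S204 behaves like a fixed-capacity first-in, first-out buffer. A minimal sketch using Python's `collections.deque` is shown below; the capacity `QUEUE_LEN` and the per-frame record layout are hypothetical, since the source does not state them.

```python
from collections import deque

QUEUE_LEN = 5  # hypothetical capacity; the source does not give a value

# With maxlen set, appending to a full deque silently drops the oldest entry,
# which mirrors "delete the target object in the first frame of the queue".
queue = deque(maxlen=QUEUE_LEN)

for frame_idx in range(7):
    queue.append({"frame": frame_idx, "box": (frame_idx, 0, frame_idx + 10, 10)})
```

After seven appends the queue holds frames 2 through 6, with the two oldest frames discarded.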
[0075] S205. Match the target object in the current frame of the classroom image with the target object in the tracking queue in the previous frame of the classroom image.
[0076] Specifically, in this embodiment of the invention, after detecting a target object in the current frame of the classroom image, the target object in the current frame of the classroom image is matched with the target object in the tracking queue in the previous frame of the classroom image. This matching process is similar to the matching process during the establishment of the tracking queue, and the specific process is as follows:
[0077] Adaptive cropping is performed on the detection bounding boxes of target objects in the current frame of the classroom image to obtain the upper half of the detection bounding box of the target object in the current frame of the classroom image as the target detection bounding box. Specifically, after the system starts, the average width of the detection bounding boxes of N target objects in the first frame of the classroom image is calculated, and then the detection bounding boxes are cropped in the height direction according to a certain ratio. In this embodiment of the invention, 1.5 times the average width is used as the target size, and the upper half of the detection bounding box is cropped as the target detection bounding box.
[0078] The Hungarian matching algorithm is used, with DIoU as the cost function, to match the target detection boxes in the current frame of the classroom image with the target detection boxes in the tracking queue of the previous frame of the classroom image. If DIoU is less than or equal to a preset value, the match is confirmed to be successful, and step S206 is executed; if DIoU is greater than the preset value, it is considered that there is no target object among the detected target objects that can match the target objects in the tracking queue (i.e., there is a possibility of missed detection). At this time, it is determined whether the unmatched target object in the previous frame of the classroom image is located in the center part of the classroom image.
[0079] If so, considering that the target object cannot suddenly disappear from the image, and that the area where the target object is active should be within a certain range, a single-target tracking algorithm is used to track the target object. Specifically, the process of single-target tracking is as follows:
[0080] 1. Extract the third image features of the region where the unmatched target object is located in the previous frame of the classroom image.
[0081] Specifically, a region twice the size of the target detection box, centered on the target detection box of the unmatched target object in the previous frame of the classroom image, is selected, and the image features of this region are extracted by a feature extraction network as the third image feature.
[0082] 2. Divide the area in the current frame of the classroom image that corresponds to the area where the target object is located in the previous frame of the classroom image into multiple image blocks.
[0083] Based on the location of the target object in the previous frame of the classroom image, find the region in the current frame of the classroom image that has the same location, and then divide that region in the current frame of the classroom image into multiple image blocks.
[0084] 3. Extract the fourth image features of each image block.
[0085] Specifically, the image features of each image block are extracted using a feature extraction network as the fourth image feature.
[0086] 4. Calculate the second similarity between the third image feature and each of the fourth image features.
[0087] Specifically, the second similarity between the third image feature and each of the fourth image features is calculated. In this embodiment of the invention, the similarity can be characterized using cosine distance, Euclidean distance, etc., and this embodiment of the invention is not limited thereto.
[0088] 5. When the maximum value of the second similarity is greater than or equal to the preset similarity threshold, the match is confirmed to be successful, and step S206 is executed.
[0089] S206. Histogram equalization is used to enhance the contrast of the target object in the current frame classroom image to obtain a contrast-enhanced image.
[0090] Specifically, in this embodiment of the invention, to facilitate subsequent key point detection, the image within the detection box of the target object needs to undergo contrast enhancement processing to obtain a contrast-enhanced image. Specifically, for the image within the detection box of the target object, histogram equalization is used to enhance its contrast, redistributing the brightness values of the image within the detection box of the target object to 256 pixel values in the range [0, 255], so that the number of pixels corresponding to each pixel value is approximately equal.
[0091] After contrast enhancement, the brightness distribution of the image will be more uniform, making the outline of the human body in the image more obvious. Especially when the color of the human body's clothing is similar to the solid color background such as the wall or blackboard, the originally inconspicuous outline features can be enhanced, improving the accuracy of the key point detection model, thereby improving the accuracy of behavior recognition.
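Histogram equalization of the cropped region can be sketched in pure Python as below. This is the textbook cumulative-distribution mapping for an 8-bit grayscale image represented as a list of rows; it is a sketch, not the source's implementation, and a library routine such as OpenCV's `cv2.equalizeHist` would normally be used instead.

```python
from typing import List


def equalize(gray: List[List[int]]) -> List[List[int]]:
    """Histogram-equalize an 8-bit grayscale image given as rows of ints in [0, 255]."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    if total == 0:
        return [row[:] for row in gray]
    # Cumulative distribution of brightness values.
    cdf, running = [], 0
    for h in hist:
        running += h
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # single brightness value: nothing to spread out
        return [row[:] for row in gray]
    # Map each brightness so that the output values spread over [0, 255].
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * 255)) for c in cdf]
    return [[lut[v] for v in row] for row in gray]


# A low-contrast patch whose values all sit between 100 and 103.
patch = [[100, 100, 101], [101, 102, 103]]
enhanced = equalize(patch)
```

The narrow input range 100 to 103 is stretched to the full range 0 to 255 while the brightness ordering of the pixels is preserved.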
[0092] S207. Use a key point detection model to detect key points from the contrast-enhanced image, including those representing the head, left shoulder, right shoulder, left elbow, right elbow, left hand, right hand, left hip, right hip, left knee, and right knee.
[0093] In this embodiment of the invention, a keypoint detection model is used to detect keypoints representing the head, left shoulder, right shoulder, left elbow, right elbow, left hand, right hand, left hip, right hip, left knee, and right knee from a contrast-enhanced image. The keypoint detection model used in this embodiment is a simplified version of the SimplePose keypoint detection model. The model is first trained using a large amount of image data with human keypoint annotations, and then the trained model is deployed into the algorithm for keypoint detection.
[0094] S208. Determine the relative positions of each key point.
[0095] Specifically, after detecting the key points of the target object, the relative positions of each key point are determined, such as the relative positions of the left and right shoulders, the left and right elbows, and the left and right hands.
[0096] S209. Determine the behavior of the target object based on the relative positions of each key point.
[0097] Specifically, the relative positions of four pairs of key points are compared: left and right shoulder, left and right elbow, left and right hip, and left and right knee. If the x-coordinates of two or more left-side key points (left shoulder, left elbow, left hip, left knee) are smaller than the x-coordinates of their corresponding right-side key points (right shoulder, right elbow, right hip, right knee), then the target object is considered to be facing the blackboard. At the same time, the relative positions of the target object's hands, elbows, and shoulders are determined. If a hand or elbow is higher than the shoulder, then the target object is considered to be raising a hand to write on the blackboard.
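The key-point rule described above can be sketched as follows. The sketch assumes key points are given as a dictionary from hypothetical names such as `left_shoulder` to `(x, y)` pixel coordinates with y increasing downward (so "higher" means a smaller y), and that a hand or elbow is compared with the shoulder on the same side; the returned labels are illustrative.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y grows downward
PAIRS = ("shoulder", "elbow", "hip", "knee")


def facing_blackboard(kp: Dict[str, Point]) -> bool:
    # Count the left/right pairs whose left x-coordinate is smaller than the right one.
    votes = sum(kp[f"left_{p}"][0] < kp[f"right_{p}"][0] for p in PAIRS)
    return votes >= 2


def hand_raised(kp: Dict[str, Point]) -> bool:
    for side in ("left", "right"):
        shoulder_y = kp[f"{side}_shoulder"][1]
        # Smaller y means higher in the image.
        if kp[f"{side}_hand"][1] < shoulder_y or kp[f"{side}_elbow"][1] < shoulder_y:
            return True
    return False


def classify(kp: Dict[str, Point]) -> str:
    if facing_blackboard(kp):
        return "writing_on_blackboard" if hand_raised(kp) else "facing_blackboard"
    return "other"


kp = {
    "left_shoulder": (100, 200), "right_shoulder": (160, 200),
    "left_elbow": (90, 260), "right_elbow": (175, 170),
    "left_hip": (110, 350), "right_hip": (150, 350),
    "left_knee": (110, 450), "right_knee": (150, 450),
    "left_hand": (85, 320), "right_hand": (180, 120),
}
behavior = classify(kp)
```

Because the rule is a handful of coordinate comparisons, it needs no training data and runs in constant time per target.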
[0098] The above behavior judgment process directly uses key points to judge the teacher's behavior. Compared with other methods that use deep learning models for classification, it can effectively improve the efficiency of the algorithm and avoid the problem that deep learning algorithms have difficulty obtaining a large amount of effective training data.
[0099] The teacher classroom behavior recognition method provided in this invention performs target tracking processing to form a tracking queue consisting of target objects from multiple consecutive classroom images. It then matches the target objects in the current classroom image with those in the tracking queue from the previous frame, effectively reducing the problem of missed detections during the target detection stage and thus improving the accuracy of teacher classroom behavior recognition. Furthermore, adaptive cropping of the target object detection boxes allows the target detection model to output more stable detection boxes, avoiding frequent changes in detection boxes caused by occlusions such as the podium and student desks, reducing processing difficulty. Contrast enhancement of the image within the target object's detection box makes the human body contour more prominent, improving the accuracy of the keypoint detection model and thus enhancing the accuracy of behavior recognition. By directly using keypoints to judge teacher behavior, compared to other methods using deep learning models for classification, it effectively improves algorithm efficiency and avoids the problem of deep learning algorithms struggling to obtain large amounts of effective training data.
[0100] Embodiment 3
[0101] Figure 3 is a schematic diagram of the structure of a teacher classroom behavior recognition device provided in Embodiment 3 of the present invention. As shown in Figure 3, the device includes:
[0102] The target object acquisition and detection module 301 is used to detect the teacher as the target object from the acquired classroom image;
[0103] The tracking processing module 302 is used to perform target tracking processing on the target object and form a tracking queue consisting of target objects in multiple consecutive frames of classroom images;
[0104] The matching module 303 is used to match the target object in the current frame classroom image with the target object in the tracking queue in the previous frame classroom image;
[0105] The key point detection module 304 is used to detect key points representing human posture from the target object in the current frame classroom image based on the matching results;
[0106] The behavior determination module 305 is used to determine the behavior of the target object based on the key points.
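The way modules 301 to 305 hand data to one another can be sketched as follows. This is a minimal sketch, not the device itself: the class name `BehaviorRecognizer`, the callable signatures, and the choice to match against a snapshot of the queue taken before the tracking update are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class BehaviorRecognizer:
    """Wires five user-supplied callables together in the order of modules 301-305."""
    detect: Callable[[Any], List[Any]]                   # 301: frame -> target objects
    track: Callable[[List[Any], List[Any]], List[Any]]   # 302: (queue, targets) -> updated queue
    match: Callable[[Any, List[Any]], bool]              # 303: (target, previous queue) -> matched?
    find_keypoints: Callable[[Any, Any], Any]            # 304: (frame, target) -> key points
    decide: Callable[[Any], str]                         # 305: key points -> behavior label
    queue: List[Any] = field(default_factory=list)

    def process(self, frame: Any) -> List[str]:
        targets = self.detect(frame)
        previous_queue = list(self.queue)        # queue as it stood for the previous frame
        self.queue = self.track(self.queue, targets)
        behaviors = []
        for target in targets:
            # Key points are only detected for targets that matched the previous queue.
            if self.match(target, previous_queue):
                behaviors.append(self.decide(self.find_keypoints(frame, target)))
        return behaviors
```

With this arrangement the first frame only seeds the queue, and behaviors are reported from the second frame onward.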
[0107] In some embodiments of the present invention, the target object acquisition and detection module 301 includes:
[0108] The human object detection submodule is used to detect all human objects from classroom images;
[0109] The filtering submodule is used to filter out human objects located in the podium area from all human objects as the target objects representing the teacher.
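The filtering step can be illustrated with a short sketch. The text does not say how the podium area is represented or which point of a human object must lie inside it; the sketch assumes an axis-aligned rectangle and tests the bottom-center point of each detection box. The names `in_podium` and `select_teachers` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def in_podium(box: Box, podium: Box) -> bool:
    """True when the bottom-center point of the box lies inside the podium rectangle (assumed test)."""
    center_x = (box[0] + box[2]) / 2
    foot_y = box[3]
    return podium[0] <= center_x <= podium[2] and podium[1] <= foot_y <= podium[3]


def select_teachers(person_boxes: List[Box], podium: Box) -> List[Box]:
    """Keep only the human objects located in the podium area."""
    return [box for box in person_boxes if in_podium(box, podium)]
```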
[0110] In some embodiments of the present invention, the tracking processing module 302 includes:
[0111] The first matching submodule is used to match the target object in the acquired classroom image with the target object in the tracking queue in the previous frame of the classroom image;
[0112] The first tracking queue update submodule is used to store the target object in the acquired classroom image into the tracking queue when a match is successful, so as to update the tracking queue.
[0113] In some embodiments of the present invention, the target object is shown as a detection box in the classroom image, and the first matching submodule includes:
[0114] The first clipping unit is used to adaptively clip the detection box to obtain the upper half of the detection box as the target detection box representing the target object.
[0115] The first matching unit is used to match the target detection boxes in the acquired classroom image with the target detection boxes in the tracking queue in the previous frame classroom image using the Hungarian matching algorithm and DIoU as the cost function.
[0116] The matching result confirmation unit is used to confirm a successful match when the DIoU is less than or equal to a preset value.
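The three units above can be sketched together. Several points are assumptions rather than statements from the text: the "adaptive" crop is simplified to a fixed upper half; the cost is taken to be the DIoU distance 1 - DIoU, so that a smaller value means a better match, in line with the "less than or equal to a preset value" test; and a brute-force search over assignments stands in for the Hungarian algorithm, returning the same minimum-cost assignment for the handful of targets expected near a podium. In practice `scipy.optimize.linear_sum_assignment` could be used instead.

```python
from itertools import permutations
from math import inf
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def upper_half(box: Box) -> Box:
    """Keep the upper half of a detection box (fixed-ratio simplification of the adaptive crop)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2, (y1 + y2) / 2)


def diou(a: Box, b: Box) -> float:
    """Distance-IoU: IoU minus squared center distance over squared enclosing-box diagonal."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union if union > 0 else 0.0
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - (d2 / c2 if c2 > 0 else 0.0)


def match_boxes(detections: List[Box], tracked: List[Box], max_cost: float) -> List[Optional[int]]:
    """For each raw detection, return the index of its matched tracked box, or None.

    `tracked` is assumed to hold boxes that were already cropped to their upper half.
    """
    n, m = len(detections), len(tracked)
    result: List[Optional[int]] = [None] * n
    if n == 0 or m == 0:
        return result
    cost = [[1.0 - diou(upper_half(d), t) for t in tracked] for d in detections]
    best_pairs, best_total = [], inf
    if n <= m:
        candidates = (list(enumerate(p)) for p in permutations(range(m), n))
    else:
        candidates = ([(i, j) for j, i in enumerate(p)] for p in permutations(range(n), m))
    for pairs in candidates:  # exhaustive minimum-cost assignment (stand-in for Hungarian)
        total = sum(cost[i][j] for i, j in pairs)
        if total < best_total:
            best_pairs, best_total = pairs, total
    for i, j in best_pairs:
        if cost[i][j] <= max_cost:  # match succeeds only within the preset value
            result[i] = j
    return result
```

Detections left as `None` are the ones that go on to the feature-similarity fallback or are treated as unmatched.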
[0117] In some embodiments of the present invention, the tracking processing module 302 further includes:
[0118] The first judgment submodule is used to determine whether the target object in the previous frame of the classroom image is located in the center part of the classroom image when the DIoU is greater than a preset value.
[0119] The first feature extraction submodule is used to extract the first image features of the area where the target object is located in the previous frame of the classroom image when the target object is located in the center part of the classroom image.
[0120] The first image segmentation submodule is used to divide the region in the acquired classroom image that corresponds to the region where the target object is located in the previous frame classroom image into multiple image blocks;
[0121] The second feature extraction submodule is used to extract the second image features of each of the image blocks;
[0122] The first similarity calculation submodule is used to calculate the first similarity between the first image feature and each of the second image features;
[0123] The second tracking queue update submodule is used to store the image block corresponding to the second image feature with the largest first similarity and greater than or equal to a preset similarity threshold as the target object into the tracking queue, so as to update the tracking queue.
[0124] The loss confirmation submodule is used to confirm that the target object is lost when the first similarity scores are all less than a preset similarity threshold.
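The fallback path above can be sketched as follows. The text does not specify the image features, the similarity measure, the block layout, or what counts as the "center part", so everything concrete here is an assumption: a normalized grayscale histogram as the feature, cosine similarity, a regular grid of blocks, and a central horizontal band. All function names are hypothetical.

```python
from math import sqrt
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale rows, values 0..255


def in_center(box: Tuple[float, float, float, float], image_width: float, margin: float = 0.2) -> bool:
    """Assumed center test: the box center lies outside the left and right margins."""
    center_x = (box[0] + box[2]) / 2
    return margin * image_width <= center_x <= (1 - margin) * image_width


def histogram_feature(patch: Image, bins: int = 8) -> List[float]:
    """Normalized intensity histogram used as a stand-in image feature."""
    hist = [0.0] * bins
    for row in patch:
        for value in row:
            hist[min(value * bins // 256, bins - 1)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def split_blocks(region: Image, rows: int, cols: int) -> List[Image]:
    """Divide a region into rows x cols image blocks, in row-major order."""
    block_h, block_w = len(region) // rows, len(region[0]) // cols
    return [
        [line[c * block_w:(c + 1) * block_w] for line in region[r * block_h:(r + 1) * block_h]]
        for r in range(rows) for c in range(cols)
    ]


def recover_target(prev_patch: Image, region: Image, threshold: float,
                   rows: int = 1, cols: int = 2) -> Optional[int]:
    """Return the index of the most similar block, or None when the target is considered lost."""
    first_feature = histogram_feature(prev_patch)
    similarities = [cosine(first_feature, histogram_feature(block))
                    for block in split_blocks(region, rows, cols)]
    best = max(range(len(similarities)), key=similarities.__getitem__)
    return best if similarities[best] >= threshold else None
```

A `None` result corresponds to the loss confirmation submodule; any other result names the block that would be stored in the tracking queue as the target object.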
[0125] In some embodiments of the present invention, the tracking processing module 302 further includes:
[0126] The loss count recording submodule is used to record the number of times the target object has been lost after it has been confirmed that the target object has been lost.
[0127] The deletion submodule is used to delete the target object from the tracking queue when the number of times the target object is lost exceeds a preset threshold.
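The bookkeeping performed by these two submodules can be sketched with a small class. The class name `TrackingQueue`, the use of integer track identifiers, and the default threshold are hypothetical. The text does not say whether the count is reset after a later successful match, so this sketch never resets it.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]


@dataclass
class TrackingQueue:
    max_lost: int = 5                                   # preset threshold (hypothetical default)
    boxes: Dict[int, Box] = field(default_factory=dict)
    lost_counts: Dict[int, int] = field(default_factory=dict)

    def add(self, track_id: int, box: Box) -> None:
        """Store or update a target object in the queue."""
        self.boxes[track_id] = box
        self.lost_counts.setdefault(track_id, 0)

    def mark_lost(self, track_id: int) -> None:
        """Record one more loss and delete the target once the count exceeds the threshold."""
        if track_id not in self.boxes:
            return
        self.lost_counts[track_id] += 1
        if self.lost_counts[track_id] > self.max_lost:
            del self.boxes[track_id]
            del self.lost_counts[track_id]
```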
[0128] In some embodiments of the present invention, the teacher classroom behavior recognition device further includes:
[0129] The judgment module is used to determine, after target tracking processing has been performed on the target object to form a tracking queue consisting of target objects in multiple consecutive frames of classroom images, whether there is a target object in the acquired classroom image that does not match the target object in the tracking queue;
[0130] The target object addition module is used to add unmatched target objects to the tracking queue if there are any unmatched target objects in the classroom image.
[0131] In some embodiments of the present invention, the matching module 303 includes:
[0132] The cropping submodule is used to adaptively crop the detection boxes of target objects in the current frame classroom image, and obtain the upper half of the detection boxes of target objects in the current frame classroom image as the target detection boxes representing the target objects in the current frame classroom image;
[0133] The second matching submodule is used to match the target detection boxes in the current frame classroom image with the target detection boxes in the tracking queue in the previous frame classroom image using the Hungarian matching algorithm and DIoU as the cost function.
[0134] The first matching success confirmation submodule is used to confirm a successful match when the DIoU is less than or equal to a preset value;
[0135] The second judgment submodule is used to determine whether the target object in the previous frame of the classroom image is located in the center part of the classroom image when the DIoU is greater than a preset value.
[0136] The third feature extraction submodule is used to extract the third image features of the area where the target object is located in the previous frame of the classroom image when the target object is located in the center part of the classroom image.
[0137] The second image segmentation submodule is used to divide the area in the current frame classroom image that corresponds to the area where the target object is located in the previous frame classroom image into multiple image blocks;
[0138] The fourth feature extraction submodule is used to extract the fourth image features of each of the image blocks;
[0139] The second similarity calculation submodule is used to calculate the second similarity between the third image feature and each of the fourth image features;
[0140] The second matching success confirmation submodule is used to confirm a successful match when the maximum value of the second similarity is greater than or equal to a preset similarity threshold.
[0141] In some embodiments of the present invention, the key point detection module 304 includes:
[0142] The contrast enhancement submodule is used to enhance the contrast of the target object in the current frame classroom image by using histogram equalization when the matching result is a successful match, so as to obtain a contrast-enhanced image.
[0143] The key point detection submodule is used to detect key points representing the head, left shoulder, right shoulder, left elbow, right elbow, left hand, right hand, left hip, right hip, left knee, and right knee from the contrast-enhanced image using a key point detection model.
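Histogram equalization, the contrast enhancement named above, can be sketched in plain Python on a grayscale image stored as a list of rows. This is the standard cumulative-histogram formulation and only an illustration; the function name `equalize_histogram` is hypothetical, and the key point detection model itself is not shown because the text does not specify it.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, values 0..255


def equalize_histogram(image: Image) -> Image:
    """Spread the intensity values of a grayscale image over the full 0..255 range."""
    hist = [0] * 256
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)

    # Cumulative distribution of intensities.
    cdf, running = [0] * 256, 0
    for level, count in enumerate(hist):
        running += count
        cdf[level] = running

    cdf_min = next((c for c in cdf if c > 0), 0)
    if total == cdf_min:  # empty or single-intensity image: nothing to stretch
        return [row[:] for row in image]

    # Lookup table mapping each input level to its equalized level.
    lut = [max(0, round((c - cdf_min) * 255 / (total - cdf_min))) for c in cdf]
    return [[lut[value] for value in row] for row in image]
```

For example, a crop whose pixels all fall between 100 and 130 is stretched to cover 0 to 255, which is what makes the body contour stand out for the key point detector.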
[0144] In some embodiments of the present invention, the behavior determination module 305 includes:
[0145] The relative position determination submodule is used to determine the relative position of each key point;
[0146] The behavior determination submodule is used to determine the behavior of the target object based on the relative positions of each key point.
[0147] The aforementioned teacher classroom behavior recognition device can execute the teacher classroom behavior recognition method provided in any embodiment of the present invention, and has the corresponding functional modules and beneficial effects for executing the teacher classroom behavior recognition method.
[0148] Embodiment 4
[0149] Embodiment 4 of the present invention provides a computer device. Figure 4 is a schematic diagram of the structure of a computer device provided in Embodiment 4 of the present invention. As shown in Figure 4, the computer device includes:
[0150] A processor 401, a memory 402, a communication module 403, an input device 404, and an output device 405. The number of processors 401 in the computer device can be one or more; Figure 4 takes one processor 401 as an example. The processor 401, memory 402, communication module 403, input device 404, and output device 405 in the computer device can be connected via a bus or by other means; Figure 4 takes a bus connection as an example. The processor 401, memory 402, communication module 403, input device 404, and output device 405 mentioned above can be integrated into a computer device.
[0151] The memory 402, as a computer-readable storage medium, can be used to store software programs, computer-executable programs, and modules, such as the module corresponding to the teacher classroom behavior recognition method in the above embodiments. The processor 401 executes various functional applications and data processing of the computer device by running the software programs, instructions, and modules stored in the memory 402, thereby realizing the aforementioned teacher classroom behavior recognition method.
[0152] The memory 402 may primarily include a program storage area and a data storage area. The program storage area may store the operating system and at least one application program required for a given function; the data storage area may store data created through use of the computer device. Furthermore, the memory 402 may include high-speed random access memory and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other non-volatile solid-state storage device. In some instances, the memory 402 may further include memory located remotely from the processor 401, which can be connected to the computer device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
[0153] The communication module 403 is used to establish a connection with external devices (such as smart terminals) and to exchange data with them. The input device 404 can be used to receive input numeric or character information, and to generate key signal inputs related to user settings and function control of the computer device.
[0154] The computer device provided in this embodiment can execute the teacher classroom behavior recognition method provided in any of the above embodiments of the present invention, and has corresponding functions and beneficial effects.
[0155] Embodiment 5
[0156] Embodiment 5 of the present invention provides a storage medium containing computer-executable instructions, on which a computer program is stored. When executed by a processor, the program implements the teacher classroom behavior recognition method provided in any of the above embodiments of the present invention. The method includes:
[0157] Detecting the teacher as the target object from the acquired classroom image;
[0158] Performing target tracking processing on the target object to form a tracking queue consisting of target objects in multiple consecutive frames of classroom images;
[0159] Matching the target object in the current frame of the classroom image with the target object in the tracking queue in the previous frame of the classroom image;
[0160] Detecting, based on the matching result, key points representing human posture from the target object in the current frame of the classroom image;
[0161] Determining the behavior of the target object based on the key points.
[0162] It should be noted that the embodiments of the apparatus, device, and storage medium are basically similar to the method embodiments, so they are described only briefly. For relevant details, please refer to the description of the method embodiments.
[0163] Based on the above description of the implementation methods, those skilled in the art can clearly understand that the present invention can be implemented using software and necessary general-purpose hardware, and of course, it can also be implemented using hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory, hard disk, or optical disk, etc., including several instructions to cause a computer device (which may be a robot, personal computer, server, or network device, etc.) to execute the teacher classroom behavior recognition method described in any embodiment of the present invention.
[0164] It is worth noting that the various modules, sub-modules, and units included in the above-mentioned device are only divided according to functional logic, but are not limited to the above division, as long as they can achieve the corresponding functions; in addition, the specific names of each functional module are only for easy differentiation and are not used to limit the scope of protection of the present invention.
[0165] It should be understood that various parts of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware stored in memory and executed by a suitable instruction execution device. For example, if implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.
[0166] In the description of this specification, references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.
[0167] Note that the above description is merely a preferred embodiment of the present invention and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in detail through the above embodiments, the present invention is not limited to the above embodiments, and may include many other equivalent embodiments without departing from the concept of the present invention, the scope of which is determined by the scope of the appended claims.
Claims
1. A method for recognizing teacher classroom behavior, characterized in that it comprises: detecting a teacher as a target object from an acquired classroom image; performing target tracking processing on the target object to form a tracking queue consisting of target objects in multiple consecutive frames of classroom images; matching the target object in the current frame of the classroom image with the target object in the tracking queue in the previous frame of the classroom image; detecting, based on the matching result, key points representing human posture from the target object in the current frame of the classroom image; and determining the behavior of the target object based on the key points; wherein the target object is shown as a detection box in the classroom image, and performing target tracking processing on the target object to form a tracking queue consisting of target objects in multiple consecutive frames of classroom images comprises: adaptively cropping the detection box to obtain the upper half of the detection box as the target detection box representing the target object; using the Hungarian matching algorithm, with DIoU as the cost function, to match the target detection boxes in the acquired classroom image with the target detection boxes in the tracking queue in the previous frame of the classroom image; if the DIoU is less than or equal to a preset value, confirming that the match is successful, and storing the target object in the acquired classroom image in the tracking queue to update the tracking queue; if the DIoU is greater than the preset value, determining whether the target object in the previous frame of the classroom image is located in the center part of the classroom image; if so, extracting a first image feature of the region where the target object is located in the previous frame of the classroom image; dividing the region in the acquired classroom image that corresponds to the region where the target object is located in the previous frame of the classroom image into multiple image blocks; extracting a second image feature of each of the image blocks; calculating a first similarity between the first image feature and each of the second image features; storing the image block corresponding to the second image feature whose first similarity is the largest and is greater than or equal to a preset similarity threshold in the tracking queue as the target object, to update the tracking queue; and if the first similarities are all less than the preset similarity threshold, confirming that the target object is lost.
2. The teacher classroom behavior recognition method according to claim 1, characterized in that detecting the teacher as the target object from the acquired classroom image comprises: detecting all human objects from the classroom image; and selecting, from all the human objects, the human object located in the podium area as the target object representing the teacher.
3. The teacher classroom behavior recognition method according to claim 1, characterized in that, after confirming that the target object is lost, the method further comprises: recording the number of times the target object has been lost; and when the number of times the target object has been lost exceeds a preset threshold, deleting the target object from the tracking queue.
4. The teacher classroom behavior recognition method according to claim 1, characterized in that, after performing target tracking processing on the target object to form a tracking queue consisting of target objects in multiple consecutive frames of classroom images, the method further comprises: determining whether there is a target object in the acquired classroom image that does not match the target objects in the tracking queue; and if so, adding the unmatched target object to the tracking queue.
5. The teacher classroom behavior recognition method according to any one of claims 1-4, characterized in that matching the target object in the current frame of the classroom image with the target object in the tracking queue in the previous frame of the classroom image comprises: adaptively cropping the detection box of the target object in the current frame classroom image to obtain the upper half of that detection box as the target detection box representing the target object in the current frame classroom image; using the Hungarian matching algorithm, with DIoU as the cost function, to match the target detection boxes in the current frame classroom image with the target detection boxes in the tracking queue in the previous frame classroom image; if the DIoU is less than or equal to a preset value, confirming that the match is successful; if the DIoU is greater than the preset value, determining whether the target object in the previous frame of the classroom image is located in the center part of the classroom image; if so, extracting a third image feature of the region where the target object is located in the previous frame of the classroom image; dividing the region in the current frame of the classroom image that corresponds to the region where the target object is located in the previous frame of the classroom image into multiple image blocks; extracting a fourth image feature of each of the image blocks; calculating a second similarity between the third image feature and each of the fourth image features; and confirming a successful match when the maximum value of the second similarity is greater than or equal to a preset similarity threshold.
6. The teacher classroom behavior recognition method according to any one of claims 1-4, characterized in that detecting, based on the matching result, key points representing human posture from the target object in the current frame of the classroom image comprises: when the matching result is a successful match, using histogram equalization to enhance the contrast of the target object in the current frame classroom image to obtain a contrast-enhanced image; and using a key point detection model to detect key points representing the head, left shoulder, right shoulder, left elbow, right elbow, left hand, right hand, left hip, right hip, left knee, and right knee from the contrast-enhanced image.
7. The teacher classroom behavior recognition method according to any one of claims 1-4, characterized in that determining the behavior of the target object based on the key points comprises: determining the relative positions of the key points; and determining the behavior of the target object based on the relative positions of the key points.
8. A teacher classroom behavior recognition device, characterized in that it comprises: a target object acquisition and detection module, used to detect a teacher as a target object from an acquired classroom image; a tracking processing module, used to perform target tracking processing on the target object and form a tracking queue consisting of target objects in multiple consecutive frames of classroom images; a matching module, used to match the target object in the current frame of the classroom image with the target object in the tracking queue in the previous frame of the classroom image; a key point detection module, used to detect, based on the matching result, key points representing human posture from the target object in the current frame of the classroom image; and a behavior determination module, used to determine the behavior of the target object based on the key points; wherein the target object is shown as a detection box in the classroom image, and the tracking processing module comprises: a first matching submodule, which comprises a first cropping unit, used to adaptively crop the detection box to obtain the upper half of the detection box as the target detection box representing the target object, a first matching unit, used to match, using the Hungarian matching algorithm with DIoU as the cost function, the target detection boxes in the acquired classroom image with the target detection boxes in the tracking queue in the previous frame classroom image, and a matching result confirmation unit, used to confirm a successful match when the DIoU is less than or equal to a preset value; a first tracking queue update submodule, used to store the target object in the acquired classroom image into the tracking queue when a match is successful, so as to update the tracking queue; a first judgment submodule, used to determine whether the target object in the previous frame of the classroom image is located in the center part of the classroom image when the DIoU is greater than the preset value; a first feature extraction submodule, used to extract a first image feature of the region where the target object is located in the previous frame of the classroom image when the target object is located in the center part of the classroom image; a first image segmentation submodule, used to divide the region in the acquired classroom image that corresponds to the region where the target object is located in the previous frame classroom image into multiple image blocks; a second feature extraction submodule, used to extract a second image feature of each of the image blocks; a first similarity calculation submodule, used to calculate a first similarity between the first image feature and each of the second image features; a second tracking queue update submodule, used to store the image block corresponding to the second image feature whose first similarity is the largest and is greater than or equal to a preset similarity threshold into the tracking queue as the target object, so as to update the tracking queue; and a loss confirmation submodule, used to confirm that the target object is lost when the first similarities are all less than the preset similarity threshold.
9. A computer device, characterized in that it comprises: one or more processors; and a storage device for storing one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the teacher classroom behavior recognition method as described in any one of claims 1-7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, it implements the teacher classroom behavior recognition method as described in any one of claims 1-7.