An improved Openpose-based classroom multi-person abnormal behavior and mask wearing detection method

A detection method for abnormal classroom behavior, applied in the fields of biological neural network models, instruments, computer components, etc. It addresses the problems of high cost, the absence of classroom teaching quality evaluation, and poor real-time performance, and achieves the effect of improving efficiency.

Inactive Publication Date: 2021-04-02
NANTONG UNIVERSITY
8 Cites 2 Cited by

AI-Extracted Technical Summary

Problems solved by technology

The disadvantage of this method is that it costs a great deal yet stops at collecting classroom surveillance video and does not go on to a deeper evaluation of classroom teaching quality.
The disadvantage of this method is: the student violation detectio...

Method used

A convolutional network first extracts preliminary image features, which are then fed to the dual parallel convolutional networks for further processing. This is equivalent to merging the lower layers of the two parallel networks into a single convolutional network, which saves computing resources.
The soft-threshold residual block sets the threshold automatically, which removes the tedious task of setting it by hand. Setting the threshold manually requires some professional knowledge, and according to the experimental results a model that uses a manually set threshold has lower detection accuracy than a model that uses the threshold obtained by the soft-threshold residual block. In addition, the soft-threshold residual block ensures that the threshold of the soft-threshold function is positive and lies within an appropriate range, which avoids an all-zero output.
FPN uses the idea of an image pyramid to address the difficulty of detecting small objects in object detection scenes, SSD utilizes the hierarchical structure ...

Abstract

The invention discloses an improved OpenPose-based method for detecting multi-person abnormal classroom behavior and mask wearing. The method detects students with abnormal behavior in a classroom by using the improved OpenPose model, the positions and positional relationships of the joint points in the students' postures, and an SSD mask detection algorithm integrated with an FPN. It reminds students in time to participate in class, feeds the results back to the teacher, and supports teaching reform and review after class. By using the improved OpenPose model, the multi-person mask detection model, and the intelligent health module, the method detects students' abnormal behavior in class and feeds it back to the teacher in time, reminds students to rejoin the class, and can also remind students who have been sitting for a long time to get up.

Application Domain

Data processing applications, Character and pattern recognition, +2 more

Technology Topic

Student engagement, Physical medicine and rehabilitation, +2 more

Examples

Example Embodiment

[0060]The technical solutions in the embodiments of the present invention are described clearly and completely below. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
[0061]Referring to Figures 1-27, the present invention provides a technical solution: an improved OpenPose-based method for detecting multi-person abnormal classroom behavior and mask wearing, comprising the following steps:
[0062]S1, the front camera and the rear camera continuously capture images of the students in class at a given initial frame rate;
[0063]S2, during class, the camera monitors the classroom as usual. The face of each person is recognized to determine whether the student is wearing a mask; if a student is not wearing one, the student's position is recorded and fed back to the teacher;
[0064]S3, abnormal behavior is judged as follows. The key points in each frame of the image under test are identified, numbered, and connected. Twelve upper-body key points are selected, including the left and right ears, nose, neck, left and right wrists, left and right elbows, and left and right shoulders, and are compared across frames. A state is judged abnormal when the change in a key point's coordinates, the angle between connecting lines, or the distance between connected key points exceeds a given threshold. The abnormal states are divided into four. State 1: holding one posture for a long time without moving. State 2: the student's body swinging back and forth or from side to side. State 3: looking to the left or right. State 4: keeping the head lowered for a long time. In the general case, the system initially compares images every 5 seconds. If none of the measured values reaches the abnormal range, the student is judged to be in a normal class state. If the positional relationship between key points or the distance between connected key points exceeds the threshold, the detection frequency is increased to one comparison every 1 second;
[0065]S4, during class, the camera, which is equipped with a signal transmitter, continuously checks whether any student shows abnormal behavior. Once a student shows an abnormality, the infrared remote transmit/receive chip in the transmitter performs its internal processing, outputs the corresponding signal, and drives the wireless transmit module to send the signal to the voice module and the teacher's server;
[0066]S5, a vibration module is installed at the bottom of the student's desk. The vibration module consists of a signal receiver, a vibration generator, a voice module, a short-range directional sound collector, and two fixing mounts. The main body of the vibration module has two faces, A and B: face B is installed in contact with the desk surface and face A is used for sound output. When the signal receiver of the vibration module receives a student's abnormal-behavior signal, the wireless module passes the received signal to the infrared remote transmit/receive chip, which processes it and drives the output terminal to activate the vibration module; the module vibrates for a few seconds to remind the student to pay attention in class. When the signal receiver in the vibration module receives the sedentary reminder signal sent by the system, the voice module is activated to remind the student to get up and move around. The vibration reminder takes priority over the voice reminder: if the higher-priority reminder has no effect, the next level of reminder is used;
[0067]S6, the camera also checks mask wearing while the students are in class. Once a student is detected taking off the mask or wearing it improperly, the system automatically adjusts the focal length of the camera, locks onto the student, saves the frame, and marks the student's behavior as abnormal;
[0068]S7, a thin-film pressure sensor is installed on the desktop to assist in detecting the student's classroom behavior. After the pressure of objects on the desk is excluded, if the pressure generated by the student exceeds the average pressure value, the student enters a suspected abnormal state; the system automatically adjusts the focal length of the camera or calls the rear camera to focus on this student's behavior until the student returns to the normal state or is determined to be in an abnormal state;
[0069]S8, the vibration module contains a short-range directional sound collector to assist in detecting the students' classroom behavior. After noise interference is excluded, if talking is detected in an area, that area enters a suspected abnormal state; the system automatically adjusts the focal length of the camera or calls the rear camera to focus on the classroom behavior of the students in that area until they return to the normal state or are determined to be in an abnormal state;
[0070]S9, an intelligent health module is installed. The system is first initialized, including the timer and interrupts; the interrupt routine is used to set the sedentary time threshold. The signal receiver receives the control signals sent by the main control system. When the pressure sensor on the stool reaches a predetermined threshold, the student is judged to be seated and the timer starts. When the set sedentary time threshold is reached, the vibration module vibrates. If the student is detected to have stood up during the vibration reminder, the alarm is cancelled, the timer is cleared, and the system returns to its normal state; if the student's state does not change, the system enters the voice alarm state and reminds the sedentary student to stand up and stretch;
[0071]S10, a voice module is placed on the podium; its internal and external structure is shown in Figure 20. When the camera detects that a student is behaving abnormally, the signal transmitter issues a signal. The Bluetooth module on the voice module receives this signal, and the power amplifier board drives the speaker to sound, reminding the teacher that a student is behaving abnormally;
[0072]S11, a search-and-optimization identification strategy is introduced. If a student is frequently diagnosed with a particular suspected abnormal behavior state, the next abnormality check tests for that state first, which shortens the time needed to determine the abnormal behavior and in turn improves the efficiency of the analysis;
[0073]S12, after each class, the video streams of students judged to be behaving abnormally are reviewed to check whether normal behavior was identified as abnormal and whether abnormal behavior was missed. A small segment of video is also randomly sampled after class and re-examined to determine the accuracy.
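Two of the steps above lend themselves to a short sketch: the interval switching in S3 and the sedentary timer in S9. The following is a minimal illustration under stated assumptions, not the patented implementation. The names `next_interval` and `SedentaryMonitor`, the pixel threshold, the use of the largest key-point displacement as the compared value, and the fixed vibration duration are all hypothetical; only the 5-second and 1-second intervals and the vibration-before-voice order come from the text.

```python
import math

NORMAL_INTERVAL_S = 5.0   # S3: compare every 5 seconds in the normal state
ALERT_INTERVAL_S = 1.0    # S3: compare every 1 second once a threshold is exceeded


def next_interval(prev_points, curr_points, threshold=30.0):
    """Pick the next comparison interval from the largest key-point shift.

    `threshold` (in pixels) is a hypothetical value, not taken from the source.
    """
    largest = max(math.dist(p, c) for p, c in zip(prev_points, curr_points))
    return ALERT_INTERVAL_S if largest > threshold else NORMAL_INTERVAL_S


class SedentaryMonitor:
    """S9 sketch: time how long a student sits, then vibrate, then use voice."""

    def __init__(self, pressure_threshold, sit_limit_s, vibration_s):
        self.pressure_threshold = pressure_threshold
        self.sit_limit_s = sit_limit_s    # sedentary time threshold
        self.vibration_s = vibration_s    # assumed length of the vibration phase
        self.seated_since = None          # time the student sat down, or None

    def update(self, now_s, pressure):
        """Return 'idle', 'timing', 'vibrate', or 'voice' for this reading."""
        if pressure < self.pressure_threshold:
            self.seated_since = None      # student stood up: clear the timer
            return "idle"
        if self.seated_since is None:
            self.seated_since = now_s     # student just sat down: start timing
        seated_for = now_s - self.seated_since
        if seated_for < self.sit_limit_s:
            return "timing"
        if seated_for < self.sit_limit_s + self.vibration_s:
            return "vibrate"
        return "voice"                    # no change after vibrating: escalate
```

Calling `update` with periodic stool-pressure readings walks through the same sequence the step describes: timing, vibration, then voice, with a reset as soon as the pressure drops below the threshold.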
[0074]Because modern classrooms are small, hold many students, and seat them densely, the video captured by the camera may suffer from occlusion, blur, and similar problems. The following provision is made:
[0075]Camera B installed at the rear of the classroom:
[0076]When the view of the front camera A is blocked, camera B mounted at the rear of the classroom can be called automatically to help judge the student's classroom behavior.
[0077]The OpenPose model is essentially a dual parallel convolutional network model. It uses two convolutional networks at the same time: one locates the key parts of the human body in the image, and the other connects candidate key parts to form limbs. The results of the two networks are then combined for pose assembly to complete the detection of human poses in the image.
[0078]A convolutional network first extracts preliminary image features, which are then fed to the two parallel convolutional networks for further processing. This is equivalent to merging the lower layers of the two parallel networks into a single convolutional network, which saves computing resources.
[0079]Figure 1 shows the detection process of the OpenPose model, and Figure 2 visualizes it. First, the VGG-19 network extracts the low-level features of the input image; Figure 2(a) shows the output of the fifth convolutional layer. The low-level features are then fed to two parallel convolutional networks. One network generates confidence maps using the non-maximum suppression algorithm to locate the key parts of the human body; Figure 2(b) shows the confidence heat maps of the shoulder and elbow during this process. The other network uses the local area affinity vector field (part affinity field) algorithm, which provides a way of connecting key parts to form limbs; Figure 2(c) shows the detection of the left calf during this process. Finally, the results of the two networks are combined and the Hungarian algorithm is used for pose assembly to output the human poses in the image under test; Figure 2(d) and (e) show the pose assembly schematic and the final detection result, respectively.
[0080]The main algorithms of the OpenPose model include:
[0081]1) Local area affinity algorithm
[0082]The role of the local area affinity algorithm is to calculate the confidence that candidate key parts form a candidate limb.
[0083]Let $j_1$ and $j_2$ be two different key parts, $v$ the unit vector from $j_1$ to $j_2$, and $v_\perp$ the vector perpendicular to $v$. Let the length and width of limb $c$ be $l_c$ and $\sigma_c$, and let the position coordinates of the candidate key parts $j_1$ and $j_2$ be $x_{j_1}$ and $x_{j_2}$, respectively.
[0084]First, as shown in formula (1), the confidence vector $A_c(p)$ of a point $p$ with respect to limb $c$ is calculated. If $p$ is on limb $c$, $A_c(p)$ equals the unit direction vector $v$ of limb $c$; otherwise it is the zero vector. Formulas (2) and (3) determine whether $p$ is on limb $c$: if both hold, $p$ is on the limb; otherwise it is not.
[0085]
[0086]
[0087]
[0088]Thereafter, the confidence $E_c$ of limb $c$ is calculated by formula (4): it is the integral of the confidence vectors of all points on the line connecting key parts $j_1$ and $j_2$.
[0089]
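The equation images for formulas (1) to (4) did not survive extraction. The sketch below follows the definitions in the text (unit vector, perpendicular vector, limb length and width) and assumes the widely published OpenPose formulation: a point is on the limb when its projection on `v` lies between 0 and the limb length and its perpendicular offset is within the limb width. The line integral is approximated by uniform sampling. Function names and the sample count are illustrative.

```python
import math


def unit_and_length(x1, x2):
    """Unit vector v from x1 to x2 and the limb length l_c."""
    dx, dy = x2[0] - x1[0], x2[1] - x1[1]
    length = math.hypot(dx, dy)
    return (dx / length, dy / length), length


def paf_vector(p, x1, x2, sigma):
    """A_c(p): the unit vector v if p lies on the limb, else the zero vector."""
    v, length = unit_and_length(x1, x2)
    rx, ry = p[0] - x1[0], p[1] - x1[1]
    along = v[0] * rx + v[1] * ry           # projection on v
    across = abs(-v[1] * rx + v[0] * ry)    # projection on the perpendicular
    on_limb = 0.0 <= along <= length and across <= sigma
    return v if on_limb else (0.0, 0.0)


def limb_score(field, x1, x2, samples=10):
    """Approximate E_c: average of field(p) . v at points sampled from x1 to x2."""
    v, _ = unit_and_length(x1, x2)
    total = 0.0
    for k in range(samples):
        u = k / (samples - 1)
        p = (x1[0] + u * (x2[0] - x1[0]), x1[1] + u * (x2[1] - x1[1]))
        a = field(p)
        total += a[0] * v[0] + a[1] * v[1]
    return total / samples
```

A candidate limb that runs along the field scores close to 1, while one that lies outside it scores 0, which is what lets the later matching step rank candidate connections.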
[0090]2) Hungarian algorithm
[0091]Because an image contains many people, a large set of possible limbs can be defined from the candidate key parts, and the integral formula above gives the score of each candidate limb. The OpenPose model therefore uses maximum bipartite graph matching to find the pose assembly scheme whose selected candidate limbs have the maximum total score.
[0092]The Hungarian algorithm is the algorithm the OpenPose model uses to perform maximum bipartite matching. Assume there are three key parts whose possible connections are shown in Figure 3. The calculation process of the Hungarian algorithm is as follows:
[0093]First, because maximum bipartite matching does not allow multiple edges to share a node (that is, one shoulder cannot be connected to two elbows), there are two constraints, given by equations (5) and (6).
[0094]
[0095]
[0096]Equation (5) has the following meaning. Let the connection variable represent the possibility that the n-th key part of class 1 is connected to the m-th key part of class 2. For the n-th key part of class 1, the sum of the connection confidences of all class-2 key parts connected to it must not exceed 1; otherwise more than one class-2 key part would be connected to this key part, which is not allowed. Equation (6) is the corresponding constraint on the m-th key part of class 2.
[0097]Finally, among all connection schemes that satisfy the constraints, formula (7) finds the one with the maximum sum of integrals. This is the most likely connection scheme, and it identifies the key parts at the two ends of a given limb. The steps above are repeated for the other key parts, and limbs that share the same key part are finally assembled together, which achieves multi-person pose detection.
[0098]
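The matching problem described above can be illustrated on a small example. The sketch below is not the Hungarian algorithm itself: for readability it swaps in an exhaustive search over all one-to-one pairings, which finds the same optimum but is only practical for a handful of candidates. The function name and the score values are hypothetical.

```python
from itertools import permutations


def best_assignment(scores):
    """scores[m][n]: confidence that class-1 part m connects to class-2 part n.

    Returns (total, pairs) for the one-to-one pairing with the largest total.
    Exhaustive search; assumes len(scores) <= len(scores[0]).
    """
    rows, cols = len(scores), len(scores[0])
    best_total, best_pairs = float("-inf"), []
    for perm in permutations(range(cols), rows):
        # Each class-2 index appears at most once in perm, so no node is shared.
        total = sum(scores[m][n] for m, n in enumerate(perm))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(perm))
    return best_total, best_pairs


# Two shoulders (rows) and two elbows (columns). Taking the single best edge
# first (0.9) would force the poor 0.1 edge; the optimum avoids that.
total, pairs = best_assignment([[0.9, 0.8], [0.85, 0.1]])
```

The example shows why a global assignment is needed rather than picking the highest-scoring edge greedily: the constraint that no key part is shared couples the choices.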
[0099]The improved OpenPose model applied to multi-person behavior detection includes:
[0100]1. Solving the occlusion problem in the target detection algorithm
[0101](a) Using GAP and CAM in place of the final fully connected layers
[0102]Missed detections caused by the target being occluded by other objects can be reduced from the feature perspective.
[0103]Convolutional neural networks are sensitive to certain features of the image target; in the convolutional feature space, the class-dependent features determine the final classification result.
[0104]Thus, the class-dependent features within the convolutional features, that is, the class-dependent features of the proposal regions in VGG-19, can be used to generate an occlusion effect.
[0105]The CAM (class activation map) of a given class indicates which image regions and features the CNN uses as the basis for discriminating that class, that is, the class-dependent part, and so explains why the model assigns the target to that class. After the input image passes through a series of convolutional layers, the last-layer features are obtained; these contain rich spatial and semantic information. A typical network uses fully connected layers to map the feature maps to a feature vector, and spatial information is lost in this conversion. Here, GAP (global average pooling) computes the mean of each feature map, and the weighted sum of these means is fed to the final softmax through a single fully connected layer, namely:
[0106]
[0107]where $\omega_k^c$ is the weight of the k-th feature mean for class $c$.
[0108]Next, the method for locating the class-dependent features in the convolutional features is explained. The input image passes through feature extraction and network screening to obtain proposal regions. GAP is integrated into the trained VGG-19 network by adding it after the ROI pooling layer: the fixed-size proposal region features are fed to GAP to obtain the class-dependent part of the VGG-19 proposal region features.
[0109]The parameters of the VGG-19 part are fixed during training, and the GAP part is trained as a classifier. However, because there is only one fully connected layer, training tends to underfit. For this reason, two convolutional layers with kernel sizes of 3 × 3 and 1 × 1 are added in front of the GAP part and trained together with it. After training, the weights belonging to each class are extracted and combined with the corresponding feature maps to obtain the CAM. The highlighted part of the CAM is the class-dependent feature.
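The image of formula (8) is missing from this copy. As a rough illustration of the weighted GAP step it describes, the sketch below assumes the commonly published CAM form, in which the class score is the weighted sum of the per-map means and the class activation map is the same weighted sum taken at each spatial location. Feature maps are plain nested lists and all names are illustrative.

```python
def global_average_pool(feature_maps):
    """Mean of each of the K feature maps (each map is a list of rows)."""
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
            for fm in feature_maps]


def class_score(feature_maps, weights_c):
    """Score for class c: sum over k of w_k^c times the mean of map k."""
    pooled = global_average_pool(feature_maps)
    return sum(w * g for w, g in zip(weights_c, pooled))


def class_activation_map(feature_maps, weights_c):
    """CAM for class c: weighted sum of the feature maps at each location."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(wk * fm[i][j] for wk, fm in zip(weights_c, feature_maps))
             for j in range(w)] for i in range(h)]
```

Under this form the mean of the CAM equals the class score, which is why the highlighted CAM regions can be read as the locations that contribute most to the classification.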
[0110](b) Adding an adaptive soft-threshold residual network to extract low-level features
[0111]Soft thresholding sets values whose absolute value is below the threshold to zero and shrinks values above the threshold toward zero, which filters out useless information. The method is therefore derived from the attention mechanism. The original SENet weights each channel of the feature map, which enhances useful feature channels and weakens redundant ones. The soft-threshold residual block replaces the "reweighting" in the residual form of SENet with "soft thresholding" and adds a branch to the residual block to obtain the threshold. Specifically, the threshold output by the soft-threshold residual block is (the mean of the absolute pixel values over the feature map of each channel) × (a set of coefficients between 0 and 1).
[0112]The soft-threshold residual block sets the threshold automatically, which removes the tedious task of setting it by hand. Setting the threshold manually requires some expertise, and according to the experimental results a model that uses a manually set threshold has lower detection accuracy than a model that uses the threshold obtained by the soft-threshold residual block. In addition, the soft-threshold residual block ensures that the threshold of the soft-threshold function is positive and lies within an appropriate range, which avoids an all-zero output.
[0113]Figure 20 shows the structure of the adaptive soft-threshold residual block: after two convolutional layers produce a feature map of size W × H × C, a branch is taken off to obtain the threshold.
[0114]The branch first performs global average pooling over the W × H extent, as shown in formula (9):
[0115]$y = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} |x_{ij}|$ (9)
[0116]In the formula, W and H are the width and height of the input feature map; $|x_{ij}|$ is the absolute value of the pixel at position (i, j) in the feature map; y is the pooling result, a 1 × 1 × C vector.
[0117]Subsequently, the pooling result passes through a 1 × 1 convolutional layer to learn the threshold coefficients, which the sigmoid function normalizes to between 0 and 1; the coefficients likewise form a 1 × 1 × C vector. Multiplying the threshold coefficients by the pooling result gives a separate threshold for each channel of the feature map.
[0118]Finally, the feature map is soft-thresholded with the obtained thresholds.
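The threshold branch described above can be sketched for a single channel as follows. This is an illustration only: `raw_coeff` stands in for the output of the learned 1 × 1 convolution, which is not reproduced here, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(x, tau):
    """Zero values with |x| <= tau; shrink the rest toward zero by tau."""
    return math.copysign(max(abs(x) - tau, 0.0), x)


def channel_threshold(channel, raw_coeff):
    """tau = mean(|x_ij|) * sigmoid(raw_coeff) for one channel (list of rows)."""
    flat = [abs(v) for row in channel for v in row]
    return (sum(flat) / len(flat)) * sigmoid(raw_coeff)


def soft_threshold_channel(channel, raw_coeff):
    """Apply the channel's own threshold to every value in the channel."""
    tau = channel_threshold(channel, raw_coeff)
    return [[soft_threshold(v, tau) for v in row] for row in channel]
```

Because the mean of absolute values is non-negative and the sigmoid output lies strictly between 0 and 1, the threshold is never negative and is always smaller than the channel's mean absolute value. The largest-magnitude value in a non-zero channel therefore always survives, which matches the text's remark that an all-zero output is avoided.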
[0119]2. Using minimum-distance discriminant analysis to identify attentive listening
[0120]In the improved OpenPose-based classroom abnormal behavior and mask wearing detection method according to claim 1, step S3 further comprises: for the 12 body coordinate points extracted from the image, that is, the positions of the eyes, shoulders, and other parts in the image, the 12 coordinates are represented as a vector of points:
[0121]$A = ((x_1, y_1), \ldots, (x_{12}, y_{12}))$ (10)
[0122]For each detected figure, $(x_1, y_1)$ is taken as the starting point. The coordinate differences between each of the remaining 11 points and the starting point are computed, followed by the angle between each of the remaining 11 points and the starting point, which gives the feature matrix:
[0123]
[0124]This feature matrix describes the shape information of the detected figure.
[0125]Each column of the feature matrix is treated as one attribute of the figure's shape. Because the columns have different dimensions, each column must be standardized to obtain the standard feature matrix:
[0126]
[0127]Standardization process formula:
[0128]
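The construction of the feature matrix in item 2 above can be sketched as follows. The matrix image is missing from this copy, so the layout is an assumption: one row per remaining point, with columns for the x difference, the y difference, and the angle. The angle is taken here as the direction of the vector from the starting point to each point, computed with `atan2`, which is one plausible reading of the text rather than a confirmed definition.

```python
import math


def feature_matrix(points):
    """Rows of (dx, dy, angle) for every point relative to the first point."""
    x0, y0 = points[0]
    rows = []
    for x, y in points[1:]:
        dx, dy = x - x0, y - y0
        rows.append((dx, dy, math.atan2(dy, dx)))  # angle in radians
    return rows


# The central sequence A given later in the text, used here only as sample input.
A = [(12, 24), (5, 9), (20, 7), (14, 9), (2, 8), (16, 32),
     (8, 16), (17, 42), (62, 14), (34, 49), (12, 63), (20, 14)]
rows = feature_matrix(A)   # 11 rows, one per remaining key point
```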
[0129]3. The standard feature matrix is evaluated by the entropy value method, and invalid information with a high entropy value is discarded.
[0130]Calculation method and steps of the entropy value method:
[0131]1) Suppose m objects are to be evaluated and the evaluation indicator system contains n indicators. This is a comprehensive evaluation problem with m samples and n indicators, and the initial data matrix of the evaluation system can be formed:
[0132]
[0133]where $x_{ij}$ is the value of the j-th evaluation indicator for the i-th sample.
[0134]For the feature matrix there are three indicators: the coordinate difference in the x direction, the coordinate difference in the y direction, and the angle value. Given m feature matrices, the weights of the three indicators can be determined by the entropy value method.
[0135]2) Data processing: standardization
[0136]a) Because the indicators differ in dimension and order of magnitude, each indicator must be standardized to remove the effect of these differences on the evaluation result.
[0137]Standardization process formula:
[0138]
[0139]where $x_j$ is the value of the j-th indicator, $x_{\max}$ is the maximum of the j-th indicator, $x_{\min}$ is the minimum of the j-th indicator, and $x'_{ij}$ is the standardized value.
[0140]If the indicator is a benefit-type indicator, the former formula is used.
[0141]If the indicator is a cost-type indicator, the latter formula is used.
[0142]The absolute values of the x-direction coordinate difference, the y-direction coordinate difference, and the angle difference are all cost-type attributes, so the second standardization formula is used for these three.
[0143]b) Calculate the proportion $y_{ij}$ of the i-th object's value under the j-th indicator:
[0144]
[0145]Thus the proportion matrix of the data can be established: $Y = \{y_{ij}\}_{m \times n}$
[0146]3) Calculating the information entropy $e$ and the information utility value $d$ of each indicator
[0147]a) The information entropy of the j-th indicator is calculated as:
[0148]
[0149]b) The information utility value of an indicator depends on the difference between 1 and the information entropy $e_j$ of that indicator. Its value directly affects the weight: the larger the information utility value, the more important the indicator is to the evaluation and the larger its weight.
[0150]$d_j = 1 - e_j$ (18)
[0151]4) Calculating the weight of each evaluation indicator
[0152]The entropy value method estimates the weight of each indicator essentially from the indicator's information utility value: the higher the utility value, the more important the indicator is to the evaluation (in other words, the larger the weight, the greater its contribution to the evaluation result).
[0153]The weight of the j-th indicator is:
[0154]
[0155]At this point the weights of the three indicators are available. Weighting the scores by them gives the combined score, which measures the relative importance of the coordinate differences and the angle difference.
[0156]This algorithm is applied to classroom behavior identification to judge how effective the extracted image information is: the larger the information entropy, the lower the effective value of the information, and vice versa.
[0157]The minimum distance method and the entropy value method are now applied to the following three coordinate sequences:
[0158]A = [12, 24; 5, 9; 20, 7; 14, 9; 2, 8; 16, 32; 8, 16; 17, 42; 62, 14; 34, 49; 12, 63; 20, 14]; this is the central sequence, used to simulate a given attentive listening state.
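Several formula images in the entropy value steps above are missing from this copy. The sketch below therefore assumes the commonly published form of the entropy weight method: min-max standardization (benefit or cost type), column proportions, entropy scaled by 1/ln m, utility d_j = 1 - e_j as in formula (18), and weights proportional to the utilities. It is an illustration under those assumptions and is not claimed to reproduce the figures reported in the worked example.

```python
import math


def entropy_weights(matrix, cost_type=True):
    """matrix: m samples (rows, m >= 2) by n indicators. Returns n weights."""
    m, n = len(matrix), len(matrix[0])
    utilities = []
    for j in range(n):
        col = [row[j] for row in matrix]
        lo, hi = min(col), max(col)
        if hi == lo:
            utilities.append(0.0)          # constant column: no information
            continue
        if cost_type:
            norm = [(hi - x) / (hi - lo) for x in col]   # smaller is better
        else:
            norm = [(x - lo) / (hi - lo) for x in col]   # larger is better
        total = sum(norm)
        props = [v / total for v in norm]                # proportions y_ij
        entropy = -sum(p * math.log(p) for p in props if p > 0) / math.log(m)
        utilities.append(1.0 - entropy)                  # d_j = 1 - e_j
    s = sum(utilities)
    if s == 0:
        return [1.0 / n] * n               # nothing varies: uniform weights
    return [d / s for d in utilities]
```

The weights always sum to 1, and an indicator whose standardized values are spread more unevenly across samples has lower entropy and so receives a larger weight.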
[0159]b1 = [11,20; 45,69; 20,17; 14,79; 2,18; 86,32; 18,16; 37,62; 52,34; 44,29; 15,63; 20,19 ];
[0160]b2 = [17,2; 54,19; 23,17; 18,19; 12,38; 46,32; 38,16; 27,41; 52,14; 36,52; 19,33; 26,4 ];
[0161]b1 and b2 are the comparison sequences, representing other images to be compared.
[0162]By the entropy value method, the attribute weights of the three columns of the feature matrix of b1 with respect to A are 0.254, 0.343, and 0.403, and the total distance is 26.47. The attribute weights of the three columns of the feature matrix of b2 with respect to A are 0.240, 0.451, and 0.310, and the total distance is 24.18. Comparing the distances shows that the image represented by b2 is closer than the image represented by b1 to the attentive listening state.
[0163]4. Algorithms for finding suitable keyframes
[0164]In the improved OpenPose-based classroom abnormal behavior and mask wearing detection method according to claim 1, step S1 further comprises: because the sampling rate of the improved OpenPose in this experiment is high, the amount of data is large and contains much invalid data, which makes further processing more complex. Redundant and invalid data in the raw data therefore need to be screened out as far as possible, while representative keyframes are extracted that represent the original motion without distortion. The simplest way to obtain keyframes is equal-interval extraction, but this can undersample fast motion, so that keyframes are lost, and oversample slow motion, so that keyframes are redundant. More sophisticated algorithms are therefore needed, chosen according to the theoretical method and the motion characteristics. Suitable frames are selected here by the following two algorithms.
[0165](a) Frame reduction method
[0166]A threshold is set and, according to it, quaternion interpolation reconstruction is used to extract the keyframes directly or indirectly. The original sequence is reconstructed by quaternion interpolation, which matches the rotational nature of human joint movement, so the extracted keyframes are relatively accurate.
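[0167]Linear interpolation (LERP) method:
[0168]$q_t = \mathrm{Lerp}(q_0, q_1, t) = (1 - t)\,q_0 + t\,q_1$ (20)
[0169]Normalized linear interpolation (NLERP) method:
[0170]$q_t = \mathrm{Nlerp}(q_0, q_1, t) = \dfrac{(1 - t)\,q_0 + t\,q_1}{\lVert (1 - t)\,q_0 + t\,q_1 \rVert}$ (21)
[0171]Spherical linear interpolation (SLERP) method:
[0172]$q_t = \mathrm{Slerp}(q_0, q_1, t) = \dfrac{\sin((1 - t)\theta)}{\sin\theta}\,q_0 + \dfrac{\sin(t\theta)}{\sin\theta}\,q_1$ (22)
[0173]$\theta = \arccos(q_0 \cdot q_1)$ (23)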
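The three interpolation methods named above can be sketched for unit quaternions stored as 4-tuples. This is a minimal illustration in the standard textbook form; it assumes unit-length inputs and does not flip the sign of `q1` to force the shortest arc, which a production implementation would normally do.

```python
import math


def lerp(q0, q1, t):
    """Component-wise linear interpolation, as in formula (20)."""
    return tuple((1 - t) * a + t * b for a, b in zip(q0, q1))


def normalize(q):
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def nlerp(q0, q1, t):
    """Linear interpolation followed by renormalization to unit length."""
    return normalize(lerp(q0, q1, t))


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(q0, q1))))
    theta = math.acos(dot)                 # angle between q0 and q1
    if theta < 1e-6:                       # nearly identical: avoid dividing by ~0
        return nlerp(q0, q1, t)
    s = math.sin(theta)
    w0 = math.sin((1 - t) * theta) / s
    w1 = math.sin(t * theta) / s
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))
```

LERP leaves the unit sphere between the endpoints, NLERP projects back onto it, and SLERP stays on it while moving at constant angular speed, which is why it suits joint rotations.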
[0174](b) Extracting keyframes with the K-means clustering algorithm
[0175]Algorithm principle:
[0176](1) First, choose a number of classes (groups) and randomly initialize their center points. Each center point is a vector of the same length as a data point. This requires the number of classes (the number of center points) to be decided in advance.
[0177](2) Calculate the distance from each data point to every center point and assign the point to the class of the nearest center.
[0178](3) Compute the mean of the points in each class as the new center point of that class.
[0179](4) Repeat the steps above until the classes change little between iterations. The center points can also be initialized several times and the best run selected. Figure 6 demonstrates K-means classification:
[0180]Algorithm steps:
[0181]The video stream captured by the camera is processed and divided into normal behavior and abnormal behavior. Normal behavior is checked at the regular interval, and abnormal behavior is checked once every 1 second. The frames are clustered using the set threshold, and the first frame of each cluster is selected as a keyframe. This identifies the frames in which changes occur frequently.
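The keyframe selection described above can be sketched as follows. This is an illustration under assumptions: each frame is represented by a numeric feature vector (for example, flattened key-point coordinates), the centers are initialized from the first k frames instead of randomly so that the result is repeatable, and the threshold weighting mentioned in the text is not modeled because its details are not given.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain K-means. Centers start at the first k points (an assumption)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def keyframes(frame_features, k):
    """Index of the first frame in each cluster, in time order."""
    labels = kmeans(frame_features, k)
    first = {}
    for idx, label in enumerate(labels):
        first.setdefault(label, idx)
    return sorted(first.values())
```

For a stream in which the pose sits near one configuration, jumps to another, and then returns, `keyframes` yields one index per distinct configuration rather than one per fixed time step.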
[0182]5. Detailed decision rules for the four abnormal states. State 1: among the student's five facial key points and the left and right wrists and elbows, two or more key points change position by less than a given threshold across two consecutive recognized frames. State 2: the positions of the student's left and right wrists and left or right shoulder change by more than a given threshold across twenty consecutive recognized frames. State 3: the two key points on the left side or the right side of the student's face are lost in five consecutive recognized frames. State 4: in fifteen recognized frames, the position of any of the five facial key points is lower than the threshold. In addition, if more than half of the students are in the same state at the same time, it is not recorded, because it is probably a classroom activity.
[0183]6. The state determined by the pressure sensor is defined as suspected abnormal state I, and the state determined by the directional sound collector as suspected abnormal state II. When a student is judged to have entered either suspected abnormal state, the system automatically focuses on the student or the area concerned until the suspected abnormal state is cleared or is confirmed as abnormal classroom behavior, in which case a corresponding vibration reminder is generated and the abnormality data are fed back to the teacher.
[0184]7. The sound collector consists mainly of a coil, a magnet, and a housing. When a sound wave is received, the force it generates acts on the diaphragm and makes it vibrate. The diaphragm drives the voice coil to move in the field of the magnet, generating an electromotive force, so the sound signal is converted into an electrical signal and passed to the next stage of the system.
[0185]8. The improved OpenPose model normally detects the whole-body pose, but in the practical operation of an ordinary classroom the whole-body pose is often unnecessary. In classroom detection, for example, the action is concentrated in the upper body. The classroom detection system can therefore detect only the upper-body key points, which significantly reduces the amount of computation and speeds up detection so that it runs in real time.
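Two of the decision rules in item 5 above can be sketched as simple checks over recent frames. This is an interpretation, not the patented logic: each frame is assumed to be a dictionary mapping a key-point name to an `(x, y)` tuple, or to `None` when the point was not detected, and the key-point names, thresholds, and function names are hypothetical. The State 1 check assumes every watched key point is present in every frame.

```python
import math


def is_state1(frames, watched, threshold, min_still=2):
    """State 1 sketch: at least `min_still` watched key points move less than
    `threshold` between every pair of consecutive frames."""
    still = 0
    for name in watched:
        moves = [math.dist(a[name], b[name]) for a, b in zip(frames, frames[1:])]
        if moves and max(moves) < threshold:
            still += 1
    return still >= min_still


def is_state3(frames, side_points, run=5):
    """State 3 sketch: both key points on one side of the face are missing
    (None) in each of the last `run` frames."""
    recent = frames[-run:]
    return len(recent) == run and all(
        all(f.get(name) is None for name in side_points) for f in recent
    )
```

States 2 and 4 would follow the same pattern, with a "moves more than" test over twenty frames and a "lower than" test on the vertical coordinate over fifteen frames.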
[0186]The features of the multi-person detection model include:
[0187]1. Add a residual network to the existing SSD model
[0188]Here, a pre-trained ResNet-18 is used as the backbone network for feature extraction. Except for the first layer, which uses a 7 × 7 convolution, the remaining four layers are built from residual connection units. Residual connections effectively alleviate vanishing or exploding gradients when training deep networks. The internal structure of the residual connection unit is shown in Figure 7.
[0189]In the residual connection unit, the output feature vector $y$ is computed from the input feature vector $x$ through the residual connection as follows:
[0190]$y = \sigma(F(x, \{W_i\}) + x)$ (24)
[0191]where $\sigma$ is the Rectified Linear Unit (ReLU) activation function, $W_i$ are the weights, and $F(x, \{W_i\})$ is the residual mapping. For a three-layer residual connection unit, it is computed as in formula (25). The addition is performed by a shortcut connection with element-wise addition, after which the ReLU activation function applies the nonlinearity.
[0192]$F(x, \{W_i\}) = W_3 \sigma(W_2 \sigma(W_1 x))$ (25)
[0193]The effect of adding the residual network is shown in Figure 8: the recognition performance of the model with the residual network is better than that of the ordinary VGG-19 model.
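Formulas (24) and (25) can be traced with a small pure-Python sketch. It is illustrative only: the weights are plain dense matrices standing in for the convolutions of a real ResNet-18 block, and the helper names `relu`, `matvec`, and `residual_unit` are hypothetical.

```python
def relu(v):
    """Element-wise ReLU, the sigma of formulas (24) and (25)."""
    return [max(0.0, a) for a in v]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def residual_unit(x, W1, W2, W3):
    """Three-layer residual unit: y = relu(F(x) + x), F(x) = W3 relu(W2 relu(W1 x))."""
    f = matvec(W3, relu(matvec(W2, relu(matvec(W1, x)))))
    # Shortcut connection: element-wise addition of the input, then ReLU.
    return relu([fi + xi for fi, xi in zip(f, x)])


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
y_identity = residual_unit([1.0, -2.0], identity, identity, identity)
y_zero = residual_unit([1.0, -2.0], zeros, zeros, zeros)
```

With all-zero weights the residual mapping vanishes and the unit reduces to the ReLU of its input, which is the property that lets the shortcut carry signal and gradient through deep stacks.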
[0194]2. Add an FPN (Feature Pyramid Network) to extract low-level features
[0195]FPN uses the idea of the image pyramid to address the difficulty of detecting small objects in object detection. SSD exploits the hierarchical structure of the convolutional network and obtains multi-scale feature maps from different layers of the network (Figure 10). Although this improves accuracy with essentially no increase in test time, it does not use the lower-layer feature maps, and these low-level features are very helpful for detecting small objects.
[0196]To address these problems, FPN adopts the pyramid form of feature maps used in SSD. Unlike SSD, FPN uses not only the deep feature maps in VGG but also the shallow feature maps. These feature maps are integrated efficiently through a bottom-up pathway, a top-down pathway, and lateral connections, improving accuracy without a significant increase in detection time (Figure 10).
[0197]Through the bottom-up pathway, FPN obtains four groups of feature maps. Shallow feature maps contain more texture information, while deep feature maps contain more semantic information. To combine these four groups of feature maps, FPN uses a strategy of a top-down pathway with lateral connections, as shown in Figure 11.
[0198]To improve computational efficiency, FPN first reduces the dimension with a 1 × 1 convolution to obtain P5, and then upsamples P5 with bilinear interpolation to the same size as C4. FPN then likewise reduces the dimension of C4 with a 1 × 1 convolution to obtain P4. Since the dimension reduction does not change the spatial size, the upsampled P5 and P4 have the same size, and FPN adds P5 element-wise to P4 to obtain the updated P4. With the same strategy, P4 is used to update P3, and P3 is used to update P2. The whole process updates from the top of the network downward, so it is called the top-down pathway.
[0199]FPN updates the features by element-wise addition, and this addition is called the lateral connection. Because element-wise addition is used, P2, P3, P4, and P5 must have the same number of feature maps, which is why FPN uses 1 × 1 convolutions for dimension reduction.
[0200]After the feature maps are updated, FPN applies a 3 × 3 convolution to each of P2, P3, P4, and P5 to mitigate the aliasing effect introduced by upsampling.
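The top-down update can be sketched in pure Python on single-channel maps. This is a simplified illustration under stated assumptions: each input map is taken to have already passed through its 1 × 1 convolution, the final 3 × 3 smoothing convolution is omitted, and nearest-neighbour upsampling is swapped in for the bilinear interpolation described above to keep the code short. The function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D map stored as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def add_maps(a, b):
    """Element-wise addition of two maps of identical size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature maps must have the same size")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down(laterals):
    """Merge laterals ordered finest first; returns [P2, P3, P4, P5].

    Each coarser output is upsampled and added to the next finer lateral map.
    """
    pyramid = [laterals[-1]]  # P5 is the coarsest lateral map, unchanged
    for lateral in reversed(laterals[:-1]):
        pyramid.insert(0, add_maps(lateral, upsample2x(pyramid[0])))
    return pyramid


def ones(n):
    return [[1.0] * n for _ in range(n)]


p2, p3, p4, p5 = top_down([ones(8), ones(4), ones(2), ones(1)])
```

In this toy input every lateral map is all ones, so each level accumulates one more contribution than the level above it, which makes the flow of coarse information into the finer maps easy to see.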
[0201]3. Table of experimental results after adding the ResNet-18 network
[0202]The evaluation criterion is AR (Average Recall). The superscript 100 or 1k on AR indicates 100 or 1000 anchors per image, and the subscripts s, m, and l indicate small, medium, and large objects in the COCO dataset. Braces {} in the Feature column indicate that each layer predicts independently.
[0203]Comparing (a), (b), and (c) shows that the benefit of FPN is clear. In addition, comparing (a) and (b) shows that high-level features are no more effective than low-level ones. (d) has only lateral connections and no top-down process: the result of each bottom-up layer is passed through a 1 × 1 lateral connection and a 3 × 3 convolution to obtain the final result. (e) has a top-down process but no lateral connections, so the top-down features are not fused with the original features. This works poorly because the target's location features become less accurate after repeated downsampling and upsampling. (f) uses only the finest level for prediction, that is, the features obtained after repeated upsampling and fusion up to the last step, mainly to demonstrate the expressive ability of independent prediction at each pyramid level. Clearly the finest level alone is not as good as FPN, because the RPN is a sliding-window detector with a fixed window size, so sliding over different pyramid levels increases robustness to scale. In addition, (f) has more anchors, which shows that increasing the number of anchors does not effectively improve accuracy.
[0204]In addition, the information gain ratio from the C4.5 algorithm is used to determine which discrimination method has the higher accuracy. The idea is as follows: several different methods for discriminating abnormal behavior are each selected as the main classification feature, and the classification results are then evaluated.
[0205]The discrimination of student abnormal behavior is shown in Figure 21:
[0206]Take head lowering as an example: let Yes denote head lowered and No denote head not lowered, considering the proportion of normal-state students and the proportion of abnormal-state students:
[0207]
[0208]
[0209]Conditional entropy is defined as:
[0210]The information gain is defined as: $g(D, A) = H(D) - H(D \mid A)$ (27)
[0211]The intrinsic information is defined as:
[0212]The information gain ratio is defined as:
[0213]
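For reference, the standard C4.5 quantities can be written as below. The notation is an assumption and may differ from the original formulas in symbols and numbering: $D$ is the set of student samples, $C_k$ are its classes (normal and abnormal), and $D_i$ is the subset of $D$ on which the feature $A$ takes the value $a_i$.

```latex
\begin{aligned}
H(D) &= -\sum_{k} \frac{|C_k|}{|D|} \log \frac{|C_k|}{|D|} \\
H(D \mid A) &= \sum_{i} \frac{|D_i|}{|D|} \, H(D_i) \\
g(D, A) &= H(D) - H(D \mid A) \\
H_A(D) &= -\sum_{i} \frac{|D_i|}{|D|} \log \frac{|D_i|}{|D|} \\
g_R(D, A) &= \frac{g(D, A)}{H_A(D)}
\end{aligned}
```

Here $H_A(D)$ is the intrinsic information of the feature $A$ and $g_R(D, A)$ is the information gain ratio.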
[0214]Intrinsic information is:
[0215]Let $E_i = H(Y \mid A = a_i)$, where $i = 0, 1$, and the two values $a_0$ and $a_1$ of the judgment feature $A$ correspond to head not lowered and head lowered, respectively.
[0216]
[0217]
[0218]$H(D \mid A) = 0.182$
[0219]The information gain is:
[0220]$g(D, A) = H(D) - H(D \mid A) = 0.301 - 0.182 = 0.119$
[0221]The information gain ratio is:
[0222]In the same way, the information gain ratios of the head-turning judgment, the expression-time judgment, and the mask judgment can be obtained. The comparison shows that judging whether a student's behavior is abnormal by whether a mask is worn has the highest accuracy. Mask detection is therefore given priority among the judgments, and the judgment with the lowest accuracy is used less.
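The gain-ratio comparison can be reproduced with a short Python sketch. The contingency-table layout and function names are hypothetical, and the use of a base-10 logarithm is an inference: the value 0.301 quoted for $H(D)$ matches $\log_{10} 2$, which would be consistent with equal proportions of normal and abnormal students, but the source does not state the base.

```python
from math import log10


def entropy(counts):
    """Entropy (base-10 logarithm) of a list of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * log10(c / total) for c in counts if c > 0)


def gain_and_ratio(table):
    """Return (information gain, gain ratio) for one judgment feature.

    table[i] = [normal_count, abnormal_count] for feature value a_i,
    for example i = 0 for head not lowered and i = 1 for head lowered.
    """
    sizes = [sum(row) for row in table]
    total = sum(sizes)
    class_totals = [sum(col) for col in zip(*table)]
    h_d = entropy(class_totals)
    h_d_given_a = sum(n / total * entropy(row)
                      for row, n in zip(table, sizes) if n > 0)
    gain = h_d - h_d_given_a
    intrinsic = entropy(sizes)
    return gain, (gain / intrinsic if intrinsic > 0 else 0.0)


# Hypothetical counts, not taken from the source.
gain, ratio = gain_and_ratio([[40, 10], [10, 40]])
```

Running the same function on one table per judgment feature and ranking the returned ratios gives the priority ordering described above.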
[0223]The method of the present invention uses the improved OpenPose model and the multi-person detection model. It can detect students' abnormal behavior, feed it back to the teacher promptly, and remind students to engage in the class. In addition, it can remind students to get up and move.
[0224]The present invention aims to make up for the deficiencies of the prior art by providing an improved OpenPose-based method for detecting classroom abnormal behavior and mask wearing. The method uses the improved OpenPose to obtain the positions and positional relationships of the joint points in each student's posture, together with the SSD mask detection algorithm integrating FPN, to detect students with abnormal classroom behavior, remind them to participate in the class, and feed the results back to the teacher for after-class teaching reform and summary.
[0225]While embodiments of the invention have been shown and described, it will be understood that various changes, modifications, replacements, and variations may be made to these embodiments without departing from the principles and spirit of the present invention. The scope of the invention is defined by the appended claims and their equivalents.
