[0047] Example 1
[0048] See FIG. 1, which shows the human behavior monitoring method in the first embodiment of the present invention; the method includes steps S11-S16.
[0049] S11. Acquire a target video at a first preset frequency, where the target video includes several video frames, and perform semantic segmentation on the video frames to obtain ground position data and human body position data.
[0050] The human behavior monitoring method in this embodiment can be applied to monitoring patients in a hospital and monitoring the elderly at home. A camera captures real-time video of the monitored person's behavior at a first preset frequency, and the method judges whether the monitored person's real-time behavioral features match falling, convulsing, or a call-for-help behavior, so that an alarm can be raised and the monitored person's safety improved. First, the target video of the monitored person is acquired at the first preset frequency; the target video includes the monitored person and the surrounding environment. The first preset frequency may be once every 1-2 minutes, and the frequency of video processing can be adjusted, with the behavioral features of the person in the video detected at the adjusted frequency. The target video is divided into several video frames, and the video frames are semantically segmented by Mask2Former. Semantic segmentation labels each pixel in the image with a class label, and each class in the image is rendered in a distinct color. In this embodiment, the human body area and the ground area in each video frame are marked with different colors through semantic segmentation, thereby obtaining the human body position data and the ground position data.
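By way of illustration only, the segmentation step could be sketched as follows using a publicly released Mask2Former checkpoint; the checkpoint name and the ADE20K class indices assumed for "floor" and "person" are choices of this sketch, not part of the claimed method.

```python
# Sketch: per-frame semantic segmentation with Mask2Former (Hugging Face).
# The checkpoint name and the ADE20K class ids ("floor" = 3, "person" = 12)
# are assumptions for illustration.
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

CKPT = "facebook/mask2former-swin-small-ade-semantic"  # assumed checkpoint
processor = AutoImageProcessor.from_pretrained(CKPT)
model = Mask2FormerForUniversalSegmentation.from_pretrained(CKPT).eval()

def segment_frame(frame: Image.Image):
    """Return boolean masks for the ground area and the human body area."""
    inputs = processor(images=frame, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # One class label per pixel, resized back to the frame resolution.
    seg = processor.post_process_semantic_segmentation(
        outputs, target_sizes=[frame.size[::-1]])[0]
    ground_mask = (seg == 3).numpy()   # ADE20K "floor" (assumed id)
    person_mask = (seg == 12).numpy()  # ADE20K "person" (assumed id)
    return ground_mask, person_mask
```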
[0051] S12. Acquire feature data from the human body position data, where the feature data includes the center position data of the two hips of the human body and limb data.
[0052] Then the feature data of the human body is obtained from the human body position data; the feature data includes the center position data of the two hips and the limb data. The center position between the two hips can be taken as an approximation of the human body's center of gravity, and is used to judge whether the human body has fallen.
[0053] The limb data are the key points of the human body's limbs, used to judge whether the human body is convulsing. OpenPose (a body-language recognition system) is used to detect the limb key points in each video frame picture. Suppose the hand key point of the left arm is g[0], the left elbow key point is g[1], and the left shoulder key point is g[2]; the detected coordinates of these limb key points are [gx0,gy0], [gx1,gy1], [gx2,gy2].
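As a minimal sketch of how the hip-center and limb key points could be represented, assuming a 2D pose estimator that returns one [x, y] pair per body key point; the COCO-style hip indices 11/12 and the sample coordinates are assumptions for illustration (the embodiment itself uses OpenPose output):

```python
# Sketch: deriving the hip-center (approximate center of gravity) and
# limb key points from a 2D pose estimate. Keypoint layout (COCO order,
# left/right hip at indices 11/12) is assumed for illustration.
import numpy as np

def hip_center(keypoints: np.ndarray) -> np.ndarray:
    """keypoints: (K, 2) array of [x, y] per body key point."""
    left_hip, right_hip = keypoints[11], keypoints[12]  # assumed indices
    return (left_hip + right_hip) / 2.0

# Example left-arm chain g[0]..g[2] (wrist, elbow, shoulder), as in
# paragraph [0053] above; coordinates are made up for the sketch.
g = np.array([[140.0, 310.0],   # g[0]: left hand   [gx0, gy0]
              [150.0, 255.0],   # g[1]: left elbow  [gx1, gy1]
              [160.0, 200.0]])  # g[2]: left shoulder [gx2, gy2]
```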
[0054] Further, the ground position data may cover a large ground area. When judging the behavioral features of the monitored person, in order to reduce the target detection range, the human body position data is taken as the center, and the portion of the ground position data within a preset distance of the human body position data is determined as the target ground position data. Optionally, the preset distance can be set so that the target ground area is the ground within 2 m around the monitored person.
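A sketch of restricting the ground mask to the area near the person follows; the conversion of the 2 m physical distance into a pixel radius depends on camera calibration and is assumed here as a fixed constant.

```python
# Sketch: restricting detection to the ground area near the person.
# PRESET_RADIUS_PX is an assumed pixel equivalent of roughly 2 m.
import numpy as np

PRESET_RADIUS_PX = 200

def target_ground_mask(ground_mask, person_mask, radius=PRESET_RADIUS_PX):
    ys, xs = np.nonzero(person_mask)
    center = np.array([xs.mean(), ys.mean()])   # body centroid
    h, w = ground_mask.shape
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot(xx - center[0], yy - center[1])
    return ground_mask & (dist <= radius)       # ground within the radius
```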
[0055] S13. Judge whether the standard deviation of the displacement of the limb data across multiple consecutive video frames is greater than a preset standard deviation;
[0056] If the standard deviation of the displacement of the limb data across the multiple consecutive video frames is greater than the preset standard deviation, it is determined that the human body is in a state of convulsion, and step S15 is executed.
[0057] After acquiring the limb data, multiple consecutive video frames in the target video are further acquired. Let A[i] be a limb key point detected in each video frame, such as an endpoint of a limb, where i is greater than or equal to 0 and less than or equal to 3, and its position is [xi,yi]. The standard deviation of the displacement of the limb data is calculated as follows:
[0058] $d_i = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(s_{i,k}-\bar{s}_i\right)^2}$, where $s_{i,k} = \sqrt{(x_{i,k}-x_{i,k-1})^2+(y_{i,k}-y_{i,k-1})^2}$ is the displacement of limb key point A[i] between frame k-1 and frame k, and $\bar{s}_i$ is the mean of $s_{i,1},\dots,s_{i,n}$.
[0059] Here n denotes n consecutive video frame pictures, and d_i denotes the standard deviation of the displacement of the limb key point labeled i over those n consecutive pictures. By comparing d_i with a user-defined threshold, it can be judged whether the key point moves back and forth across multiple consecutive video frames, and hence whether the limb position labeled i exhibits twitching behavior, that is, whether the human body is in a convulsive state.
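A minimal sketch of this test, implementing the formula in [0058] directly; the threshold value is an assumed example, since the embodiment leaves it user-defined:

```python
# Sketch of the convulsion test: for each limb key point i, compute the
# standard deviation d_i of its inter-frame displacement over n consecutive
# frames and compare it with a preset (user-defined) threshold.
import numpy as np

def displacement_std(track: np.ndarray) -> float:
    """track: (n+1, 2) array of [x, y] positions of one key point."""
    steps = np.diff(track, axis=0)           # per-frame motion vectors
    s = np.hypot(steps[:, 0], steps[:, 1])   # displacement magnitudes s_ik
    return float(s.std())                    # d_i (population std, 1/n)

def is_convulsing(tracks, preset_std=4.0):
    """tracks: list of (n+1, 2) arrays, one per limb key point.
    preset_std is an assumed example threshold."""
    return all(displacement_std(t) > preset_std for t in tracks)
```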
[0060] Optionally, in order to improve the accuracy of judging the convulsive state, when the standard deviation of the limb-data displacement across multiple consecutive video frames is judged to be greater than the preset standard deviation, facial expression recognition can be performed as a further check.
[0061] Specifically, a facial expression model is pre-trained using data sets of facial expressions in dangerous states such as convulsion, fright, and pain. The facial data of the monitored person is extracted from the video frame and input into the pre-trained facial expression model, and it is judged whether the facial expression matches one of the preset expressions in the model. If so, the combination of the monitored person's pained facial expression and the continuous back-and-forth movement of the limb key points confirms that the human body is in a convulsive state.
[0062] S14. Judge whether the current height between the center position data of the two hips of the human body and the ground position data is lower than a preset height;
[0063] If the current height data is lower than the preset height, step S16 is executed.
[0064] The center position data of the two hips in the video frame is recorded, and height detection is performed against the target ground position data, where the target ground position data can be approximated as a horizontal plane. The current height data between the hip-center position data and the target ground position data, that is, the distance between the hip center and the ground, is then calculated. When the current height data is lower than the preset height, so that the hip center is close to the ground, it can be determined that the human body may be in a falling state, and further judgment of the falling state is required.
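One way this height check could be sketched in a monocular setting: take the topmost ground pixel directly below the hip center as the ground level. That choice, and the pixel-valued preset height, are assumptions of the sketch, not statements of the embodiment.

```python
# Sketch: approximate current height between the hip center and the
# target ground area, treating the ground as a horizontal plane.
import numpy as np

def current_height(center, ground_mask):
    cx, cy = int(round(center[0])), int(round(center[1]))
    rows = np.nonzero(ground_mask[:, cx])[0]  # ground rows in that column
    below = rows[rows >= cy]
    if below.size == 0:
        return None                           # no ground visible below
    return float(below.min() - cy)            # pixel height above ground

def maybe_falling(center, ground_mask, preset_height=80.0):
    h = current_height(center, ground_mask)   # preset_height is assumed
    return h is not None and h < preset_height
```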
[0065] Further, the position data of the foot (the end point of the foot) is obtained from the human body position data; the foot data and the hip-center position data are connected to form a straight line, and the slope of this line relative to the horizontal ground position data is obtained from the two points. When the slope of the line is lower than a preset slope, it can be determined that the human body may be in a falling state.
[0066] In the same way, the position data of the head (the center of the head) is obtained from the human body position data; the head key point and the hip-center position data are connected into a straight line and its slope is calculated. When the slope of the line is lower than the preset slope, it can likewise be determined that the human body may be in a falling state.
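These two slope tests could be sketched together as below: a body lying on the ground yields near-horizontal foot-hip and head-hip lines, i.e. small absolute slopes. The preset slope value is an assumed example.

```python
# Sketch: slope tests of [0065]-[0066]. Small absolute slope relative to
# the horizontal image axis suggests the body is lying down.
import numpy as np

def line_slope(p, q):
    dx = q[0] - p[0]
    return np.inf if dx == 0 else abs((q[1] - p[1]) / dx)

def slope_suggests_fall(foot, head, center, preset_slope=0.5):
    # preset_slope is an assumed example threshold
    return (line_slope(foot, center) < preset_slope or
            line_slope(head, center) < preset_slope)
```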
[0067] S16. Determine whether the height difference between the historical height data in a historical video frame and the current height data is greater than a preset height difference;
[0068] If the height difference is greater than the preset height difference, step S15 is executed.
[0069] When it is determined that the hip-center position data is close to the ground position data, historical data is extracted from historical video frames: the preceding video frames within an adjacent preset time period are obtained, for example the historical video frames from 1-2 s earlier; the historical height data between the hip-center position data and the target ground position data in those historical video frames is correspondingly extracted; and the height difference between the historical height data and the current height data is calculated. When the height difference is greater than the preset height difference, it means that the hip center of the human body has dropped sharply within a short period of time, so it can be determined that the human body is in a falling state.
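A sketch of this confirmation step, keeping a rolling buffer of hip-center heights; the buffer length (approximating 2 s of frames) and the preset height difference are assumed example values.

```python
# Sketch: fall confirmation of step S16 - compare the hip-center height
# from 1-2 s ago with the current height.
from collections import deque

class HeightHistory:
    def __init__(self, maxlen=60):            # ~2 s at 30 fps (assumed)
        self.buffer = deque(maxlen=maxlen)

    def push(self, height):
        self.buffer.append(height)

    def fell(self, preset_diff=60.0):
        """True if the height dropped by more than preset_diff (pixels)."""
        if len(self.buffer) < 2:
            return False
        historical, current = self.buffer[0], self.buffer[-1]
        return (historical - current) > preset_diff
```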
[0070] S15. Send an alarm.
[0071] When it is determined that the human body is in a falling state, a fall alarm is issued. Specifically, the alarm method can be set according to the needs of the monitored person, such as making an emergency call, contacting an emergency contact, or sounding an alarm, so as to provide timely assistance to the monitored person.
[0072] If the calculated standard deviations of the displacements of the multiple limb key points are all greater than the preset standard deviation across the multiple consecutive video frames, it is determined that those limb key point positions are twitching, that is, that the human body is convulsing. When it is determined that the human body is in a convulsive state, an alarm is issued; the specific alarm is similar to the fall alarm.
[0073] In some other optional embodiments, the method for triggering an automatic alarm further includes:
[0074] The hand key point data is obtained from the human body position data and compared with a preset help gesture to determine whether it matches; the preset help gesture can be customized. When the monitored person wants to send a distress signal, he or she can make the preset help gesture toward the camera. When it is determined that the hand key point data matches the preset help gesture, an alarm is issued, such as making an emergency call or contacting an emergency contact.
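One simple way the comparison could be sketched is template matching on normalized key points; the gesture template, the normalization, and the tolerance are all assumptions of this sketch, since the embodiment leaves the help gesture user-customizable.

```python
# Sketch: matching hand key points against a preset help gesture template.
# Template coordinates and tolerance are assumed; the template and the
# input are expected to contain the same number of points.
import numpy as np

HELP_GESTURE = np.array([[0.0, 0.0], [0.2, -0.5], [0.4, -1.0]])  # assumed

def normalize(points):
    points = points - points.mean(axis=0)   # translation invariance
    scale = np.linalg.norm(points) or 1.0
    return points / scale                   # scale invariance

def matches_help_gesture(hand_keypoints, tolerance=0.3):
    diff = normalize(np.asarray(hand_keypoints, float)) - normalize(HELP_GESTURE)
    return float(np.linalg.norm(diff)) < tolerance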
[0075] In some other optional embodiments, the method for triggering an automatic alarm further includes:
[0076] A target video is obtained, the target video including multiple video frames; a face detection algorithm is used to obtain a face picture from each video frame, and facial expression recognition is performed on the obtained face picture. Facial expression recognition covers convulsive, frightened, and pained expressions on the monitored person's face. A facial expression classification model is trained using data sets of convulsive, frightened, and pained expressions. Video frames are read from the camera, the face picture is obtained and fed into the facial expression classification model, and the recognition result for the face is obtained. If a convulsive, frightened, or pained expression is recognized within a preset time, an alarm is sounded.
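A sketch of this pipeline follows. The OpenCV Haar-cascade face detector is a real, commonly used detector, while `expression_model`, its `predict` method, the class labels, and the 48x48 input size are hypothetical placeholders for the pre-trained classifier described above.

```python
# Sketch: expression-triggered alarm loop. The expression_model object,
# its predict() API, the label set, and the 48x48 crop are assumptions.
import cv2

DANGER_EXPRESSIONS = {"convulsing", "frightened", "pained"}  # assumed labels
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def danger_expression_in_frame(frame, expression_model):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 5):
        face = cv2.resize(gray[y:y + h, x:x + w], (48, 48))  # assumed size
        label = expression_model.predict(face)               # hypothetical API
        if label in DANGER_EXPRESSIONS:
            return True
    return False
```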
[0077] To sum up, the human behavior monitoring method in the above-mentioned embodiments of the present invention obtains video frames of the monitored person through a camera and judges whether a fall or convulsion has occurred by analyzing the human behavioral features in the video frames, thereby avoiding the discomfort and inconvenience of detection with traditional wearable sensors. Specifically, the ground position data and the human body position data are obtained by semantically segmenting the video frames, and the hip-center position data and limb data are then obtained from the human body position data. When the current height between the hip center and the ground is lower than the preset height, the preceding historical video frames in the adjacent time period are obtained, the historical height in the corresponding historical video frames is extracted, and it is further judged whether the height difference between the historical height and the current height is greater than the preset height difference, so as to determine that the monitored person is in a falling state. Whether the human body is convulsing is determined by calculating whether the standard deviation of the displacement of the limb data across multiple consecutive video frames is greater than the preset standard deviation. This solves the problem, noted in the background art, of the inconvenience of wearing sensors to detect dangerous behaviors.