Human body behavior monitoring method and system and storage medium

A human body behavior monitoring technology, applicable to biometrics, voice analysis, instrumentation, and related fields. It addresses the drawbacks of wearable sensors for danger detection, such as discomfort and inconvenient use, by detecting dangerous behavior without requiring the monitored person to wear a device.

Pending Publication Date: 2022-07-01
江西中业智能科技有限公司
0 Cites 0 Cited by

AI-Extracted Technical Summary

Problems solved by technology

[0005] Based on this, the purpose of the present invention is to provide a human behavior monitoring method, system and storag...

Abstract

The invention provides a human body behavior monitoring method, system, and storage medium. The method acquires the center position data of the two hips and the limb data of a human body; judges whether the standard deviation of the limb-data displacement across consecutive video frames is greater than a preset standard deviation, and if so, determines that the body is convulsing; and, when the height between the hip center position data and the ground position data is lower than a threshold, judges whether the difference between the historical height data from a preceding preset time period and the current height data exceeds a preset height difference, issuing an alarm if it does. Whether the human body has fallen is thus judged from whether the hip-to-ground height is below the threshold and whether the height drop relative to an earlier video frame exceeds the preset height difference; whether the human body is convulsing is judged from whether the standard deviation of the limb-data displacement across consecutive video frames exceeds the preset standard deviation. This improves both the accuracy and the convenience of monitoring.


Examples

  • Experimental program(3)

Example Embodiment

[0047] Embodiment 1
[0048] Referring to Figure 1, which shows the human behavior monitoring method in the first embodiment of the present invention, the method includes steps S11-S16.
[0049] S11. Acquire a target video at a first preset frequency, where the target video includes several video frames, and perform semantic segmentation on the video frames to obtain ground position data and human body position data.
[0050] The human behavior monitoring method in this embodiment can be applied to monitoring patients in a hospital and monitoring the elderly at home. A camera captures real-time video of the monitored person at the first preset frequency, and the method judges whether the person's real-time behavioral features match falling, convulsing, or other alarm-worthy behavior, so that an alarm can be raised and the person's safety improved. First, the target video of the monitored person is acquired at the first preset frequency; the target video contains the monitored person and the surrounding environment. The first preset frequency may be, for example, once every 1-2 minutes; this video-processing frequency can be adjusted, and behavioral features are then detected at the adjusted rate. The target video is divided into several video frames, and the frames are semantically segmented with Mask2Former. Semantic segmentation assigns a class label to every pixel in the image, with each class rendered in its own color. In this embodiment, the human body region and the ground region in each video frame are marked in different colors through semantic segmentation, yielding the human body position data and the ground position data.
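As an illustrative sketch only (not the patent's implementation), the per-pixel label map produced by a segmentation model such as Mask2Former can be split into ground and human-body position data; `segment_frame` and the class IDs below are hypothetical stand-ins:

```python
import numpy as np

# Hypothetical class IDs; actual IDs depend on the model's label set.
PERSON_CLASS = 1
GROUND_CLASS = 2

def segment_frame(frame: np.ndarray) -> np.ndarray:
    """Stand-in for a semantic segmentation model (e.g. Mask2Former).
    Returns an (H, W) array of per-pixel class IDs."""
    raise NotImplementedError("plug in a real segmentation model here")

def extract_position_data(frame: np.ndarray):
    """Split a label map into human-body and ground pixel coordinates."""
    labels = segment_frame(frame)
    body_pixels = np.argwhere(labels == PERSON_CLASS)    # (row, col) pairs
    ground_pixels = np.argwhere(labels == GROUND_CLASS)
    return body_pixels, ground_pixels
```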
[0051] S12. Acquire feature data from the human body position data, where the feature data includes the center position data of the human body's two hips and the limb data.
[0052] Next, the feature data of the human body is obtained from the human body position data; the feature data includes the center position data of the two hips and the limb data. The center position of the two hips can be taken as an approximation of the body's center of gravity and is used to judge whether the human body has fallen.
[0053] The limb data consists of the key points of the human body's limbs and is used to judge whether the body is twitching. OpenPose (a body-pose recognition system) is used to detect the limb key points in each video frame. Suppose the hand key point of the left arm is g[0], the left elbow key point is g[1], and the left shoulder key point is g[2]; the coordinates of the detected limb key points are [gx0, gy0], [gx1, gy1], [gx2, gy2].
[0054] Further, the ground position data covers a large ground area. To reduce the target detection range when judging the monitored person's behavior, the human body position data is taken as the center, and the target ground position data is determined as the portion of the ground position data within a preset distance of the human body position data. Optionally, the preset distance can be set so that the target area is the ground within 2 m of the monitored person.
[0055] S13. Judge whether the standard deviation of the limb-data displacement across consecutive video frames is greater than the preset standard deviation;
[0056] If the standard deviation of the displacement of the limb data in the consecutive multiple video frames is greater than the preset standard deviation, it is determined that the human body is in a state of convulsion, and step S15 is executed.
[0057] After the limb data is acquired, multiple consecutive video frames of the target video are obtained. Let A[i] be the limb key point detected in each video frame (for example, a limb endpoint), where 0 ≤ i ≤ 3 and the point's position is [xi, yi]. The standard deviation of the limb-data displacement is calculated as follows:
[0058] di = sqrt( (1/n) * Σ_k=1..n [ (xi(k) - x̄i)² + (yi(k) - ȳi)² ] )
[0059] where n is the number of consecutive video frames, (xi(k), yi(k)) is the position of key point i in the k-th frame, (x̄i, ȳi) is its mean position over the n frames, and di is the standard deviation of the displacement of the key point labelled i across those frames. By comparing di with a user-defined threshold, it can be judged whether the key point moves back and forth over the consecutive frames, and hence whether the limb at position i is twitching, i.e., whether the human body is in a convulsive state.
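A minimal numpy sketch of this judgment, assuming the key-point positions have already been extracted (e.g. with OpenPose) into an array of shape (n_frames, n_keypoints, 2); the threshold value is illustrative:

```python
import numpy as np

def twitch_flags(positions: np.ndarray, preset_std: float = 5.0) -> np.ndarray:
    """positions: (n_frames, n_keypoints, 2) pixel coordinates of the limb
    key points over consecutive frames. Returns one boolean per key point.

    di is the standard deviation of each key point's displacement from its
    mean position over the window; a large di suggests reciprocating motion.
    """
    mean_pos = positions.mean(axis=0)                    # (n_keypoints, 2)
    sq_dev = ((positions - mean_pos) ** 2).sum(axis=2)   # (n_frames, n_keypoints)
    d = np.sqrt(sq_dev.mean(axis=0))                     # (n_keypoints,)
    return d > preset_std                                # True where twitching
```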
[0060] Optionally, to improve the accuracy of the convulsion judgment, when the standard deviation of the limb-data displacement across consecutive video frames is found to be greater than the preset standard deviation, facial expression recognition can additionally be performed.
[0061] Specifically, a facial expression model is pre-trained on datasets of expressions associated with dangerous states such as convulsion, fright, and pain. The facial data of the monitored person is obtained from the video frame and fed into the pre-trained model to judge whether the expression matches one of the preset facial expressions. If it does, the combination of a pained expression and the continuous reciprocating movement of the limb key points supports the judgment that the human body is in a convulsive state.
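If a pre-trained expression classifier is available, the extra check might look like this; the model interface and label set are hypothetical assumptions:

```python
import numpy as np

DANGER_LABELS = {"convulsion", "fright", "pain"}  # assumed label set

def expression_is_dangerous(face_crop: np.ndarray, model) -> bool:
    """model: any classifier exposing predict_label(image) -> str
    (hypothetical interface; substitute the actual pre-trained model)."""
    return model.predict_label(face_crop) in DANGER_LABELS

# Combined judgment: reciprocating key points AND a dangerous expression.
# convulsing = twitch_flags(positions).all() and expression_is_dangerous(face, model)
```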
[0062] S14. Judge whether the current height data between the center position data of the human body's two hips and the ground position data is lower than the preset height;
[0063] If the current height data is lower than the preset height, step S16 is executed.
[0064] The center position data of the two hips in the video frame is recorded, and height detection is performed against the target ground position data, which can be approximated as a horizontal plane. The current height data between the hip center position data and the target ground position data, i.e., the distance from the hip center to the ground, is then calculated. When the current height data is lower than the preset height, the hip center is close to the ground, so the human body may be in a fallen state, and further judgment of the fall is required.
[0065] Further, the foot position data (the end point of the foot) is obtained from the human body position data. The foot data and the hip center position data are connected into a straight line, and the slope of this line relative to the ground position data is computed from the two points. When the slope of the line is lower than a preset slope, the human body may be judged to be in a fallen state.
[0066] In the same way, the head position data (the center of the head) is obtained from the human body position data; the head key point and the hip center position data are connected into a straight line and its slope is calculated. When the slope of the line is lower than the preset slope, the human body may be judged to be in a fallen state.
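The slope test can be sketched as follows in image coordinates; the threshold and the assumption that the ground is roughly horizontal in the image are illustrative:

```python
import numpy as np

PRESET_SLOPE = 0.5  # illustrative threshold

def body_line_slope(p1: np.ndarray, p2: np.ndarray) -> float:
    """Absolute slope of the line through two body key points, e.g. the hip
    center and a foot endpoint, in image coordinates (x right, y down)."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    if dx == 0:                       # body perfectly upright
        return float("inf")
    return abs(dy / dx)

# A near-horizontal hip-to-foot (or hip-to-head) line suggests a fall:
# fall_candidate = body_line_slope(hip_center, foot) < PRESET_SLOPE
```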
[0067] S16. Determine whether the height difference between the historical height data in the historical video frames and the current height data is greater than the preset height difference;
[0068] If the height difference is greater than the preset height difference, step S15 is executed.
[0069] When the hip center position data is determined to be close to the ground position data, historical data is extracted from historical video frames: the video frames from an adjacent preceding preset time period, for example the frames from 1-2 s earlier, are obtained, and the historical height data between the hip center position data and the target ground position data in those frames is extracted. The height difference between the historical height data and the current height data is then calculated. When the height difference is greater than the preset height difference, the hip center has dropped sharply within a short time, so it can be determined that the human body is in a fallen state.
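Combining S14 and S16, a hedged sketch of the fall confirmation; the threshold values and the pixel-to-metre calibration are assumptions:

```python
def fall_confirmed(height_history: list[float], current_height: float,
                   preset_height: float = 0.4,
                   preset_height_diff: float = 0.5) -> bool:
    """height_history: hip-to-ground heights from frames 1-2 s earlier.
    Units follow whatever calibration maps pixels to metres."""
    if current_height >= preset_height:   # hip center not near the ground
        return False
    previous = max(height_history)        # height before the drop
    return (previous - current_height) > preset_height_diff
```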
[0070] S15. Send an alarm.
[0071] When the human body is determined to be in a fallen state, a fall alarm is issued. Specifically, the alarm method can be configured to the monitored person's needs, such as placing an emergency call, contacting an emergency contact, or sounding an alarm, so that help reaches the monitored person promptly.
[0072] If the calculated standard deviations of the displacements of multiple limb key points all exceed the preset standard deviation across consecutive video frames, it is determined that those limb key points are twitching, that is, that the human body is convulsing. When the human body is determined to be in a convulsive state, an alarm is issued; the alarm itself is similar to the fall alarm.
[0073] In some other optional embodiments, the method for triggering an automatic alarm further includes:
[0074] Hand key point data is obtained from the human body position data and compared with a preset help gesture, which can be user-defined, to determine whether it matches. When the monitored person wants to send a distress signal, they can make the preset help gesture toward the camera. When the hand key point data is determined to match the preset help gesture, an alarm is issued, such as placing an emergency call or contacting an emergency contact.
[0075] In some other optional embodiments, the method for triggering an automatic alarm further includes:
[0076] A target video containing multiple video frames is acquired; a face detection algorithm extracts a face picture from each video frame, and facial expression recognition is performed on it. The recognition covers convulsive, frightened, and pained expressions on the monitored person's face. A facial expression classification model is trained on datasets of twitching, frightened, and pained expressions. Video frames are read from the camera, the face picture is obtained and fed into the classification model, and the recognition result is produced. If a twitching, frightened, or pained expression is recognized within a preset time, an alarm sounds.
[0077] In summary, the human behavior monitoring method in the above embodiment of the present invention captures video frames of the monitored person through a camera and judges falls and convulsions by analysing the behavioral features in those frames, thereby avoiding the discomfort and inconvenience of traditional wearable sensors. Specifically, the video frames are semantically segmented to obtain the ground position data and the human body position data, from which the hip center position data and the limb data are extracted. When the current height between the hip center and the ground falls below the preset height, the historical video frames from the adjacent preceding period are retrieved, the historical height in those frames is obtained, and the method further judges whether the difference between the historical height and the current height exceeds the preset height difference, thereby determining that the monitored person is in a fallen state. Whether the human body is twitching is determined by checking whether the standard deviation of the limb-data displacement across consecutive video frames exceeds the preset standard deviation. This solves the inconvenience of wearing sensors to detect dangerous behavior noted in the background art.

Example Embodiment

[0078] Embodiment 2
[0079] Referring to Figure 2, which shows the human behavior monitoring method in the second embodiment of the present invention: the method in this embodiment builds on the first embodiment by detecting cough sounds in the video's audio and then adjusting the preset frequency at which video is acquired in the first embodiment, thereby adjusting how often the human behavioral features in the video are detected. The method in this embodiment includes steps S21-S24.
[0080] S21. Acquire voice data in the target video, and detect cough voice data in the voice data.
[0081] A voice module built into the camera can be used to capture the monitored person's audio. The target video obtained through the camera includes the monitored person's voice data, and an algorithm detects whether the voice data contains coughing; for example, a Hidden Markov Model (HMM) can quickly and accurately identify cough segments in the audio. When the monitored person's cough voice data is recognized, it is extracted.
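The patent names only the HMM; everything else below (MFCC features via librosa, a two-model log-likelihood comparison with hmmlearn) is an assumed, minimal way to realise such a detector:

```python
import numpy as np
import librosa                 # MFCC feature extraction
from hmmlearn import hmm      # Gaussian HMMs

def mfcc_features(audio: np.ndarray, sr: int) -> np.ndarray:
    """Frame-level MFCCs, shape (n_frames, 13)."""
    return librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13).T

def train_hmm(feature_seqs: list) -> hmm.GaussianHMM:
    """Fit one HMM on a list of (n_frames, 13) feature sequences
    (one model for cough audio, another for everything else)."""
    X = np.vstack(feature_seqs)
    lengths = [len(s) for s in feature_seqs]
    model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

def is_cough(segment: np.ndarray, sr: int, cough_hmm, other_hmm) -> bool:
    """Classify a segment by comparing log-likelihoods under the two HMMs."""
    feats = mfcc_features(segment, sr)
    return cough_hmm.score(feats) > other_hmm.score(feats)
```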
[0082] S22. Preprocess the cough speech data, and extract frequency information and intensity information in the cough speech data.
[0083] The extracted cough voice data is preprocessed to obtain valid cough voice data; the preprocessing amplifies the informative components of the signal so that the best feature parameters can be extracted.
[0084] Preprocessing includes pre-emphasis, windowing and framing, and endpoint detection. Pre-emphasis removes the influence of lip radiation on the audio and increases the high-frequency resolution of the speech; a first-order FIR high-pass filter is commonly used to implement it. The pre-emphasis transfer function is:
[0085] H(z) = 1 - a·z⁻¹
[0086] where H(z) is the transfer function of the filter, z is the z-transform variable, and a is the pre-emphasis coefficient, typically 0.9 < a < 1.0.
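In the time domain the filter H(z) = 1 - a·z⁻¹ amounts to y[t] = x[t] - a·x[t-1]; a one-line sketch, with a = 0.97 as a typical (illustrative) coefficient:

```python
import numpy as np

def pre_emphasis(x: np.ndarray, a: float = 0.97) -> np.ndarray:
    """y[t] = x[t] - a * x[t-1], the time-domain form of H(z) = 1 - a*z^-1."""
    return np.append(x[0], x[1:] - a * x[:-1])
```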
[0087] Windowing and framing defines a window size and slides the window over the long audio signal, dividing it into successive short frames of data.
[0088] Voice endpoint detection detects the valid speech segment within a continuous audio stream. It covers two aspects: detecting the starting point of the valid speech (the front endpoint) and detecting its end point (the back endpoint). Endpoint detection identifies the start and end of the cough voice data, yielding a complete cough segment.
[0089] Feature extraction includes extracting cough frequency and audio intensity in speech data.
[0090] The audio intensity of the cough, denoted d, represents the average value of the audio peaks and troughs within a time window t, with the following formula:
[0091] d = (max(x) + min(x)) / 2
[0092] where x is the value of the audio waveform, max is the maximum value, and min is the minimum value.
[0093] Within time t, max(x) and min(x) define a median horizontal line at level (max(x) + min(x)) / 2. Assuming the audio waveform crosses this line m times, the cough frequency is approximately defined as m / (3t).
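A direct transcription of the two formulas; counting the waveform's crossings of the median line as sign changes is an implementation assumption:

```python
import numpy as np

def cough_intensity_and_frequency(x: np.ndarray, t: float):
    """x: audio samples spanning t seconds. Returns (d, f): the peak/trough
    average d = (max(x) + min(x)) / 2 and the approximate cough frequency
    m / (3t), where m counts crossings of the median line at level d."""
    d = (x.max() + x.min()) / 2.0
    above = x > d
    m = int(np.count_nonzero(above[1:] != above[:-1]))  # level crossings
    return d, m / (3.0 * t)
```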
[0094] S23. Determine whether the frequency information and the intensity information reach a preset threshold.
[0095] Whether the frequency information and the intensity information of the cough voice data reach a preset threshold is determined. Understandably, the frequency and intensity can each be compared with a preset frequency and a preset intensity: in one scheme, the threshold is considered reached when either value exceeds its preset; in another, both values must exceed their presets simultaneously for the threshold to be considered reached.
[0096] S24. If the frequency information and the intensity information reach the preset threshold, increase the first preset frequency to the second preset frequency, so as to acquire the target video according to the second preset frequency.
[0097] After each segment of valid cough voice data is judged, the judgment results for all cough segments are integrated through dynamic time warping, duplicate or incomplete detections are removed, and the final result is output to determine whether the threshold is reached.
[0098] Once the threshold is reached, the first preset frequency is increased to the second preset frequency, which is higher, for example once every 30 s to 1 min, speeding up the analysis and processing of the video. By using whether the monitored person's cough reaches the threshold to anticipate possible danger, and adjusting the video analysis frequency accordingly, the timeliness of monitoring is improved and falls or convulsions can be caught earlier.
[0099] In summary, the human behavior monitoring method in the above embodiment of the present invention acquires the cough voice data in the target video, extracts its frequency and intensity information, determines whether that information reaches the preset threshold, and if so raises the first preset frequency to the second preset frequency. This improves the timeliness of analysing the monitored person's behavior in the video, and thereby the monitoring efficiency.

Example Embodiment

[0100] Embodiment 3
[0101] Another aspect of the present invention provides a human behavior monitoring system; referring to Figure 3, which shows the block diagram of the human behavior monitoring system in this embodiment, the system includes:
[0102] a semantic segmentation module, configured to acquire a target video at a first preset frequency, where the target video includes several video frames, and perform semantic segmentation on the video frames to obtain ground position data and human body position data;
[0103] a feature data extraction module, configured to obtain the feature data of the human body position data, the feature data including the center position data of the human body's two hips and the limb data;
[0104] a twitching behavior judgment module, configured to obtain consecutive video frames of the target video according to the limb data and determine whether the standard deviation of the limb-data displacement across those frames is greater than a preset standard deviation;
[0105] a falling behavior judgment module, configured to obtain the current height data between the center position data of the human body's two hips and the ground position data and judge whether the current height data is lower than the preset height; and if so, to obtain the historical video frames within the preceding preset time period, calculate the height difference between the historical height data corresponding to those frames and the current height data, and judge whether the height difference is greater than the preset height difference;
[0106] an alarm module, configured to determine that the human body is in a convulsive state if the standard deviation of the limb-data displacement across the consecutive video frames is greater than the preset standard deviation, to determine that the human body is in a fallen state if the height difference is greater than the preset height difference, and to issue an alarm.
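As an illustrative composition only (module names mirror the description above; every callable is a hypothetical stand-in):

```python
class BehaviorMonitor:
    """Minimal wiring of the modules described above."""

    def __init__(self, segment, extract_features, judge_twitch, judge_fall, alarm):
        self.segment = segment                    # semantic segmentation module
        self.extract_features = extract_features  # feature data extraction module
        self.judge_twitch = judge_twitch          # twitching behavior judgment module
        self.judge_fall = judge_fall              # falling behavior judgment module
        self.alarm = alarm                        # alarm module

    def process(self, frames):
        ground, body = self.segment(frames)
        hip_center, limbs = self.extract_features(body)
        if self.judge_twitch(limbs) or self.judge_fall(hip_center, ground):
            self.alarm()
```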
[0107] Further, in some other optional embodiments, the semantic segmentation module includes:
[0108] a target video acquisition unit, configured to acquire a target video including several video frames;
[0109] a semantic segmentation unit, configured to mark the ground area and the human body area in the video frame in different colors, aggregate the position data corresponding to the ground-area color into the ground position data, and aggregate the position data corresponding to the human-body-area color into the human body position data.
[0110] Further, in some other optional embodiments, the falling behavior judgment module includes:
[0111] a target position data determination unit, configured to:
[0112] determine, according to the human body position data and the ground position data, the target ground position data within a preset distance associated with the human body position data, and determine the current height data between the center position data of the human body's two hips and the target ground position data.
[0113] Further, in some other optional embodiments, the twitch behavior judgment module includes:
[0114] a facial expression recognition unit, configured to obtain the facial data in the video frame, compare the facial data against a pre-trained facial expression model, and judge whether the facial data matches a preset facial expression, the preset facial expressions including convulsive, frightened, and pained expressions;
[0115] if the facial data matches the preset facial expression, the human body is determined to be in a convulsive state and an alarm is issued.
[0116] Further, in some other optional embodiments, the falling behavior judgment module includes:
[0117] a current height data acquisition unit, configured to determine, according to the human body position data and the ground position data, the target position data within a preset distance associated with the human body position data, and to determine the current height data between the center position data of the human body's two hips and the target ground position data.
[0118] Further, in some other optional embodiments, the twitch behavior judgment module includes:
[0119] a displacement standard deviation judgment unit, configured to obtain the limb key point position data of the consecutive video frames, calculate the standard deviation of the limb key point displacement across those frames from the position data, and judge whether that standard deviation is greater than the preset standard deviation.
[0120] Further, in some other optional embodiments, the apparatus further includes:
[0121] a help gesture judgment module, configured to obtain the hand key point data in the human body position data and judge whether the hand key point data matches the preset help gesture;
[0122] if so, the help-seeking behavior feature is matched and an alarm is issued.
[0123] Further, in some other optional embodiments, the apparatus further includes:
[0124] a preset frequency adjustment module, configured to obtain the voice data in the target video, detect the cough voice data in the voice data, and extract the frequency information and intensity information from the cough voice data;
[0125] determine whether the frequency information and intensity information reach a preset threshold; and
[0126] if the frequency information and the intensity information reach the preset threshold, increase the first preset frequency to the second preset frequency, so as to acquire the target video at the second preset frequency.
[0127] Further, in some other optional embodiments, the preset frequency adjustment module includes:
[0128] a speech preprocessing unit, configured to pre-emphasize the cough voice data to remove the influence of lip radiation;
[0129] perform windowing and framing on the cough voice data to decompose it into successive frames of data; and
[0130] perform endpoint detection on the cough voice data to extract its valid segment.
[0131] The functions or operation steps implemented by the foregoing modules and units when executed are substantially the same as those in the foregoing method embodiments, which will not be repeated here.
[0132] In summary, the human behavior monitoring system in the above embodiments of the present invention captures video frames of the monitored person through a camera and judges falls and convulsions by analysing the behavioral features in those frames, thereby avoiding the discomfort and inconvenience of traditional wearable sensors. Specifically, the video frames are semantically segmented to obtain the ground position data and the human body position data, from which the hip center position data and the limb data are extracted. When the current height between the hip center and the ground falls below the preset height, the historical video frames from the adjacent preceding period are retrieved, the historical height in those frames is obtained, and the system further judges whether the difference between the historical height and the current height exceeds the preset height difference, thereby determining that the monitored person is in a fallen state. Whether the human body is twitching is determined by checking whether the standard deviation of the limb-data displacement across consecutive video frames exceeds the preset standard deviation. This solves the inconvenience of wearing sensors to detect dangerous behavior noted in the background art.
[0133] An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the human behavior monitoring method in the foregoing embodiments.


