An abnormal behavior recognition system and method based on human body skeleton key point prediction

By combining a skeletal key point prediction model with a memory module, the problem of high misjudgment rate in abnormal behavior recognition is solved, and efficient recognition and accurate judgment of various abnormal behaviors are achieved.

CN115273239B · Active · Publication Date: 2026-06-19 · GUODIAN NANJING AUTOMATION

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
GUODIAN NANJING AUTOMATION
Filing Date
2022-08-02
Publication Date
2026-06-19


Abstract

The application belongs to the technical field of computer vision and discloses an abnormal behavior recognition system and method based on human body skeleton key point prediction. The method comprises the following steps: acquiring real human body skeleton key points; inputting the real human body skeleton key points into a human body skeleton key point prediction network model constructed in advance to obtain predicted human body skeleton key points; calculating a target skeleton key point loss value according to the predicted human body skeleton key points and the real human body skeleton key points; calculating an abnormal score according to the target skeleton key point loss value and performing abnormal behavior judgment based on the abnormal score; and visualizing, drawing, and displaying the predicted human body skeleton key points and the real human body skeleton key points respectively. The application predicts human body skeleton key points of a future frame by using a skeleton key point prediction model, identifies abnormalities by comparing the differences between the predicted skeleton key points and the real skeleton key points, considers the diversity of normal behaviors, and avoids misjudgment of normal behaviors in a reconstruction stage.

Description

Technical Field

[0001] This invention relates to an abnormal behavior recognition system and method based on the prediction of key points in the human skeleton, belonging to the field of computer vision technology.

Background Technology

[0002] The goal of computer vision is to develop visual capabilities in computers and robots that are equivalent to those of humans. Intelligent surveillance technology is a crucial research direction within the field. With the rapid development of computer vision and deep learning, intelligent surveillance technology is being introduced into an increasing number of important scenarios, such as customs, airports, and public security. Intelligent surveillance technology utilizes image analysis algorithms to detect, identify, and track targets in videos. Users can define rules based on different application scenarios and monitoring objectives. Once a target in the monitored scenario violates the defined rules, it will be identified, recorded, and relevant measures will be taken.

[0003] Anomaly recognition is a typical task of intelligent surveillance technology. Abnormal behavior is defined as an unexpected change that shows a high degree of discrepancy with other observations over a period of time. The purpose of anomaly recognition is to identify actions occurring in unusual locations or at unusual times within a scene, or unusual actions occurring in normal locations or at normal times within a scene. Intelligent surveillance equipment with anomaly recognition technology can replace traditional manual security methods, enabling unattended monitoring, which effectively saves manpower and improves the efficiency of handling abnormal events.

[0004] Because both abnormal and normal behaviors are heterogeneous, diverse, and irregular, abnormal behavior recognition is not a simple binary classification task. Most deep learning-based abnormal behavior recognition methods extract features of normal behavior only during the training phase, and then use the reconstruction error of these features to determine whether abnormal behavior exists in the current frame during the recognition phase. However, such algorithms, which rely on visual features, are susceptible to interference from redundant image information and noise. Furthermore, these methods do not consider the diversity of normal behavior, leading to misclassifications of normal behavior during the reconstruction phase.

Summary of the Invention

[0005] This invention provides an abnormal behavior recognition system and method based on human skeletal key point prediction. It uses a skeletal key point prediction model to predict human skeletal key points in future frames, and identifies abnormalities by comparing the differences between predicted skeletal key points and actual skeletal key points. It takes into account the diversity of normal behavior and avoids misjudging normal behavior during the reconstruction stage.

[0006] To achieve the above objectives, the present invention is implemented using the following technical solution:

[0007] In a first aspect, the present invention provides a method for abnormal behavior recognition based on the prediction of key points in the human skeleton, comprising:

[0008] Obtain key points of a real human skeleton;

[0009] Input the real human skeleton key points into the pre-built human skeleton key point prediction network model to obtain the predicted human skeleton key points.

[0010] Calculate the target skeleton keypoint loss value based on the predicted human skeleton keypoints and the actual human skeleton keypoints;

[0011] After calculating the anomaly score based on the loss value of the target skeleton key points, abnormal behavior is determined based on the anomaly score.

[0012] The key points of the predicted human skeleton and the key points of the actual human skeleton are visualized and displayed respectively.

[0013] Furthermore, the human skeletal key point prediction network model is constructed through the following steps:

[0014] Obtain key points of the real human skeleton during normal behavior;

[0015] Establish a network model for predicting key points of the human skeleton;

[0016] The parameters of the human skeletal keypoint prediction network model are determined based on the real human skeletal keypoints in normal behavior and the loss function.

[0017] Furthermore, a human skeletal keypoint prediction network model is established, including: the human skeletal keypoint prediction network model is equipped with a low-pass filter with a kernel size of 10 to eliminate noise differences in the input keypoints, as shown below:

[0018] $g_{i,t-T+1} = \alpha f_{i,t-T+1} + (1-\alpha) f_{i,t-T}$ (1)

[0019] where $g_{i,t-T+1}$ is the coordinate of the skeletal keypoint in frame $t-T+1$ after low-pass filtering, $\alpha$ is the low-pass filter coefficient, $f_{i,t-T+1}$ is the coordinate of the skeletal keypoint in frame $t-T+1$, $i$ is the keypoint index, $t$ is the index of the current frame, $T$ is the number of predicted frames, and $f_{i,t-T}$ is the coordinate of the skeletal keypoint in frame $t-T$;

[0020] The skeletal keypoint coordinates of the low-pass filtered frames t-T+1 are input into the GRU unit layer. Each layer has T GRU units. At each time step, the GRU unit receives information from the previous GRU. The output of each GRU unit is calculated in the following way:

[0021]

[0022] where the terms of the formula denote, respectively, the output of the GRU unit in frame t, the input of the GRU unit in frame t, and the output of the GRU unit in frame t-1;

[0023] The output of the GRU unit layer, after passing through the encoding module, yields a vector $q_i$ of length $T$. The vector is then remapped into $K$ sub-vectors, where $K = T \times 2$. Next, cosine similarity and softmax are used to calculate the matching similarity between each sub-vector and the information stored in the memory module, expressed by the formula:

[0024]

[0025] where $w_{k,m,i}$ is the matching similarity weight between vector $q_{k,i}$ and vector $p_m$, $p_m$ is the $m$-th stored vector in the memory module, $q_{k,i}$ is the sub-vector input at the current time, and $m'$ is the index of each feature vector in the memory module;

[0026] The memory module in the human skeleton key point prediction network model is assigned to the current vector according to the matching similarity weights between vector $q_{k,i}$ and vector $p_m$:

[0027]

[0028] where the result of the formula is the weighted feature vector, and the weights are the matching similarity weights of the stored vectors corresponding to the current vector. Each sub-vector $q_{k,i}$ is combined with the vectors stored in the memory module in the manner described above; the weighted feature vector is then concatenated horizontally with the feature vector $q_i$ in the memory module and used as the input to the decoding module. The decoding module obtains the predicted skeletal keypoint coordinates in the feature space. The output of the decoding module is used as the input to the fully connected layer to map the features to 2D image coordinates.

[0029] The weight of each vector in the memory module is expressed by the formula:

[0030]

[0031] where $z_{k,m,i}$ is the weight of vector $q_{k,i}$ in the memory module, $p_m$ is each feature vector in the memory module, $q_{k',i}$ is a sub-vector of the current skeletal keypoint segment, and $k'$ is the index of each sub-vector.

[0032] Normalize the weights:

[0033]

[0034] where $z'_{k,m,i}$ is the normalized weight and $z_{k',m,i}$ is the weight of each sub-vector in the memory module;

[0035] where $k' \in U_m$ and $U_m$ is the index set of the vectors. The updated memory module is represented as:

[0036]

[0037] Furthermore, the parameters of the human skeleton keypoint prediction network model are determined based on the real human skeleton keypoints and loss function in normal behavior, including: given a set of normalized keypoints, centers, and sub-vectors output by the encoding module, the mean squared error loss function is:

[0038]

[0039] where $L_p$ is the mean squared error loss function; $T$ is the number of predicted frames; $f_{t,i}$ is the real human skeleton key point, which the formula compares with the predicted human skeleton key point; $c_t$ is the true center point of the current target, which the formula compares with the center point predicted for the current target; $q_{k,i}$ is each feature sub-vector mapped from the skeletal keypoints; and $p_p$ is the feature vector in the memory module that has the largest weight for vector $q_{k,i}$. The first part of the loss function represents the local displacement deviation of each predicted keypoint, the second part represents the overall displacement deviation of the predicted key points, and the third part represents the deviation of each feature sub-vector mapped from the skeletal keypoints from the feature vector that has the highest weight in the memory module.

[0040] Furthermore, the abnormal score is represented as follows:

[0041]

[0042] where $\alpha_F$ is the anomaly score, $L_p(f_t)$ is the target skeleton keypoint loss value, the $L_p(f_t)$ of each segment is obtained by formula (8), and $T$ is the number of predicted frames.

[0043] Furthermore, the predicted key points of the human skeleton and the actual key points of the human skeleton are visualized and displayed separately, including:

[0044] Obtain the coordinates of predicted human skeleton key points and actual human skeleton key points;

[0045] Based on the coordinates of the predicted human skeleton key points and the actual human skeleton key points, draw the skeleton key points on a blank white image or on the corresponding frame.

[0046] Connect the key points of the drawn skeleton to form a complete human skeletal framework.

[0047] In a second aspect, the present invention provides an abnormal behavior recognition system based on the prediction of key points in the human skeleton, comprising:

[0048] Key point acquisition module: used to acquire key points of the real human skeleton;

[0049] Key point prediction module: Used to input real human skeleton key points into a pre-built human skeleton key point prediction network model to obtain predicted human skeleton key points;

[0050] Keypoint loss value calculation module: used to calculate the target skeleton keypoint loss value based on the predicted human skeleton keypoints and the actual human skeleton keypoints;

[0051] Abnormal Behavior Detection Module: This module is used to calculate anomaly scores based on the loss values of key points in the target skeleton, and then to determine abnormal behaviors based on these scores.

[0052] Visualization and drawing module: used to visualize and draw key points of the predicted human skeleton and key points of the real human skeleton respectively.

[0053] In a third aspect, the present invention provides an abnormal behavior recognition device based on the prediction of key points of the human skeleton, including a processor and a storage medium;

[0054] The storage medium is used to store instructions;

[0055] The processor is configured to operate according to the instructions to perform the steps of the method according to any of the foregoing.

[0056] In a fourth aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of any of the methods described above.

[0057] Compared with the prior art, the beneficial effects achieved by the present invention are as follows:

[0058] Compared with existing abnormal behavior recognition technologies, this invention employs a skeletal keypoint prediction model to predict human skeletal keypoints in future frames. Anomalies are identified by comparing the predicted and actual skeletal keypoints, taking into account the diversity of normal behaviors and avoiding misjudgments of normal behaviors during the reconstruction phase. Furthermore, the method of this invention incorporates a memory module and update strategy into the human skeletal keypoint prediction model to include multiple prototypes of normal behaviors, thus covering different normal behaviors and effectively reducing misjudgments. Secondly, the prediction model is trained using skeletal keypoints of normal behaviors, thus effectively identifying various abnormal behaviors such as fighting, falling, and throwing objects. Thirdly, the method of this invention only requires a pre-trained model during the application phase, and the training is unsupervised, resulting in high real-time performance. Finally, this invention defines a joint loss function to optimize and constrain the model parameters to further improve the accuracy of anomaly recognition.

Description of the Drawings

[0059] Figure 1 is a schematic diagram of the ED-Memory network framework provided in Embodiment 1 of the present invention;

[0060] Figure 2 is a schematic diagram of the visualization prediction results provided in Embodiment 1 of the present invention.

Detailed Description

[0061] The present invention will be further described below with reference to the accompanying drawings. The following embodiments are only used to more clearly illustrate the technical solution of the present invention, and should not be used to limit the scope of protection of the present invention.

[0062] Embodiment 1:

[0063] An abnormal behavior recognition method based on human skeletal key point prediction includes:

[0064] I. Detection of Key Points in the Human Skeleton

[0065] Obtain the coordinates of human skeletal key points in consecutive frames of the monitoring video stream as the input of the network model. To ensure the real-time performance of the method, select the five frames before and after the key frames in the video stream, that is, select ten frames per second. First, use AlphaPose (a human skeletal key point detection algorithm) to identify the human pose in each frame and obtain the positions of the human skeletal key points; 17 key points are extracted for each human target. Then, to match the skeletal key points of each person across the ten frames, use PoseFlow (a pose tracking algorithm) to compare the similarity of each group of skeletal key points in adjacent frames. Skeletal key points with high similarity are identified as coming from the same target and are labeled with the corresponding person_id. A target that appears in a later frame but not in the previous frame is treated as newly entering the scene area, and its person_id is set to the current maximum value plus 1. The person_id of each target, the serial number of the corresponding frame, and the coordinates of the 17 human skeletal key points are stored in a dictionary. The following gives an example of a dictionary:

[0066]

[0067] For the training set, store each dictionary in a json file for model training. For the test set, group the dictionaries of every ten frames together as one input of the network model to verify the effectiveness of the model.
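
The example dictionary is not reproduced in this text. The following is a minimal sketch of what one record and the ten-frame grouping might look like. The key names `frame_number` and `keypoints` appear in the visualization step of this embodiment; the `person_id` key, the `[x, y]` pair layout, and the helper names `make_record` and `group_by_person` are assumptions made for illustration.

```python
import json
from collections import defaultdict

NUM_KEYPOINTS = 17   # key points per human target, as stated in the text
WINDOW = 10          # frames per group, as stated in the text


def make_record(person_id, frame_number, keypoints):
    """Build one per-target, per-frame dictionary (layout is an assumption)."""
    if len(keypoints) != NUM_KEYPOINTS:
        raise ValueError("expected 17 (x, y) key points")
    return {
        "person_id": person_id,
        "frame_number": frame_number,
        "keypoints": [[float(x), float(y)] for x, y in keypoints],
    }


def group_by_person(records, window=WINDOW):
    """Split each person's records into consecutive groups of `window` frames."""
    tracks = defaultdict(list)
    for rec in records:
        tracks[rec["person_id"]].append(rec)
    groups = []
    for recs in tracks.values():
        recs.sort(key=lambda r: r["frame_number"])
        # Keep only complete windows; a trailing partial window is dropped.
        for start in range(0, len(recs) - window + 1, window):
            groups.append(recs[start:start + window])
    return groups


# Synthetic example: one person holding the same pose for 20 frames.
pose = [(100.0 + i, 200.0 + 2 * i) for i in range(NUM_KEYPOINTS)]
records = [make_record(1, n, pose) for n in range(20)]
serialized = json.dumps(records)   # the string that would go into the json file
groups = group_by_person(json.loads(serialized))
```

With 20 frames for a single person, the sketch yields two groups of ten records each. Dropping incomplete trailing windows is a design choice of the sketch, not something the text specifies.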

[0068] II. Network Framework Design

[0069] To enable the network prediction model to predict multiple normal behaviors, a memory module is introduced into the model in this invention, and the model is named ED-Memory. The details of the network framework are shown in Figure 1. The input of this architecture is the 17 key points of each person's skeleton and the center point $c_t$ of the target, so that the local displacement of each human skeletal key point and the global displacement of the center are trained simultaneously. Next, the network framework is equipped with a low-pass filter with a kernel size of 10 to eliminate the noise difference of the input key points, expressed as:

[0070] $g_{i,t-T+1} = \alpha f_{i,t-T+1} + (1-\alpha) f_{i,t-T}$ (1)

[0071] where $g_{i,t-T+1}$ is the coordinate of the skeletal key point in frame $t-T+1$ after low-pass filtering, $\alpha$ is the low-pass filter coefficient, $f_{i,t-T+1}$ is the coordinate of the skeletal key point in frame $t-T+1$, $i$ is the key point index with $0 \le i \le 16$, $t$ is the index of the current frame, $T$ is the number of predicted frames, and $f_{i,t-T}$ is the coordinate of the skeletal key point in frame $t-T$.
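
As an illustration of formula (1), the sketch below blends each frame's raw coordinates with those of the preceding frame. The function name `low_pass_filter` and the pass-through handling of the first frame are assumptions; the text does not say how the window boundary is treated. The default coefficient of 0.8 is the value used in this embodiment.

```python
def low_pass_filter(frames, alpha=0.8):
    """Apply g_t = alpha * f_t + (1 - alpha) * f_{t-1} to every coordinate.

    `frames` is a list of frames; each frame is a list of (x, y) key points.
    The first frame has no predecessor and is passed through unchanged
    (an assumption of this sketch).
    """
    if not frames:
        return []
    filtered = [[(float(x), float(y)) for x, y in frames[0]]]
    for prev, curr in zip(frames, frames[1:]):
        filtered.append([
            (alpha * cx + (1 - alpha) * px, alpha * cy + (1 - alpha) * py)
            for (px, py), (cx, cy) in zip(prev, curr)
        ])
    return filtered


# One key point that jumps from (0, 0) to (10, 5) between two frames.
frames = [[(0.0, 0.0)], [(10.0, 5.0)]]
smoothed = low_pass_filter(frames, alpha=0.8)
```

Here the second frame is pulled back toward the first, giving approximately (8, 4) instead of (10, 5), which is the damping of frame-to-frame jitter that the filter is meant to provide.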

[0072] In this embodiment, α is set to 0.8. After low-pass filtering, the data is fed into the GRU (gated recurrent unit) layer. Each layer has T GRU units. At each time step, the GRU unit receives information from the previous GRU. The output of each GRU unit is calculated as follows:

[0073]

[0074] where the terms of the formula denote, respectively, the input of the GRU unit in frame t, the output of the GRU unit in frame t, and the output of the GRU unit in frame t-1.
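
The GRU formula itself is not reproduced in this text. For orientation only, the standard GRU update is given below. The symbols are conventional notation assumed here rather than taken from the patent: $x_t$ is the input for frame $t$, $h_t$ the output for frame $t$, $h_{t-1}$ the output for frame $t-1$, and $W$, $U$, $b$ learned parameters. The variant used by the invention may differ.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

The update gate $z_t$ controls how much of the previous output is carried forward, which is how each unit "receives information from the previous GRU".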

[0075] After passing through the encoding module, a vector $q_i$ of length $T$ is obtained. The vector is then remapped into $K$ sub-vectors, where $K = T \times 2$. Next, cosine similarity and softmax are used to calculate the matching similarity between each sub-vector and the information stored in the memory module, expressed by the formula:

[0076]

[0077] where $w_{k,m,i}$ is the matching similarity weight between vector $q_{k,i}$ and vector $p_m$, with $k = 1, \ldots, T \times 2$; $p_m$ is the $m$-th stored vector in the memory module, which records the prototype features of all normal behavioral data; $q_{k,i}$ is the sub-vector input at the current time; and $m'$ is the index of each feature vector in the memory module, with $m' = 1, \ldots, M$.

[0078] Each memory module is assigned to the current vector according to its weight:

[0079]

[0080] where the result of the formula is the weighted feature vector, and the weights are the matching similarity weights of the stored vectors corresponding to the current vector.
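
The read step described above (cosine similarity, softmax over the stored vectors, then a weighted sum) can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the softmax is taken directly over the cosine similarities without a temperature, and the final concatenation is shown at sub-vector level for brevity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query, memory):
    """Return (weights over stored vectors, weighted feature vector)."""
    weights = softmax([cosine_similarity(query, item) for item in memory])
    dim = len(memory[0])
    weighted = [sum(w * item[d] for w, item in zip(weights, memory))
                for d in range(dim)]
    return weights, weighted


# Two stored prototypes; the query points almost along the first one.
memory = [[1.0, 0.0], [0.0, 1.0]]
query = [0.9, 0.1]
weights, weighted = read_memory(query, memory)
decoder_input = query + weighted   # horizontal concatenation
```

Because every stored prototype contributes in proportion to its weight, a query that sits between several normal behaviors is still represented by a mixture of them rather than forced onto a single prototype.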

[0081] Therefore, each sub-vector $q_{k,i}$ is combined with the vectors stored in the memory module in the manner described above, ensuring the model encompasses all prototypes of normal behavior and fully considers the diversity of normal behavior. The weighted feature vector is then concatenated horizontally with the feature vector $q_i$ in the memory module and used as input to the decoding module. Next, the decoding module obtains the predicted skeletal keypoint coordinates in the feature space. Finally, the output of the decoding module is used as input to a fully connected layer to map the features to 2D image coordinates.

[0082] After each calculation of the skeletal keypoint coordinates for a length of T frames using the above steps, the memory module must be updated to record the behavioral features associated with the skeletal keypoint coordinates of that T frame into the feature vector of the memory module. The design concept of the memory module update strategy is derived from the model update method in Gaussian mixture models. The memory module is weighted with each sub-vector, and then the last memory module is updated using the vector with the largest weight. The weight of each vector in the memory module is expressed by the formula:

[0083]

[0084] where $z_{k,m,i}$ is the weight of vector $q_{k,i}$ in the memory module, $p_m$ is each feature vector in the memory module, $q_{k',i}$ is a sub-vector of the current skeletal keypoint segment, and $k'$ is the index of each sub-vector.

[0085] Normalize the weights:

[0086]

[0087] where $z'_{k,m,i}$ is the normalized weight and $z_{k',m,i}$ is the weight of each sub-vector in the memory module.

[0088] where $k' \in U_m$ and $U_m$ is the index set of the vectors. The updated memory module is then represented as:

[0089]
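
The update formulas are not reproduced in this text, so the sketch below is only one plausible reading of the description. It assumes that $U_m$ collects the sub-vectors whose best-matching stored vector is $p_m$, that the weights are a softmax over sub-vectors, that normalization divides by the largest weight within $U_m$, and that the updated vector is rescaled to unit length. None of these choices is confirmed by the source, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def update_memory(memory, queries):
    """Return an updated copy of `memory` given the current sub-vectors."""
    # Similarity of every stored vector m to every sub-vector k.
    sims = [[cosine_similarity(item, q) for q in queries] for item in memory]
    # U_m (assumed): sub-vectors whose best-matching stored vector is m.
    nearest = [max(range(len(memory)), key=lambda m: sims[m][k])
               for k in range(len(queries))]
    updated = []
    for m, item in enumerate(memory):
        members = [k for k, best in enumerate(nearest) if best == m]
        if not members:
            updated.append(list(item))   # nothing assigned: keep as is
            continue
        # Softmax over sub-vectors gives the weights z_{k,m,i}.
        exps = [math.exp(s) for s in sims[m]]
        total = sum(exps)
        z = [e / total for e in exps]
        # Normalize by the largest weight inside U_m (assumed form).
        z_max = max(z[k] for k in members)
        new_item = list(item)
        for k in members:
            scale = z[k] / z_max
            new_item = [p + scale * q for p, q in zip(new_item, queries[k])]
        updated.append(l2_normalize(new_item))
    return updated


memory = [[1.0, 0.0], [0.0, 1.0]]
queries = [[0.2, 1.0], [2.0, 0.5]]
new_memory = update_memory(memory, queries)
```

In this example each stored vector is pulled toward the one sub-vector that matched it best, so the prototypes drift toward recently observed normal behavior while staying on the unit sphere.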

[0090] To find the optimal values for the weights and biases at each layer, this embodiment uses the L2 distance to define and minimize the joint loss function. Given a set of normalized keypoints, centers, and sub-vectors output by the encoding module, the mean squared error loss function is:

[0091]

[0092] where $L_p$ is the mean squared error loss function; $T$ is the number of predicted frames; $f_{t,i}$ is the real human skeleton key point, which the formula compares with the predicted human skeletal key point, i.e., the output of the fully connected layer; $c_t$ is the true center point of the current target, which the formula compares with the center point predicted for the current target; $q_{k,i}$ is each feature sub-vector mapped from the skeletal keypoints; and $p_p$ is the feature vector in the memory module that has the largest weight for vector $q_{k,i}$.

[0093] In this embodiment, $T$ is 10. The first part of the loss function represents the local displacement deviation of each predicted keypoint, the second part represents the overall displacement deviation of the predicted key points, and the third part represents the deviation of each feature sub-vector mapped from the skeletal keypoints from the feature vector that has the highest weight in the memory module.
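
The three-part structure of the loss can be sketched as below. Since the formula is not reproduced in this text, the equal weighting of the three parts and the averaging over frames and sub-vectors are assumptions, as are the function and parameter names.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def joint_loss(pred_kpts, true_kpts, pred_centers, true_centers,
               sub_vectors, nearest_items):
    """Sum of three deviation terms (equal weighting is an assumption).

    pred_kpts / true_kpts: [frame][keypoint] -> (x, y)
    pred_centers / true_centers: [frame] -> (x, y)
    sub_vectors / nearest_items: [k] -> feature vector, where nearest_items[k]
    is the stored vector with the largest weight for sub_vectors[k].
    """
    num_frames = len(true_kpts)
    # Part 1: local displacement deviation of each predicted key point.
    local = sum(squared_l2(p, t)
                for pf, tf in zip(pred_kpts, true_kpts)
                for p, t in zip(pf, tf)) / num_frames
    # Part 2: overall displacement deviation of the predicted center.
    overall = sum(squared_l2(p, t)
                  for p, t in zip(pred_centers, true_centers)) / num_frames
    # Part 3: deviation of each sub-vector from its best-matching stored vector.
    feature = sum(squared_l2(q, p)
                  for q, p in zip(sub_vectors, nearest_items)) / len(sub_vectors)
    return local + overall + feature


# Two frames, one key point each; the key point is off by 1 pixel in x,
# and the center of the second frame is off by 2 pixels in y.
loss = joint_loss(
    pred_kpts=[[(1.0, 0.0)], [(1.0, 0.0)]],
    true_kpts=[[(0.0, 0.0)], [(0.0, 0.0)]],
    pred_centers=[(0.0, 0.0), (0.0, 0.0)],
    true_centers=[(0.0, 0.0), (0.0, 2.0)],
    sub_vectors=[[1.0, 0.0]],
    nearest_items=[[1.0, 0.0]],
)
```

In this example the local part contributes 1, the overall part contributes 2, and the feature part contributes 0, so the total is 3.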

[0094] III. Calculation of Abnormal Scores

[0095] To measure the anomaly score of each target, this embodiment combines the target skeleton keypoint loss value with the number of frames in the consecutive-frame segment. The magnitude of the anomaly score is then used to determine whether the behavior is abnormal, which is expressed by the formula:

[0096]

[0097] where $\alpha_F$ is the anomaly score, $L_p(f_t)$ is the target skeleton keypoint loss value, the $L_p(f_t)$ of each segment is calculated using formula (8), and $T$ is the number of predicted frames. In practical terms, this anomaly score is the average loss over all skeletal trajectories in the video segment.
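
Read as an average of loss values, the score and the decision step might be sketched as follows. The threshold value and the helper names are hypothetical; the text does not state how the decision threshold is chosen.

```python
def anomaly_score(segment_losses):
    """Average the per-segment loss values L_p(f_t)."""
    if not segment_losses:
        raise ValueError("need at least one loss value")
    return sum(segment_losses) / len(segment_losses)


def is_abnormal(score, threshold):
    """Flag the target when its score exceeds a chosen threshold."""
    return score > threshold


THRESHOLD = 1.0   # hypothetical value; the text does not specify one
normal = anomaly_score([0.1, 0.2, 0.3])
abnormal = anomaly_score([2.0, 4.0, 6.0])
flags = (is_abnormal(normal, THRESHOLD), is_abnormal(abnormal, THRESHOLD))
```

A model trained only on normal behavior predicts normal motion well, so its loss stays low; motion it cannot predict yields a high average loss and is flagged.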

[0098] IV. Visualization of Prediction Results

[0099] After obtaining the predicted human skeletal keypoints, both the predicted and actual human skeletal keypoints are visualized, with keypoints showing excessively high abnormal scores highlighted to indicate abnormal behavior. First, the dictionary of each predicted and actual human skeletal keypoint set is obtained. The coordinates of the keypoints are retrieved using the key 'keypoints', and the keypoints are drawn using the OpenCV function `cv.circle(img, center, radius, color[, thickness[, lineType[, shift]]])`. Next, the drawn keypoints are connected using the OpenCV function `cv.line(img, pt1, pt2, color[, thickness[, lineType[, shift]]])` to form a complete human skeletal framework. If the keypoints are drawn on a blank canvas, the parameter `img` is set to a blank image with all pixel values set to (255, 255, 255). If the keypoints are drawn on the corresponding frame, the frame number is obtained using the key 'frame_number', the corresponding frame is found in the list of stored images, and it is set as the parameter `img`. The visualization result is shown in Figure 2. For normal behavior, the predicted human skeletal key points are basically consistent with the actual human skeletal key points. For abnormal behavior, the predicted human skeletal key points differ significantly from the actual human skeletal key points.

[0100] Embodiment 2:

[0101] An abnormal behavior recognition system based on human skeletal key point prediction, which can implement the abnormal behavior recognition method based on human skeletal key point prediction described in Embodiment 1, includes:

[0102] Key point acquisition module: used to acquire key points of the real human skeleton;

[0103] Key point prediction module: Used to input real human skeleton key points into a pre-built human skeleton key point prediction network model to obtain predicted human skeleton key points;

[0104] Keypoint loss value calculation module: used to calculate the target skeleton keypoint loss value based on the predicted human skeleton keypoints and the actual human skeleton keypoints;

[0105] Abnormal Behavior Detection Module: This module is used to calculate anomaly scores based on the loss values of key points in the target skeleton, and then to determine abnormal behaviors based on these scores.

[0106] Visualization and drawing module: used to visualize and draw key points of the predicted human skeleton and key points of the real human skeleton respectively.

[0107] Embodiment 3:

[0108] This invention also provides an abnormal behavior recognition device based on human skeletal key point prediction, which can realize the abnormal behavior recognition method based on human skeletal key point prediction described in Embodiment 1, including a processor and a storage medium.

[0109] The storage medium is used to store instructions;

[0110] The processor is configured to operate according to the instructions to perform the steps of the following method:

[0111] Obtain key points of a real human skeleton;

[0112] Input the real human skeleton key points into the pre-built human skeleton key point prediction network model to obtain the predicted human skeleton key points.

[0113] Calculate the target skeleton keypoint loss value based on the predicted human skeleton keypoints and the actual human skeleton keypoints;

[0114] After calculating the anomaly score based on the loss value of the target skeleton key points, abnormal behavior is determined based on the anomaly score.

[0115] The key points of the predicted human skeleton and the key points of the actual human skeleton are visualized and displayed respectively.

[0116] Embodiment 4:

[0117] This invention also provides a computer-readable storage medium that can implement the abnormal behavior recognition method based on human skeletal key point prediction as described in Embodiment 1. The medium stores a computer program that, when executed by a processor, performs the steps of the following method:

[0118] Obtain key points of a real human skeleton;

[0119] Input the real human skeleton key points into the pre-built human skeleton key point prediction network model to obtain the predicted human skeleton key points.

[0120] Calculate the target skeleton keypoint loss value based on the predicted human skeleton keypoints and the actual human skeleton keypoints;

[0121] After calculating the anomaly score based on the loss value of the target skeleton key points, abnormal behavior is determined based on the anomaly score.

[0122] The key points of the predicted human skeleton and the key points of the actual human skeleton are visualized and displayed respectively.

[0123] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0124] This application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0125] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0126] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0127] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the technical principles of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. An abnormal behavior recognition method based on human skeleton key point prediction, characterized in that it comprises:

obtaining real human skeleton key points;

inputting the real human skeleton key points into a pre-built human skeleton key point prediction network model to obtain predicted human skeleton key points;

calculating a target skeleton key point loss value from the predicted human skeleton key points and the real human skeleton key points;

calculating an anomaly score from the target skeleton key point loss value, and determining abnormal behavior based on the anomaly score; and

visualizing and displaying the predicted human skeleton key points and the real human skeleton key points, respectively;

wherein the human skeleton key point prediction network model is constructed through the following steps: obtaining real human skeleton key points during normal behavior; establishing the human skeleton key point prediction network model; and determining the parameters of the human skeleton key point prediction network model based on the real human skeleton key points during normal behavior and a loss function;

wherein establishing the human skeleton key point prediction network model comprises:

the human skeleton key point prediction network model is provided with a low-pass filter with a kernel size of 10 to eliminate noise differences in the input key points, according to formula (1), the terms of which are: the coordinates of the skeletal key points in frame t-T+1 after low-pass filtering; the low-pass filter coefficients; the coordinates of the skeletal key points in frame t-T+1; and the coordinates of the skeletal key points in frame t-T; where i is the key point index, t is the index of the current frame, and T is the number of frames to be predicted;

the low-pass filtered skeletal key point coordinates of frame t-T+1 are input into the GRU unit layer, each layer having T GRU units; at each time step, a GRU unit receives information from the previous GRU unit; the output of each GRU unit is calculated according to formula (2), the terms of which are: the output of the GRU unit in frame t; the input of the GRU unit in frame t; and the output of the GRU unit in frame t-1;

the output of the GRU unit layer passes through the encoding module to yield a vector of length T, which is then remapped into K sub-vectors, where K = T * 2;

cosine similarity and softmax are then used to calculate the matching similarity between each sub-vector and the information stored in the memory module, according to formula (3), the terms of which are: the matching similarity weight between the sub-vector and a stored vector; the m-th stored vector in the memory module; the sub-vector input at the current time; each feature vector in the memory module; and the index of each feature vector;

the memory module in the human skeleton key point prediction network model weights the stored vectors for the current sub-vector according to the matching similarity weights, according to formula (4), the terms of which are: the weighted feature vector; and the matching similarity weights of the current sub-vector with respect to the stored vectors; each sub-vector is combined with the memory module in this manner, and the weighted feature vector is then horizontally concatenated with the feature vector in the memory module and used as the input to the decoding module;

the decoding module obtains the predicted skeletal key point coordinates in feature space, and the output of the decoding module is input to a fully connected layer that maps the features to 2D image coordinates;

the weight of each vector in the memory module is expressed by formula (5), the terms of which are: the weight of the vector in the memory module; each feature vector in the memory module; a sub-vector of the current skeletal key point segment; and the index of each sub-vector;

the weights are normalized according to formula (6), the terms of which are: the normalized weights; and the weights of each sub-vector in the memory module; given the index of each vector, the updated memory module is expressed by formula (7);

wherein determining the parameters of the human skeleton key point prediction network model based on the real human skeleton key points during normal behavior and the loss function comprises: given a set of normalized key points, center points, and sub-vectors output by the encoding module, the mean squared error loss function is given by formula (8), the terms of which are: the mean squared error loss; T, the number of predicted frames; the predicted human skeleton key points; the real human skeleton key points; the predicted center point of the current target; the true center point of the current target; each feature sub-vector mapped from the skeletal key points; and the feature vector in the memory module that has the largest weight for that sub-vector; the first part of the loss function represents the local displacement deviation of each predicted key point, the second part represents the overall displacement deviation of the predicted key points, and the third part represents the deviation of each feature sub-vector mapped from the skeletal key points from the feature vector that has the highest weight in the memory module.

2. The abnormal behavior recognition method based on human skeleton key point prediction according to claim 1, characterized in that the anomaly score is expressed by formula (9), the terms of which are: the anomaly score; and the target skeleton key point loss value of each segment, calculated by formula (8), where T is the number of predicted frames.
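The memory read recited in claim 1 (cosine similarity followed by softmax, then a weighted combination of the stored vectors) can be illustrated with a minimal sketch. Formulas (3) and (4) are not reproduced in this text, so the exact form below is an assumption: the weighted feature vector is taken to be the softmax-weighted sum of the stored vectors. The names `cosine_similarity`, `softmax`, and `read_memory` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(sub_vector, memory):
    """Return (matching weights, weighted feature vector) for one sub-vector.

    Assumption: the weighted feature vector is the sum of the stored
    vectors scaled by their softmax-normalized cosine similarity.
    """
    weights = softmax([cosine_similarity(sub_vector, item) for item in memory])
    weighted = [
        sum(w * item[d] for w, item in zip(weights, memory))
        for d in range(len(sub_vector))
    ]
    return weights, weighted


# Two stored vectors; the query matches the first one exactly.
memory = [[1.0, 0.0], [0.0, 1.0]]
weights, weighted = read_memory([1.0, 0.0], memory)
```

In this sketch the stored vector closest in direction to the sub-vector receives the largest weight, so the read-out is pulled toward patterns already stored in the memory module.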

3. The abnormal behavior recognition method based on human skeleton key point prediction according to claim 1, characterized in that visualizing and displaying the predicted human skeleton key points and the real human skeleton key points, respectively, comprises: obtaining the coordinates of the predicted human skeleton key points and of the real human skeleton key points; drawing the skeleton key points, based on those coordinates, on a blank canvas or on the corresponding frame; and connecting the drawn skeleton key points to form a complete human skeleton.
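The drawing steps of claim 3 can be sketched as follows. This is an illustrative example only: the canvas is a plain list of lists rather than an image from any particular library, the line drawing uses Bresenham's algorithm as a stand-in for whatever drawing routine an implementation would use, and the `bones` list of key point index pairs is a hypothetical input, since the claim does not specify the skeleton topology.

```python
def draw_line(canvas, p0, p1, value=1):
    """Rasterize a line from p0 to p1 (integer (x, y) pairs) with Bresenham's algorithm."""
    x0, y0 = p0
    x1, y1 = p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        # Pixels outside the canvas are skipped rather than raising an error.
        if 0 <= y0 < len(canvas) and 0 <= x0 < len(canvas[0]):
            canvas[y0][x0] = value
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def draw_skeleton(keypoints, bones, height, width, canvas=None, value=1):
    """Draw key points and the bones connecting them on a blank canvas or a given frame."""
    if canvas is None:
        canvas = [[0] * width for _ in range(height)]  # blank canvas
    points = [(int(round(x)), int(round(y))) for x, y in keypoints]
    for a, b in bones:
        draw_line(canvas, points[a], points[b], value)
    for x, y in points:
        if 0 <= y < height and 0 <= x < width:
            canvas[y][x] = value
    return canvas


# Three key points joined by two bones (hypothetical topology).
canvas = draw_skeleton([(1, 1), (4, 1), (4, 4)], [(0, 1), (1, 2)], height=6, width=6)
```

Calling `draw_skeleton` once for the predicted key points and once for the real key points, with different `value` arguments, gives two overlays that can be compared frame by frame.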

4. An abnormal behavior recognition system based on human skeleton key point prediction, characterized in that it comprises:

a key point acquisition module, used to acquire real human skeleton key points;

a key point prediction module, used to input the real human skeleton key points into a pre-built human skeleton key point prediction network model to obtain predicted human skeleton key points;

a key point loss value calculation module, used to calculate a target skeleton key point loss value from the predicted human skeleton key points and the real human skeleton key points;

an abnormal behavior determination module, used to calculate an anomaly score from the target skeleton key point loss value and then to determine abnormal behavior based on the anomaly score; and

a visualization and drawing module, used to visualize and draw the predicted human skeleton key points and the real human skeleton key points, respectively;

wherein the human skeleton key point prediction network model is constructed through the following steps: obtaining real human skeleton key points during normal behavior; establishing the human skeleton key point prediction network model; and determining the parameters of the human skeleton key point prediction network model based on the real human skeleton key points during normal behavior and a loss function;

wherein establishing the human skeleton key point prediction network model comprises:

the human skeleton key point prediction network model is provided with a low-pass filter with a kernel size of 10 to eliminate noise differences in the input key points, according to formula (1), the terms of which are: the coordinates of the skeletal key points in frame t-T+1 after low-pass filtering; the low-pass filter coefficients; the coordinates of the skeletal key points in frame t-T+1; and the coordinates of the skeletal key points in frame t-T; where i is the key point index, t is the index of the current frame, and T is the number of frames to be predicted;

the low-pass filtered skeletal key point coordinates of frame t-T+1 are input into the GRU unit layer, each layer having T GRU units; at each time step, a GRU unit receives information from the previous GRU unit; the output of each GRU unit is calculated according to formula (2), the terms of which are: the output of the GRU unit in frame t; the input of the GRU unit in frame t; and the output of the GRU unit in frame t-1;

the output of the GRU unit layer passes through the encoding module to yield a vector of length T, which is then remapped into K sub-vectors, where K = T * 2;

cosine similarity and softmax are then used to calculate the matching similarity between each sub-vector and the information stored in the memory module, according to formula (3), the terms of which are: the matching similarity weight between the sub-vector and a stored vector; the m-th stored vector in the memory module; the sub-vector input at the current time; each feature vector in the memory module; and the index of each feature vector;

the memory module in the human skeleton key point prediction network model weights the stored vectors for the current sub-vector according to the matching similarity weights, according to formula (4), the terms of which are: the weighted feature vector; and the matching similarity weights of the current sub-vector with respect to the stored vectors; each sub-vector is combined with the memory module in this manner, and the weighted feature vector is then horizontally concatenated with the feature vector in the memory module and used as the input to the decoding module;

the decoding module obtains the predicted skeletal key point coordinates in feature space, and the output of the decoding module is input to a fully connected layer that maps the features to 2D image coordinates;

the weight of each vector in the memory module is expressed by formula (5), the terms of which are: the weight of the vector in the memory module; each feature vector in the memory module; a sub-vector of the current skeletal key point segment; and the index of each sub-vector;

the weights are normalized according to formula (6), the terms of which are: the normalized weights; and the weights of each sub-vector in the memory module; given the index of each vector, the updated memory module is expressed by formula (7);

wherein determining the parameters of the human skeleton key point prediction network model based on the real human skeleton key points during normal behavior and the loss function comprises: given a set of normalized key points, center points, and sub-vectors output by the encoding module, the mean squared error loss function is given by formula (8), the terms of which are: the mean squared error loss; T, the number of predicted frames; the predicted human skeleton key points; the real human skeleton key points; the predicted center point of the current target; the true center point of the current target; each feature sub-vector mapped from the skeletal key points; and the feature vector in the memory module that has the largest weight for that sub-vector; the first part of the loss function represents the local displacement deviation of each predicted key point, the second part represents the overall displacement deviation of the predicted key points, and the third part represents the deviation of each feature sub-vector mapped from the skeletal key points from the feature vector that has the highest weight in the memory module.
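The low-pass filtering of input key points recited above (kernel size 10) can be illustrated with a short sketch. The filter coefficients of formula (1) are not reproduced in this text, so the uniform (moving-average) coefficients used by default here are an assumption, as are the causal window and the renormalization over the first few frames. The function name `low_pass_filter` is hypothetical.

```python
def low_pass_filter(track, kernel_size=10, coefficients=None):
    """Smooth one key point's (x, y) track with a causal FIR filter.

    Assumptions: uniform coefficients unless others are supplied, and the
    window is truncated and renormalized where fewer than kernel_size
    frames are available. The coefficients must not sum to zero.
    """
    if coefficients is None:
        coefficients = [1.0 / kernel_size] * kernel_size
    filtered = []
    for t in range(len(track)):
        acc_x = acc_y = weight = 0.0
        for j, c in enumerate(coefficients):
            idx = t - j  # frame j steps before the current frame
            if idx < 0:
                break
            x, y = track[idx]
            acc_x += c * x
            acc_y += c * y
            weight += c
        filtered.append((acc_x / weight, acc_y / weight))
    return filtered


# A jittery track of one key point over 12 frames.
track = [(100.0 + (-1) ** t, 50.0) for t in range(12)]
smoothed = low_pass_filter(track)
```

With the default uniform kernel, frame-to-frame jitter in the detected key points is averaged out before the coordinates reach the GRU unit layer, at the cost of a lag of a few frames.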

5. An abnormal behavior recognition device based on human skeleton key point prediction, characterized in that it comprises a processor and a storage medium; the storage medium is used to store instructions; and the processor is configured to operate according to the instructions to perform the steps of the method according to any one of claims 1 to 3.

6. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the program implements the steps of the method according to any one of claims 1 to 3.