A secondary-determination fight recognition method and device
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: INSPUR TIANYUAN COMM INFORMATION SYST CO LTD
- Filing Date: 2023-06-21
- Publication Date: 2026-06-23
Figure CN116824447B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of YOLO-based video object detection and analysis, and specifically provides a secondary-determination method and apparatus for fight recognition.
Background Technology
[0002] Fight detection is a computer vision technology applied in the field of time-series video surveillance. Its main goal is to automatically detect violent behaviors in images, such as fighting, pulling and tugging, pushing, and brawls involving weapons, enabling timely alarms and intervention. Fight detection technology is already widely used in practice, especially in public places, large events, schools, and other locations that require order and security. The following analyzes the main content, structure, principles, technical means, and methods of existing fight detection technologies, together with their problems and shortcomings.
[0003] The main structure of fight detection technology can be divided into several parts: acquisition, preprocessing, feature extraction, classifier training and testing, real-time detection, and alarm. Its principle is to preprocess and extract features from the video stream to identify violent behavior, then compare these features with a pre-trained classifier to ultimately achieve automatic identification and alarm.
[0004] Because current fight detection relies mainly on temporal analysis of video streams, it suffers from a series of problems: inaccurate feature extraction, a lack of reliable training video datasets, high computational complexity and long processing times, and error accumulation caused by the high diversity of human postures and movements.
Summary of the Invention
[0005] This invention addresses the shortcomings of the prior art by providing a highly practical secondary determination method for identifying fights.
[0006] A further technical objective of this invention is to provide a secondary-determination fight recognition device that is reasonably designed, safe, and widely applicable.
[0007] The technical solution adopted by this invention to solve its technical problem is:
[0008] A two-step method for identifying fights includes the following steps:
[0009] S1. Dataset preparation;
[0010] S2. Network model construction;
[0011] S3. Network model training;
[0012] S4. Model testing and application.
[0013] Furthermore, in step S1, an image or video dataset containing five scenarios, namely boxing, kicking, grappling, fighting on the ground, and fighting with weapons, is first prepared as training data for the first model.
[0014] Images for human behavior recognition are prepared as training data for the second model. The datasets cover various scenes, lighting conditions, viewpoints, and crowd densities in order to train a more robust model.
[0015] Furthermore, the network model constructed in step S2 includes a Backbone network module, a Neck network module, a Head network module, and a Prediction Head network module;
[0016] The Backbone network module is used to extract image features;
[0017] The Neck network module further extracts features to improve detection performance;
[0018] The Head network module is used for detector classification and regression;
[0019] The Prediction Head network module uses an FPN structure to connect feature maps of different scales, enabling the network to perform well on objects of different sizes.
[0020] Furthermore, the Backbone network module extracts image features through convolutional neural networks and residual networks, and is built on a pre-trained feature extraction network;
[0021] In the Head network module, YOLOv7 uses an anchor-based method to generate prediction boxes, predicting a total of 4 outputs: center point coordinates, width and height, and category.
[0022] YOLOv7 uses three different sizes of prior boxes to detect small, medium, and large targets, respectively.
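To illustrate how three groups of prior boxes divide responsibility between small, medium, and large targets, the sketch below assigns a target box to every scale that has an anchor of similar shape. The width/height ratio rule and the threshold of 4.0 follow YOLOv5/YOLOv7-style anchor matching as we understand it, and the anchor sizes are illustrative placeholders; none of these values are disclosed in the patent.

```python
# Illustrative anchor sizes (width, height) in pixels for the three detection
# scales. These are placeholder values, not parameters disclosed in the patent.
ANCHORS = {
    "small": [(12, 16), (19, 36), (40, 28)],        # stride-8 output
    "medium": [(36, 75), (76, 55), (72, 146)],      # stride-16 output
    "large": [(142, 110), (192, 243), (459, 401)],  # stride-32 output
}
RATIO_THRESHOLD = 4.0  # assumed YOLOv5/YOLOv7-style anchor matching threshold


def matching_scales(box_w, box_h):
    """Return the scales that have at least one anchor matching the box shape."""
    scales = set()
    for scale, anchors in ANCHORS.items():
        for anchor_w, anchor_h in anchors:
            rw, rh = box_w / anchor_w, box_h / anchor_h
            # Worst width/height mismatch, whichever of box or anchor is larger.
            if max(rw, 1 / rw, rh, 1 / rh) < RATIO_THRESHOLD:
                scales.add(scale)
    return sorted(scales)


print(matching_scales(10, 14))    # → ['small']  (a distant person)
print(matching_scales(300, 350))  # → ['large']  (a person close to the camera)
```

Under this rule a box can be assigned to more than one scale; a 60x90-pixel box, for example, matches anchors at all three, which gives the network several chances to learn mid-sized targets.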
[0023] Furthermore, in step S3, the fighting behavior model is trained using the training set, while the test set is used for validation and parameter tuning. Various techniques can be used to accelerate and optimize the training process.
[0024] Furthermore, after training, the model is evaluated and its robustness and stability are tested. During training, GPU or TPU acceleration hardware is used to improve the training speed and efficiency of the model.
[0025] Furthermore, in step S4, after training is completed, the trained first model is used to detect and recognize the input image;
[0026] If the model determines the behavior to be fighting, the image is cropped to the bounding box output by the first model, and the second model is then used to determine the human behavior. This improves the accuracy of fight recognition and avoids false alarms.
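The crop between the two stages has to clip the first model's box to the image, because predicted boxes can extend past the frame edges. The minimal sketch below assumes the box arrives in pixel (x1, y1, x2, y2) form and represents the image as a list of pixel rows; both choices, and the name `crop_to_box`, are assumptions for illustration rather than details from the patent.

```python
import math


def crop_to_box(image, box):
    """Crop `image` (a list of pixel rows) to a pixel box (x1, y1, x2, y2).

    The box is rounded outward to whole pixels and clamped to the image, so a
    box that is partly outside the frame still yields a valid crop. An empty
    list is returned if no part of the box lies inside the image.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    x1, y1, x2, y2 = box
    left = min(max(math.floor(x1), 0), width)
    top = min(max(math.floor(y1), 0), height)
    right = min(max(math.ceil(x2), 0), width)
    bottom = min(max(math.ceil(y2), 0), height)
    if right <= left or bottom <= top:
        return []
    return [row[left:right] for row in image[top:bottom]]


# A 4x5 toy "image" whose pixel values encode their (row, column) position.
image = [[10 * r + c for c in range(5)] for r in range(4)]
print(crop_to_box(image, (1.2, 0.5, 3.1, 2.0)))  # → [[1, 2, 3], [11, 12, 13]]
print(crop_to_box(image, (-3, 2, 2, 9)))         # → [[20, 21], [30, 31]]
```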
[0027] In a fight scene, the model monitors the video stream frame by frame in real time and identifies potential fights. Once a fight is detected, the system can automatically issue an alarm and notify relevant personnel for timely handling.
[0028] A secondary-determination fight recognition device includes: at least one memory and at least one processor;
[0029] The at least one memory is used to store a machine-readable program;
[0030] The at least one processor is used to call the machine-readable program to execute a secondary determination method for identifying fights.
[0031] Compared with the prior art, the secondary determination method and apparatus for identifying fights of the present invention have the following outstanding advantages:
[0032] This invention uses YOLOv7 as the basic framework for recognition and classification. It trains on image datasets, avoiding the scarcity of training video datasets and the difficulty of annotating them. Because it is a non-temporal (per-frame) detection method, it avoids the high computational complexity of temporal algorithms.
[0033] This invention adds a secondary determination of fighting behavior, which improves recognition despite the high diversity of human postures and movements.
Attached Figure Description
[0034] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0035] Figure 1 is a flowchart of the two-stage (secondary-determination) fight recognition method.
Detailed Implementation
[0036] To enable those skilled in the art to better understand the present invention, the present invention will be further described in detail below with reference to specific embodiments. Obviously, the described embodiments are merely some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0037] The following is a preferred embodiment:
[0038] As shown in Figure 1, this embodiment of the secondary-determination fight recognition method trains neural networks to identify targets in images or video, thereby enabling fighting behavior to be identified and an alarm to be raised.
[0039] The specific method is as follows:
[0040] S1. Dataset Preparation:
[0041] First, an image or video dataset covering five scenarios (boxing, kicking, grappling, ground fighting, and fighting with weapons) is prepared as training data for the first model. Second, images of everyday human behaviors, such as drinking, eating, walking, smoking, running, jumping, punching, kicking, and hugging, are prepared as training data for the second model.
[0042] These datasets should include various scenarios, lighting conditions, viewpoints, and crowd densities to train more robust models.
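One way to organize the two datasets is to give each model its own class list and to store annotations in the plain-text label format read by the public YOLOv5/YOLOv7 training code: one "class cx cy w h" line per box, with coordinates normalized to [0, 1] by the image size. The class names below come from S1; their index order and the helper `to_yolo_label` are hypothetical.

```python
# Class lists taken from step S1; the index order is an assumption.
FIGHT_CLASSES = ["boxing", "kicking", "grappling", "ground_fighting", "armed_fighting"]
BEHAVIOR_CLASSES = ["drinking", "eating", "walking", "smoking", "running",
                    "jumping", "punching", "kicking", "hugging"]


def to_yolo_label(class_name, classes, box, img_w, img_h):
    """Convert a pixel box (x1, y1, x2, y2) into one YOLO-format label line."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 / img_w  # normalized box center
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w       # normalized box size
    h = (y2 - y1) / img_h
    return f"{classes.index(class_name)} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


print(to_yolo_label("grappling", FIGHT_CLASSES, (320, 120, 480, 360), 640, 480))
# → 2 0.625000 0.500000 0.250000 0.500000
print(to_yolo_label("hugging", BEHAVIOR_CLASSES, (0, 0, 64, 48), 640, 480))
# → 8 0.050000 0.050000 0.100000 0.100000
```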
[0043] S2. Network Model Construction:
[0044] The backbone network module extracts image features using techniques such as convolutional neural networks (CNNs) and residual networks (ResNet). It is typically built on a pre-trained feature extraction network, such as ResNet50 or ResNet101.
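The residual networks mentioned here rest on a single relation. In a residual block, the stacked layers learn a correction F that is added back to the block input through an identity shortcut, so the block only has to model the difference from its input. The notation below is that of the original ResNet formulation, not of the patent:

```latex
\mathbf{y} = \mathcal{F}\left(\mathbf{x}, \{W_i\}\right) + \mathbf{x}
```

Here x and y are the input and output feature maps and {W_i} are the weights of the stacked layers. When the shapes of x and F(x) differ, a linear projection W_s x replaces the identity term.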
[0045] Neck Network Module: as in YOLOv5, YOLOv7 places a neck network between the backbone and the head, further extracting and fusing the backbone features to improve detection performance. Commonly used neck structures include FPN and PAN.
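To make the FPN and PAN ideas concrete, the toy sketch below fuses three single-channel feature maps at strides 8, 16, and 32. The top-down (FPN) pass upsamples each coarser map by nearest-neighbour interpolation and adds it to the next finer one; the bottom-up (PAN) pass then downsamples with 2x2 max pooling, standing in for a stride-2 convolution, and adds the result back. Element-wise addition follows the original FPN design, whereas YOLO-family necks typically concatenate channels and apply convolutions, which this sketch omits. The maps and function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D list."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # copy so the two output rows are independent
    return out


def maxpool2x(fmap):
    """2x2 max pooling with stride 2 (halves height and width)."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]), 2)]
        for r in range(0, len(fmap), 2)
    ]


def add(a, b):
    """Element-wise sum of two equally sized 2-D lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fpn_pan(p3, p4, p5):
    """Fuse feature maps ordered from fine (p3, stride 8) to coarse (p5, stride 32)."""
    # Top-down (FPN): pass coarse, semantically strong features to finer maps.
    t5 = p5
    t4 = add(p4, upsample2x(t5))
    t3 = add(p3, upsample2x(t4))
    # Bottom-up (PAN): pass fine, well-localized features back to coarser maps.
    n3 = t3
    n4 = add(t4, maxpool2x(n3))
    n5 = add(t5, maxpool2x(n4))
    return n3, n4, n5


p3 = [[1] * 4 for _ in range(4)]  # stride 8: 4x4
p4 = [[2] * 2 for _ in range(2)]  # stride 16: 2x2
p5 = [[3]]                        # stride 32: 1x1
n3, n4, n5 = fpn_pan(p3, p4, p5)
print(n3[0], n4, n5)  # → [6, 6, 6, 6] [[11, 11], [11, 11]] [[14]]
```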
[0046] The Head network module is responsible for classification and regression for the detector. YOLOv7 uses an anchor-based method to generate predicted bounding boxes, predicting a total of four outputs: center coordinates, width and height, and class. YOLOv7 uses three sizes of prior boxes to detect small, medium, and large targets, respectively.
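The sketch below decodes one anchor's raw head output into a box and a class, assuming the sigmoid-based parameterization used in the public YOLOv5/YOLOv7 code as we understand it: the center may move from -0.5 to 1.5 cells relative to its grid cell's corner, and the size is a bounded multiple (0x to 4x) of the matched prior box. The grid position, stride, anchor, class names, and raw values are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_prediction(raw, grid_xy, stride, anchor_wh, class_names):
    """Decode [tx, ty, tw, th, class logits...] into (cx, cy, w, h, class, prob)."""
    tx, ty, tw, th, *class_logits = raw
    gx, gy = grid_xy
    aw, ah = anchor_wh
    # Center: offset in (-0.5, 1.5) cells from the cell corner, scaled to pixels.
    cx = (2 * sigmoid(tx) - 0.5 + gx) * stride
    cy = (2 * sigmoid(ty) - 0.5 + gy) * stride
    # Size: (2 * sigmoid)^2 keeps the box between 0x and 4x the anchor size.
    w = (2 * sigmoid(tw)) ** 2 * aw
    h = (2 * sigmoid(th)) ** 2 * ah
    probs = [sigmoid(v) for v in class_logits]
    best = max(range(len(probs)), key=probs.__getitem__)
    return cx, cy, w, h, class_names[best], probs[best]


# All-zero box logits give sigmoid(0) = 0.5: the center lands in the middle of
# grid cell (10, 5) and the box keeps exactly the anchor size.
box = decode_prediction([0, 0, 0, 0, -2.0, 3.0], grid_xy=(10, 5), stride=16,
                        anchor_wh=(36, 75), class_names=["boxing", "kicking"])
print(box[:5])  # → (168.0, 88.0, 36.0, 75.0, 'kicking')
```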
[0047] Prediction Head Network Module: This module is mainly designed for multi-scale problems. It uses an FPN structure to connect feature maps of different scales, enabling the network to perform well on objects of different sizes.
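A quick calculation shows how the multi-scale head covers an image. Assuming a 640x640 input, strides of 8, 16, and 32, and three prior boxes per grid cell (common YOLOv7 settings, not values stated in the patent), the output grids and the total number of candidate boxes are:

```python
def prediction_grid(img_size=640, strides=(8, 16, 32), anchors_per_cell=3):
    """Return the output grid size per stride and the total number of candidate boxes."""
    if any(img_size % s for s in strides):
        raise ValueError("image size must be divisible by every stride")
    grids = [img_size // s for s in strides]
    total = sum(anchors_per_cell * g * g for g in grids)
    return grids, total


grids, total = prediction_grid()
print(grids, total)  # → [80, 40, 20] 25200
```

The fine 80x80 grid (8-pixel cells) suits small, distant people, while the coarse 20x20 grid (32-pixel cells) suits large, nearby ones.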
[0048] S3. Network Model Training:
[0049] The fighting behavior model is trained using a training set and validated and tuned using a test set. Various techniques can be used to accelerate and optimize training, such as learning rate adjustment, gradient accumulation, and weight decay.
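As a hedged illustration of the three techniques named above, the toy loop below fits a one-variable linear model by mini-batch gradient descent with a cosine learning-rate schedule, gradient accumulation over `accum_steps` micro-batches, and L2 weight decay on the weight. The model, data, and hyperparameter values are hypothetical stand-ins; a real detector would rely on a deep-learning framework's optimizer and scheduler.

```python
import math


def cosine_lr(update, total_updates, base_lr=0.1, min_lr=0.001):
    """Cosine-annealed learning rate, one form of learning rate adjustment."""
    progress = update / max(1, total_updates)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


def train(data, epochs=200, batch_size=4, accum_steps=2, weight_decay=1e-4):
    """Fit y = w * x + b by mini-batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    micro_batches = epochs * math.ceil(len(data) / batch_size)
    total_updates = micro_batches // accum_steps
    grad_w = grad_b = 0.0
    micro, update = 0, 0
    for _ in range(epochs):
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient accumulation: add this micro-batch's share of the gradient.
            scale = 2.0 / (len(batch) * accum_steps)
            for x, y in batch:
                err = (w * x + b) - y
                grad_w += scale * err * x
                grad_b += scale * err
            micro += 1
            if micro % accum_steps == 0:
                lr = cosine_lr(update, total_updates)
                # Weight decay (L2) is applied to the weight but not the bias.
                w -= lr * (grad_w + weight_decay * w)
                b -= lr * grad_b
                grad_w = grad_b = 0.0
                update += 1
    return w, b


# Hypothetical noise-free data from y = 2x + 1 on x in [-1, 0.875].
data = [(x, 2 * x + 1) for x in (i / 8 - 1 for i in range(16))]
w, b = train(data)
print(f"w={w:.3f}, b={b:.3f}")
```

Accumulating two micro-batches of four samples gives the same averaged gradient as one batch of eight, which is why the technique is used when GPU memory limits the batch size.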
[0050] After training, the model is evaluated, including the calculation of metrics such as precision, recall, and F1-score, as well as the testing of the model's robustness and stability.
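For a detector, precision, recall, and F1-score are usually computed after matching predicted boxes to ground-truth boxes by intersection over union (IoU). The sketch below uses greedy, score-ordered matching for a single class at an IoU threshold of 0.5; the threshold, the matching rule, and the function names are common conventions assumed here, not details from the patent.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(predictions, ground_truth, iou_threshold=0.5):
    """predictions: list of (score, box); ground_truth: list of boxes."""
    matched, tp = set(), 0
    # Highest-scoring predictions claim ground-truth boxes first.
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_j, best_iou = None, iou_threshold
        for j, gt in enumerate(ground_truth):
            if j not in matched and iou(box, gt) >= best_iou:
                best_j, best_iou = j, iou(box, gt)
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp = len(predictions) - tp   # unmatched predictions
    fn = len(ground_truth) - tp  # missed ground-truth boxes
    precision = tp / (tp + fp) if predictions else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (1, 1, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (0, 0, 10, 10))]
print([round(v, 3) for v in precision_recall_f1(preds, gt)])  # → [0.333, 0.5, 0.4]
```

The third prediction overlaps an already-matched box, so it counts as a false positive (a duplicate detection), not a second true positive.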
[0051] During training, GPUs or TPUs can be used to accelerate the training process and improve training speed and efficiency. It is also necessary to guard against overfitting and underfitting and to adopt appropriate strategies to address them.
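Early stopping is one common strategy against overfitting: training halts once the validation loss has not improved for a set number of epochs, and the best checkpoint is kept. The sketch below, with a hypothetical `patience` parameter and made-up loss values, shows only the stopping rule; the patent does not specify which anti-overfitting strategy is used.

```python
class EarlyStopping:
    """Stop when validation loss fails to improve by `min_delta` for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch  # a real trainer would save a checkpoint here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation losses: improvement stalls after epoch 3.
val_losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.58, 0.57, 0.59]
stopper = EarlyStopping(patience=3)
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        break
print(epoch, stopper.best_epoch, stopper.best_loss)  # → 6 3 0.55
```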
[0052] S4. Model Testing and Application:
[0053] After training, the trained first model can be used to detect and recognize input images. If the model determines the behavior to be fighting, the image is cropped to the bounding box output by the first model, and the second model then determines the human behavior in the cropped region, which improves the accuracy of fight detection and avoids false alarms. In fighting scenarios, the model can monitor video stream frames in real time and identify potential fighting behaviors. Once a fighting behavior is detected, the system can automatically issue an alarm and notify relevant personnel for timely handling.
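The two-stage decision in S4 can be sketched as a per-frame loop. In this minimal version the two models are passed in as plain callables, so any detector and any behavior classifier can be plugged in. The names `monitor`, `Detection`, `confirm_labels`, and `score_threshold`, and the rule that a fight is confirmed only when the second model returns a fight-related behavior, are assumptions for illustration; the patent does not spell out how the second model's output confirms a fight.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # pixel (x1, y1, x2, y2), assumed already clamped


@dataclass
class Detection:
    label: str
    score: float
    box: Box


def crop(frame, box: Box):
    """Cut the box region out of a frame stored as a list of pixel rows."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


def monitor(frames: Iterable,
            first_model: Callable[[object], List[Detection]],
            second_model: Callable[[object], str],
            confirm_labels=frozenset({"punching", "kicking"}),
            score_threshold=0.5):
    """Yield (frame_index, detection) for every fight confirmed by both stages."""
    for index, frame in enumerate(frames):
        for det in first_model(frame):
            if det.score < score_threshold:
                continue  # stage 1 is not confident enough
            # Stage 2: re-judge only the region inside the stage-1 box.
            behavior = second_model(crop(frame, det.box))
            if behavior in confirm_labels:
                yield index, det  # confirmed: the caller raises the alarm
            # Otherwise the stage-1 result is discarded as a false alarm.


# Stub models standing in for trained networks (hypothetical behavior).
def fake_detector(frame):
    return [Detection("boxing", 0.9, (0, 0, 2, 2))]


def fake_classifier(patch):
    # Pretend pixel value 1 means "punching" and anything else "hugging".
    return "punching" if patch[0][0] == 1 else "hugging"


frames = [[[1, 1], [1, 1]], [[0, 0], [0, 0]]]
alarms = list(monitor(frames, fake_detector, fake_classifier))
print([i for i, _ in alarms])  # → [0]
```

Writing `monitor` as a generator keeps alarm handling (notification, logging, rate limiting) outside the detection logic.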
[0054] Based on the above method, the secondary-determination fight recognition device of this embodiment includes: at least one memory and at least one processor;
[0055] The at least one memory is used to store a machine-readable program;
[0056] The at least one processor is used to call the machine-readable program to execute a secondary determination method for identifying fights.
[0057] The above-described specific embodiments are merely specific examples of the present invention. The patent protection scope of the present invention includes, but is not limited to, the above-described specific embodiments. Any appropriate changes or substitutions made by a person skilled in the art that conform to the claims of the present invention regarding the secondary-determination fight recognition method and device shall fall within the patent protection scope of the present invention.
[0058] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims
1. A secondary-determination fight recognition method, characterized by comprising the following steps:
S1, dataset preparation: first, an image or video dataset containing five cases, namely punching, kicking, wrestling, fighting on the ground, and fighting with weapons, is prepared as training data for the first model; images of drinking, eating, walking, smoking, running, jumping, punching, kicking, and hugging are prepared as training data for the second model; the datasets contain various scenes, lighting conditions, perspectives, and crowd densities so as to train a more robust model;
S2, network model construction: the network includes a Backbone network module, a Neck network module, a Head network module, and a Prediction Head network module; the Backbone network module is used to extract image features; the Neck network module further extracts features to improve detection performance; the Head network module performs classification and regression for the detector; the Prediction Head network module uses an FPN structure to connect feature maps of different scales, so that the network performs well on objects of different sizes; the Backbone network module extracts image features through convolutional neural networks and residual networks and is built on a pre-trained feature extraction network; the Head network module in YOLOv7 uses an anchor-based method to generate bounding boxes with a total of 4 outputs, namely center coordinates, length and width, and category; YOLOv7 uses prior boxes of three sizes to detect small, medium, and large targets, respectively;
S3, network model training: the training set is used to train the fighting behavior model, and the test set is used for verification and parameter adjustment; various techniques can be used to speed up and optimize training; after training, the model is evaluated and its robustness and stability are tested; during training, GPU or TPU acceleration hardware is used to improve the training speed and efficiency of the model;
S4, model testing and application: after training, the trained first model is used to detect and recognize the input image; if the model judges the behavior to be fighting, the image is cropped to the bounding box output by the first model and then judged by the second model, so as to improve the accuracy of fight recognition and avoid false positives; in a fighting scene, the model monitors the video stream in real time, extracts frames, and identifies possible fighting behaviors; once a fighting behavior is detected, the system can automatically send an alarm and notify relevant personnel for timely handling.
2. A secondary-determination fight recognition device, characterized by comprising at least one memory and at least one processor; the at least one memory is used to store a machine-readable program; the at least one processor is used to call the machine-readable program to execute the method of claim 1.