High-altitude projectile detection method and device based on multi-source signal recognition, and equipment
By using a multi-source signal recognition method, combined with the Vibe background model, convolutional neural network and RANSAC algorithm, the problems of high false detection rate and manual dependence in high-altitude object throwing detection are solved, and efficient and intelligent high-altitude object throwing recognition is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHENZHEN INFINOVA
- Filing Date
- 2022-11-11
- Publication Date
- 2026-06-12
Figure CN115760925B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of video security monitoring, and in particular to a method, apparatus, equipment and storage medium for detecting objects thrown from heights based on multi-source signal recognition.
Background Technology
[0002] Currently, incidents of objects being thrown from high-rise buildings have long been a difficult problem to manage. This behavior poses a significant safety hazard and can cause serious injury. Furthermore, the rapid and sudden nature of such incidents makes it difficult for pedestrians to avoid them and hinders subsequent investigation. With increasing public awareness of safety and societal development, various methods for dealing with objects thrown from high-rise buildings are being implemented. Among these, detection methods for thrown objects are emerging in security products, greatly assisting in real-time monitoring and post-incident investigation.
[0003] However, the current control of objects thrown from high-rise buildings relies mainly on manual management: regular patrols and observation, or post-incident investigation by visiting residents and reviewing surveillance footage. These manual methods are inefficient and consume a lot of manpower. Current automatic detection equipment for objects thrown from high-rise buildings mainly uses methods such as optical flow, frame differencing, and background detection. However, solutions based on these methods still suffer from high false detection rates: small moving objects such as birds and leaves, as well as camera vibration, can trigger false detections.
[0004] Therefore, a method and device for detecting objects thrown from high altitudes based on multi-source signal recognition is needed to achieve efficient and intelligent identification of such objects.
Summary of the Invention
[0005] Therefore, the purpose of this invention is to at least partially address the shortcomings of the prior art, thereby proposing a method, apparatus, device, and storage medium for detecting objects thrown from high altitudes based on multi-source signal recognition.
[0006] To achieve the above objectives, the present invention adopts the following technical solution:
[0007] In a first aspect, the present invention provides a method for detecting objects thrown from high altitudes based on multi-source signal recognition, the method comprising:
[0008] Video and audio of the high-altitude projectile area are acquired. A Vibe background model is created based on the first frame, and a preset convolutional neural network model detects whether the audio is a projectile sound. The video includes the first frame and the current frame.
[0009] The foreground binary image of the current frame is obtained in real time through the Vibe background model, and the Vibe background model is updated according to the foreground binary image of the current frame.
[0010] Extract the foreground binary image of the current frame and the foreground binary images obtained before the current frame and create a set of binary images. Subtract all the foreground binary images obtained before the current frame from the foreground binary image of the current frame to obtain the suspected parabolic trajectory of the motion point after removing interference factors.
[0011] The suspected parabolic trajectory is evaluated using the RANSAC algorithm to determine whether it conforms to the law of parabolic motion. If the suspected parabolic trajectory conforms to the law of parabolic motion, and the preset convolutional neural network model detects that the audio is a parabolic sound, then a parabolic alarm is triggered.
[0012] In a second aspect, the present invention provides a high-altitude object detection device based on multi-source signal recognition, the device comprising:
[0013] Acquisition module: used to acquire video and audio of the high-altitude projectile area, create a Vibe background model based on the first frame, and detect whether the audio is a projectile sound through a preset convolutional neural network model, wherein the video includes the first frame and the current frame;
[0014] Update module: used to obtain the foreground binary image of the current frame in real time through the Vibe background model, and update the Vibe background model according to the foreground binary image of the current frame;
[0015] Suspected Module: Used to extract the foreground binary image of the current frame and the foreground binary images obtained before the current frame and create a set of binary images. By subtracting all the foreground binary images obtained before the current frame from the foreground binary image of the current frame, the suspected parabolic trajectory of the motion point with interference removed is obtained.
[0016] Early warning module: used to determine whether the suspected parabolic trajectory conforms to the parabolic law using the RANSAC algorithm. If the suspected parabolic trajectory conforms to the parabolic law and the preset convolutional neural network model detects that the audio is a parabolic sound, then a parabolic alarm is issued.
[0017] In a third aspect, the present invention also provides a high-altitude object throwing detection device based on multi-source signal recognition, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the high-altitude object throwing detection method based on multi-source signal recognition as described in the first aspect.
[0018] In a fourth aspect, the present invention also provides a storage medium having a computer program stored thereon, which, when executed, implements the steps of the high-altitude object detection method based on multi-source signal recognition as described in the first aspect.
[0019] This invention provides a method, apparatus, device, and storage medium for detecting objects thrown from high altitudes based on multi-source signal recognition. The method includes: acquiring video and audio from a high-altitude object throwing area; creating a Vibe background model based on a first frame and detecting whether the audio is a parabolic sound using a preset convolutional neural network model; wherein the video frame includes the first frame and the current frame; acquiring the foreground binary map of the current frame in real time using the Vibe background model; updating the Vibe background model based on the foreground binary map of the current frame; extracting the foreground binary map of the current frame and preset foreground binary maps acquired before the current frame and creating a set of binary maps; subtracting all foreground binary maps acquired before the current frame from the foreground binary map of the current frame to obtain a suspected parabolic trajectory of a moving point after removing interference factors; determining whether the suspected parabolic trajectory conforms to the parabolic law using the RANSAC algorithm; if the suspected parabolic trajectory conforms to the parabolic law and the preset convolutional neural network model detects that the audio is a parabolic sound, then issuing a parabolic alarm. The method provided by this invention uses the Vibe background model as the foreground detection algorithm, which has a good suppression effect on foreground noise caused by camera shake, etc. Furthermore, by detecting the visual parabolic trajectory and recognizing the audio, it can better filter out interference factors that are very similar to the parabolic trajectory, thus ensuring that false detections are reduced when the parabola is detected.
Attached Figure Description
[0020] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the structures shown in these drawings without creative effort.
[0021] Figure 1 is a flowchart of the high-altitude projectile detection method based on multi-source signal recognition of the present invention;
[0022] Figure 2 is a set of foreground images for the high-altitude projectile detection method based on multi-source signal recognition of the present invention;
[0023] Figure 3 is a differential schematic diagram of the high-altitude projectile detection method based on multi-source signal recognition of the present invention;
[0024] Figure 4 is a schematic diagram of a sub-process of the high-altitude projectile detection method based on multi-source signal recognition of the present invention;
[0025] Figure 5 is a schematic diagram of another sub-process of the high-altitude projectile detection method based on multi-source signal recognition of the present invention;
[0026] Figure 6 is a schematic diagram of another sub-process of the high-altitude projectile detection method based on multi-source signal recognition of the present invention;
[0027] Figure 7 is a schematic diagram of another sub-process of the high-altitude projectile detection method based on multi-source signal recognition of the present invention;
[0028] Figure 8 is a schematic diagram of the preset convolutional neural network model of the high-altitude projectile detection method based on multi-source signal recognition of the present invention;
[0029] Figure 9 is a schematic diagram of the overall process of the high-altitude projectile detection method based on multi-source signal recognition of the present invention;
[0030] Figure 10 is a schematic diagram of the program modules of the high-altitude object detection device based on multi-source signal recognition of the present invention.
Detailed Implementation
[0031] To make the objectives, features, and advantages of this invention more apparent and understandable, the technical solutions in the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this invention, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0032] Please refer to Figure 1, which is a flowchart of the high-altitude object detection method based on multi-source signal recognition according to an embodiment of this application. In this embodiment, the high-altitude object detection method based on multi-source signal recognition includes:
[0033] Step 101: Obtain video and audio of the high-altitude parabolic area, create a Vibe background model based on the first frame, and detect whether the audio is a parabolic sound using a preset convolutional neural network model. The video includes the first frame and the current frame.
[0034] In this embodiment, video and audio of the high-altitude object throwing area are acquired. The video mainly captures whether moving objects are being thrown from a height, and the audio mainly consists of sound clips collected on-site from the high-altitude object throwing area.
[0035] Specifically, the acquired video includes multiple video frames. The first frame and the current video frame are extracted, and the Vibe (Visual Background Extractor) background model is initialized using the first frame. The Vibe background model is a pixel-level video background modeling algorithm.
[0036] Specifically, while initializing the Vibe background model using the first frame of the video of an object being thrown from a height, a pre-defined convolutional neural network model detects whether the audio information from the scene of the object being thrown is indeed the sound of an object being thrown. This pre-defined convolutional neural network model is an audio classification model primarily used for identifying the sound of objects being thrown.
[0037] Step 102: Obtain the foreground binary image of the current frame in real time through the Vibe background model, and update the Vibe background model according to the foreground binary image of the current frame.
[0038] In this embodiment, after the Vibe background model is created in step 101, the current frame is extracted from the acquired high-altitude projectile video. The foreground binary image of the moving region of the current frame is then obtained using the Vibe background model created in step 101. In this image, pixels in the moving foreground region have a value of 255, and pixels in the remaining background region have a value of 0. After the foreground binary image of the current frame is obtained, the Vibe background model is updated based on it.
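As a non-limiting illustration of step 102, the following is a minimal sketch of a Vibe-style background model for grayscale frames. It is a simplification and not the implementation of this embodiment: the class name `SimpleVibe` and the parameter values (20 samples per pixel, matching radius 20, at least 2 matches, update probability 1/16) are assumptions chosen for illustration, and the neighbour-propagation step of the full Vibe algorithm is omitted.

```python
import numpy as np


class SimpleVibe:
    """Simplified grayscale Vibe-style background model (illustrative only)."""

    def __init__(self, first_frame, n_samples=20, radius=20,
                 min_matches=2, subsample=16, seed=0):
        # All parameter defaults are assumed values, not taken from the source.
        self.rng = np.random.default_rng(seed)
        self.radius = radius
        self.min_matches = min_matches
        self.subsample = subsample
        frame = first_frame.astype(np.int16)
        h, w = frame.shape
        padded = np.pad(frame, 1, mode="edge")
        # Initialise each pixel's sample set from random positions
        # in its 3x3 neighbourhood in the first frame.
        dy = self.rng.integers(0, 3, size=(n_samples, h, w))
        dx = self.rng.integers(0, 3, size=(n_samples, h, w))
        rows = np.arange(h)[None, :, None] + dy
        cols = np.arange(w)[None, None, :] + dx
        self.samples = padded[rows, cols]  # shape (n_samples, h, w)

    def apply(self, frame):
        """Return the 0/255 foreground binary image and update the model."""
        frame = frame.astype(np.int16)
        close = np.abs(self.samples - frame[None]) < self.radius
        background = close.sum(axis=0) >= self.min_matches
        mask = np.where(background, 0, 255).astype(np.uint8)
        # Conservative update: only pixels classified as background may
        # overwrite one randomly chosen sample, with probability 1/subsample.
        lucky = self.rng.integers(0, self.subsample, size=frame.shape) == 0
        ys, xs = np.nonzero(background & lucky)
        slots = self.rng.integers(0, self.samples.shape[0], size=ys.size)
        self.samples[slots, ys, xs] = frame[ys, xs]
        return mask
```

In use, the model would be constructed once from the first frame and `apply` would then be called on each subsequent frame, returning the foreground binary image described above.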
[0039] Step 103: Extract the foreground binary image of the current frame and the foreground binary images obtained before the current frame and create a set of binary images. Subtract all the foreground binary images obtained before the current frame from the foreground binary image of the current frame to obtain the suspected parabolic trajectory of the motion point after removing interference factors.
[0040] In this embodiment, the foreground binary image of the current frame and the foreground binary images of a preset number of previous video frames, all obtained through the Vibe background model, are extracted and assembled into a set of binary images. For example, please refer to Figure 2, which shows a set of binary images for the high-altitude projectile detection method based on multi-source signal recognition. The set contains 7 foreground binary images: the real-time foreground binary image corresponding to the Kth frame is K, the foreground binary image corresponding to the previous frame is K-1, and so on, up to K-6. Here, the Kth frame is the current frame of this embodiment.
[0041] In this embodiment, please refer to Figure 3, which is a differential schematic diagram of the high-altitude projectile detection method based on multi-source signal recognition. In a normal throwing process, the pixel region of the moving object descends continuously and does not remain within a single area. Therefore, by subtracting the foreground binary images of all previous video frames in the binary image set from the foreground binary image of the current frame, multiple noise-reduced difference images are obtained, and from these the suspected trajectory of the moving point, free from interference, can be obtained. Figure 3 illustrates subtracting the foreground binary image of any one video frame preceding the current frame from the foreground binary image of the current frame. The foreground binary image of the current frame contains both the suspected moving object and interference factors, whereas the foreground binary image of a preceding frame contains the interference factors. Subtracting the latter from the former therefore yields a difference image in which the interference factors are removed and the suspected moving point is retained.
[0042] Step 104: Use the RANSAC algorithm to determine whether the suspected parabolic trajectory conforms to the law of parabolic motion. If the suspected parabolic trajectory conforms to the law of parabolic motion, and the preset convolutional neural network model detects that the audio is a parabolic sound, then a parabolic alarm is issued.
[0043] In this embodiment, for the suspected parabolic trajectory obtained in step 103, the RANSAC (RANdom SAmple Consensus) algorithm is used to determine whether the trajectory conforms to the law of parabolic motion. A parabolic event is determined, and a parabolic alarm triggered, only when the RANSAC algorithm determines that the trajectory conforms to the law of parabolic motion and the preset convolutional neural network model detects that the audio collected from the scene is a parabolic sound. Because the curve that best matches a typical thrown-object trajectory is a parabola, which is a quadratic curve, a quadratic-equation fitting algorithm is used for trajectory fitting in this embodiment.
[0044] This invention provides a method for detecting objects thrown from high altitudes based on multi-source signal recognition. The method includes: acquiring video and audio of the high-altitude object throwing area; creating a Vibe background model based on a first frame and detecting whether the audio is a parabolic sound using a preset convolutional neural network model; wherein the video frame includes the first frame and the current frame; acquiring the foreground binary map of the current frame in real time using the Vibe background model; updating the Vibe background model based on the foreground binary map of the current frame; extracting the foreground binary map of the current frame and preset foreground binary maps acquired before the current frame and creating a set of binary maps; subtracting all foreground binary maps acquired before the current frame from the foreground binary map of the current frame to obtain a suspected parabolic trajectory of the moving point after removing interference factors; determining whether the suspected parabolic trajectory conforms to the parabolic law using the RANSAC algorithm; if the suspected parabolic trajectory conforms to the parabolic law and the preset convolutional neural network model detects that the audio is a parabolic sound, then issuing a parabolic alarm. The method provided by this invention uses the Vibe background model as the foreground detection algorithm, which has a good suppression effect on foreground noise caused by camera shake, etc. Furthermore, by detecting the visual parabolic trajectory and recognizing the audio, it can better filter out interference factors that are very similar to the parabolic trajectory, thus ensuring that false detections are reduced when the parabola is detected.
[0045] Further, please refer to Figure 4, which is a schematic diagram of a sub-process of the high-altitude object detection method based on multi-source signal recognition in this application embodiment. In this embodiment, the process of subtracting all foreground binary images obtained before the current frame from the foreground binary image of the current frame specifically includes:
[0046] Step 201: Subtract all the foreground binary maps obtained before the current frame from the foreground binary map of the current frame in the binary map set to obtain multiple noise-reduced difference maps;
[0047] Step 202: If the difference map after noise reduction has a foreground point, then the foreground point is regarded as a suspected moving point.
[0048] In this embodiment, the foreground binary maps of all video frames preceding the current frame are subtracted from the foreground binary map of the current frame. The foreground binary maps of the preceding video frames are likewise obtained through the Vibe background model, which generates a foreground binary map for every video frame of the acquired high-altitude projectile video in which a moving object appears. The foreground binary map of the current frame and those of the preceding video frames are combined into a set of binary maps. For example, if the current frame is frame K, the preceding video frames in the set are frames K-1, K-2, K-3, K-4, etc. The foreground binary maps of frames K-1, K-2, K-3, K-4, etc., are subtracted from the foreground binary map of frame K, until the foreground binary maps of all video frames preceding frame K in the set have been subtracted. This results in multiple noise-reduced difference maps. If an obtained difference map contains foreground points, those foreground points are regarded as suspected moving points.
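The subtraction in step 201 can be sketched as follows. The text admits more than one reading of how the multiple difference maps arise; this sketch assumes that each map in the set has all maps that precede it in the set subtracted from it, so that one difference map is produced per frame (the oldest map, having no predecessor, is skipped). The function name `difference_maps` and the use of a `deque` of length 7 (frames K-6 to K, as in Figure 2) are illustrative choices, not part of the embodiment.

```python
from collections import deque

import numpy as np


def difference_maps(mask_set):
    """Return one difference map per frame that has at least one predecessor.

    mask_set: sequence of 0/255 uint8 foreground maps, oldest first.
    """
    masks = list(mask_set)
    diffs = []
    for i in range(1, len(masks)):
        earlier = np.zeros_like(masks[0])
        for previous in masks[:i]:
            earlier = np.maximum(earlier, previous)  # union of earlier foreground
        # Saturating subtraction: keep only pixels that are foreground in
        # frame i and were not foreground in any earlier frame of the set.
        diffs.append(np.where(earlier > 0, 0, masks[i]).astype(np.uint8))
    return diffs


# Rolling set holding the current map and the 6 maps before it.
mask_set = deque(maxlen=7)
```

Static interference that appears in every frame is cancelled by the subtraction, while an object that occupies a new position in each frame survives in each difference map.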
[0049] In this embodiment, the foreground binary images obtained from all subtractions are first subjected to erosion operations in image morphology processing, followed by dilation operations. This step can effectively remove motion interference factors in most cases, such as leaves or swaying clothing, while preserving the trajectory points of parabolic events. Specifically, the erosion and dilation operations in image morphology processing can filter out minor noise and retain a sufficiently large foreground region.
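Erosion followed by dilation is the morphological opening operation. A minimal NumPy sketch is given below, assuming a 3×3 square structuring element and treating pixels outside the image as background; the element size and the border handling are assumptions, since the embodiment does not specify them.

```python
import numpy as np


def _neighbourhoods(fg):
    """Stack the 9 shifted copies of a boolean image (3x3 neighbourhood)."""
    h, w = fg.shape
    padded = np.pad(fg, 1, mode="constant", constant_values=False)
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])


def erode(mask):
    # A pixel survives only if its whole 3x3 neighbourhood is foreground.
    return np.where(_neighbourhoods(mask > 0).all(axis=0), 255, 0).astype(np.uint8)


def dilate(mask):
    # A pixel becomes foreground if any pixel in its 3x3 neighbourhood is.
    return np.where(_neighbourhoods(mask > 0).any(axis=0), 255, 0).astype(np.uint8)


def opening(mask):
    """Erosion followed by dilation: removes specks, keeps larger regions."""
    return dilate(erode(mask))
```

With this structuring element, isolated foreground pixels are removed while a region of at least 3×3 pixels is restored to its original extent by the dilation.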
[0050] Further, please refer to Figure 5, which is a schematic diagram of another sub-process of the high-altitude projectile detection method based on multi-source signal recognition in this application embodiment. In this embodiment, obtaining the suspected projectile trajectory of the moving point after removing interference factors specifically includes:
[0051] Step 301: Extract the connected components of each noise-reduced difference graph, obtain the coordinate values of the centroid of each connected component, and save them in a coordinate list;
[0052] Step 302: The set of coordinate points stored in the coordinate list is regarded as the suspected parabolic trajectory, and the set of coordinate points includes multiple coordinate values.
[0053] In this embodiment, after obtaining multiple noise-reduced difference graphs in step 201, there are still suspected moving points on the multiple difference graphs. Connected components on the multiple difference graphs are extracted, and the connected components on the multiple difference graphs are the suspected moving points on the difference graphs. After extracting the connected components on the difference graphs, the coordinate values of the centroid of each connected component are obtained, for example, [[(x11,y11),(x12,y12)..][(x21,y21)...]..[]..], which are the coordinate values of the suspected moving points. The coordinate values of the centroids of the multiple connected components are then stored in a coordinate list.
[0054] In this embodiment, if more than half of the obtained noise-reduced difference graphs contain foreground regions, i.e., there are suspected moving points, then the set of coordinate points of the centroids of the connected domains on the difference graphs (i.e., the coordinates of the suspected moving points) is regarded as a suspected parabolic trajectory, wherein the set of coordinate points includes the coordinates of the centers of multiple connected domains.
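Steps 301 and 302, together with the more-than-half condition, can be sketched as follows. The 8-connectivity, the function names, and the (x, y) coordinate order are assumptions made for illustration.

```python
from collections import deque

import numpy as np


def component_centroids(mask):
    """Centroids (x, y) of the 8-connected foreground regions of a 0/255 map."""
    fg = mask > 0
    seen = np.zeros(fg.shape, dtype=bool)
    h, w = fg.shape
    centroids = []
    for y0, x0 in zip(*np.nonzero(fg)):
        if seen[y0, x0]:
            continue
        # Breadth-first flood fill collects one connected component.
        queue, pixels = deque([(int(y0), int(x0))]), []
        seen[y0, x0] = True
        while queue:
            y, x = queue.popleft()
            pixels.append((y, x))
            for ny in range(max(y - 1, 0), min(y + 2, h)):
                for nx in range(max(x - 1, 0), min(x + 2, w)):
                    if fg[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
        ys, xs = zip(*pixels)
        centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return centroids


def suspected_trajectory(diff_maps):
    """Return the pooled centroid list, or [] if too few maps have foreground."""
    coordinate_list = [component_centroids(d) for d in diff_maps]
    non_empty = sum(1 for pts in coordinate_list if pts)
    if non_empty * 2 <= len(diff_maps):  # requires strictly more than half
        return []
    return [pt for pts in coordinate_list for pt in pts]
```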
[0055] Further, please refer to Figure 6, which is a schematic diagram of another sub-process of the high-altitude object detection method based on multi-source signal recognition in this application embodiment. In this embodiment, determining by the RANSAC algorithm whether the suspected parabolic trajectory conforms to the law of parabolic motion specifically includes:
[0056] Step 401: Integrate the coordinate values into the same coordinate point set, and use the RANSAC algorithm to fit the coordinate point set to a quadratic curve;
[0057] Step 402: If the fitted quadratic curve conforms to the law of parabola, then it is determined that a parabolic trajectory exists.
[0058] In this embodiment, the coordinate values of suspected moving points on multiple noise-reduced difference maps are integrated into the same coordinate point set. The RANSAC algorithm is used to fit a quadratic curve to the coordinate point set. If a quadratic curve that conforms to the law of parabola can be fitted, it is determined that a parabolic trajectory exists. At the same time, noise outside the parabola can be removed during the fitting process.
[0059] The fitting process is roughly as follows:
[0060] 1. Randomly select several coordinate points and calculate the quadratic fitting curve model M;
[0061] 2. Calculate the fitting error of each coordinate point in the point set with respect to model M. If the error is less than the threshold, add the point to the inlier set F.
[0062] 3. If the number of coordinate points in the inlier set F from step 2 is greater than that of the current best inlier set F_best, update F_best; at the same time, increment the iteration count i.
[0063] 4. If the iteration count i is greater than the threshold, stop iterating; the curve corresponding to F_best is the fitted parabolic curve. Otherwise, continue iterating.
[0064] 5. If the number of elements in F_best is greater than the threshold K1 (in principle, 0 < K1 < K), define the curve as the final parabolic trajectory, define the corresponding inliers as trajectory points, and discard the outliers as noise; otherwise, discard the parabolic curve and all coordinate points.
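The five steps above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions rather than the fitting routine of the embodiment: the curve is modelled as y = a·x² + b·x + c in image coordinates (which requires the object to have some horizontal motion), three points are drawn per iteration, and the default values of `n_iter`, `err_thresh`, and `min_inliers` (standing in for K1) are arbitrary.

```python
import numpy as np


def ransac_parabola(points, n_iter=200, err_thresh=2.0, min_inliers=5, seed=0):
    """Fit y = a*x^2 + b*x + c with RANSAC.

    Returns (coeffs, inlier_points), or None if no acceptable curve is found.
    """
    pts = np.asarray(points, dtype=float)
    if len(pts) < 3:
        return None
    rng = np.random.default_rng(seed)
    x, y = pts[:, 0], pts[:, 1]
    best = np.zeros(len(pts), dtype=bool)  # F_best as a boolean mask
    for _ in range(n_iter):
        # Step 1: random minimal sample and model M.
        idx = rng.choice(len(pts), size=3, replace=False)
        if len(np.unique(x[idx])) < 3:
            continue  # degenerate sample: three distinct x values are needed
        model = np.polyfit(x[idx], y[idx], 2)
        # Step 2: points whose vertical residual is below the threshold.
        inliers = np.abs(np.polyval(model, x) - y) < err_thresh
        # Step 3: keep the largest inlier set seen so far.
        if inliers.sum() > best.sum():
            best = inliers
    # Step 5: accept only if enough points support the curve.
    if best.sum() <= min_inliers:
        return None
    coeffs = np.polyfit(x[best], y[best], 2)  # refit on all inliers
    return coeffs, pts[best]
```

The final refit on all inliers is an addition commonly made in practice; the points excluded from the returned inlier set correspond to the outliers that step 5 discards as noise.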
[0065] Furthermore, integrating the coordinate values into the same coordinate point set and using the RANSAC algorithm to fit a quadratic curve to the coordinate point set further includes:
[0066] If the fitted quadratic curve does not conform to the parabolic law, the foreground binary image of the motion region of the current frame is obtained again through the Vibe background model, and the Vibe background model is updated again according to the foreground binary image of the current frame, until the fitted quadratic curve conforms to the parabolic law.
[0067] In this embodiment, if the quadratic curve fitted in step 401 does not conform to the parabolic law, the foreground binary image of the motion region of the current frame is obtained again through the Vibe background model, and the Vibe background model is updated again according to the foreground binary image of the current frame, until the quadratic curve fitted by the RANSAC algorithm conforms to the parabolic law.
[0068] Furthermore, please refer to Figure 7, which is a schematic diagram of another sub-process of the high-altitude parabolic detection method based on multi-source signal recognition in the embodiment of the present application. In this embodiment, detecting whether the audio is a parabolic sound through a preset convolutional neural network model specifically includes:
[0069] Step 501: Analyze the audio in real time based on the preset convolutional neural network model.
[0070] Step 502: If it is determined that the audio is the sound of a thrown object landing, send a parabolic event signal to the system.
[0071] In this embodiment, please refer to Figure 8, which is a schematic diagram of the preset convolutional neural network (CNN) model of the high-altitude object detection method based on multi-source signal recognition. This model is primarily used for identifying the sound of thrown objects. A one-dimensional audio sequence covering the interval from a duration T before the current time point up to the current time point is extracted and serves as the input to the CNN. The CNN model mainly consists of sub-modules including convolutional layers (conv), batch normalization (BN) layers, activation functions (ReLU in this example), pooling layers (MaxPooling in this example), and fully connected (FC) layers. Since the audio input is one-dimensional, the convolutional layers use n×1 convolutions to extract audio features. Taking a 3×1 kernel as an example, with m kernels and a batch size of b, the parameters of the conv layer are 3×1×m×b. A conv layer, a BN layer, and a ReLU layer form a small module. The output of several such modules is compressed using MaxPooling, and the compressed result is fed into the FC layers. After several FC layers, two outputs are obtained, representing the probabilities that the sound is and is not the sound of a thrown object.
[0072] In this embodiment, multiple sound segments of objects landing are collected, covering a variety of object types. The sound acquisition device is positioned at a suitable distance from the landing point to ensure the sound is audible. A sufficient number of sound samples are collected and used as a dataset to train the preset convolutional neural network (CNN) model, which is then able to distinguish the sound of a falling object from other sounds. The trained CNN model analyzes the collected sound segments in real time. If a sound is identified as that of a falling object, a falling object event signal is sent to the system. Using a CNN that accepts one-dimensional input signals allows the audio signal to be processed and its features extracted directly. After training, the model can perform real-time sound identification, quickly determining whether a sound is that of a falling object.
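The layer arrangement described for Figure 8 can be illustrated with a forward pass written in NumPy. This is a sketch only: the number of conv-BN-ReLU modules, the kernel count m = 8, the pooling size, the hidden FC width, the output order, and the random untrained weights are all assumptions, and no training procedure is shown.

```python
import numpy as np


def conv1d(x, weight, bias):
    """x: (c_in, length); weight: (c_out, c_in, k), odd k; zero 'same' padding."""
    k = weight.shape[2]
    padded = np.pad(x, ((0, 0), (k // 2, k // 2)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, k, axis=1)
    return np.einsum("clk,ock->ol", windows, weight) + bias[:, None]


def batch_norm(x, mean, var, gamma, beta, eps=1e-5):
    """Inference-mode batch normalisation, one statistic per channel."""
    scale = gamma / np.sqrt(var + eps)
    return scale[:, None] * (x - mean[:, None]) + beta[:, None]


def max_pool1d(x, size):
    length = x.shape[1] // size * size
    return x[:, :length].reshape(x.shape[0], -1, size).max(axis=2)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def forward(audio, params):
    """audio: 1-D array of samples. Returns two class probabilities."""
    x = audio[None, :]  # a single input channel
    for blk in params["blocks"]:  # conv -> BN -> ReLU modules
        x = conv1d(x, blk["w"], blk["b"])
        x = batch_norm(x, blk["mean"], blk["var"], blk["gamma"], blk["beta"])
        x = np.maximum(x, 0.0)
    x = max_pool1d(x, params["pool"]).reshape(-1)
    for w, b in params["fc"][:-1]:
        x = np.maximum(w @ x + b, 0.0)
    w, b = params["fc"][-1]
    return softmax(w @ x + b)  # assumed order: [other sound, thrown-object sound]


def random_params(length, m=8, n_blocks=2, pool=4, hidden=16, seed=0):
    """Untrained placeholder weights with 3x1 kernels (illustrative only)."""
    rng = np.random.default_rng(seed)
    blocks, c_in = [], 1
    for _ in range(n_blocks):
        blocks.append({"w": rng.normal(0, 0.1, (m, c_in, 3)), "b": np.zeros(m),
                       "mean": np.zeros(m), "var": np.ones(m),
                       "gamma": np.ones(m), "beta": np.zeros(m)})
        c_in = m
    flat = m * (length // pool)
    fc = [(rng.normal(0, 0.1, (hidden, flat)), np.zeros(hidden)),
          (rng.normal(0, 0.1, (2, hidden)), np.zeros(2))]
    return {"blocks": blocks, "pool": pool, "fc": fc}
```

Calling `forward(samples, random_params(len(samples)))` on a window of audio samples returns a length-2 probability vector; with trained weights, the second entry would be compared against a decision threshold.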
[0073] Furthermore, if the suspected parabolic trajectory conforms to the law of parabolic motion, and the preset convolutional neural network model detects that the audio is a parabolic sound, a parabolic alarm is issued, specifically including:
[0074] If the determination that the suspected parabolic trajectory conforms to the law of parabolic motion and the detection that the audio is a parabolic sound are both obtained within a preset time period, a parabolic alarm is issued.
[0075] In this embodiment, when a suspected parabolic trajectory is determined to conform to the law of parabolic motion, and the preset convolutional neural network model detects that the audio collected from the scene of the high-altitude object throwing is the sound of a parabolic object, it is then determined whether the occurrence time of the suspected parabolic trajectory and the sound of the parabolic object collected from the scene were obtained within a similar time frame. If they are obtained within a similar time frame, a parabolic event alarm is triggered. Within this similar time frame, the existence of a parabolic trajectory is determined by visual detection, and simultaneously, an audio signal indicating a parabolic event is received. This is considered a confirmed parabolic event, and an alarm is sent to the outside of the system. This can filter out false alarms of events very similar to the parabolic trajectory. Simultaneously, based on visual and audio detection, interference factors such as downward-flying birds, insects, and falling leaves can be effectively filtered out. The combination of visual and audio signals from multiple sources can better filter out interference factors very similar to the parabolic trajectory, thereby achieving a lower false alarm rate. The preset similar time frame can be set to 1-10 seconds.
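The time-window check that combines the two signals can be sketched as below. The function name, the use of timestamps in seconds, and the default window of 5 seconds (a value inside the 1-10 second range mentioned above) are assumptions for illustration.

```python
def confirmed_events(trajectory_times, sound_times, window_s=5.0):
    """Pair visual and audio detections that occur within window_s seconds.

    trajectory_times: timestamps (s) at which a parabolic trajectory was confirmed.
    sound_times: timestamps (s) at which the CNN reported a thrown-object sound.
    Returns a list of (t_visual, t_sound) pairs; an alarm is raised for each.
    """
    confirmed = []
    for t_visual in trajectory_times:
        for t_sound in sound_times:
            if abs(t_visual - t_sound) <= window_s:
                confirmed.append((t_visual, t_sound))
    return confirmed
```

A trajectory without a matching sound, such as a bird flying downward, or a sound without a matching trajectory yields no pair and therefore no alarm.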
[0076] Furthermore, the object throwing alarm includes notifying and reminding management personnel of an object throwing incident and preserving evidence of the throwing point. Optionally, the alarm can output the motion point on a monitoring screen so that staff can confirm the location where the object was thrown; or it can optionally notify management personnel of an object throwing incident via SMS or other means.
[0077] Further, please refer to Figure 9, which is a schematic diagram of the overall process of the high-altitude object detection method based on multi-source signal recognition in this application embodiment. The overall steps of this application embodiment are as follows:
[0078] S1: Extract the first frame of the high-altitude object throwing surveillance video and initialize the Vibe background model.
[0079] S2: For subsequent frames of the high-altitude object throwing surveillance video, the background model created in S1 is used with the Vibe method to obtain a binary image of the moving foreground region of the current frame (pixels in the moving foreground region have the value 255, and pixels in the remaining background region have the value 0). The background model is then updated based on the binary foreground image of the current frame.
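To illustrate the per-pixel test and update behind steps S1 and S2, the following is a minimal sketch of a simplified Vibe-style background model in pure Python. The class name `SimpleVibe`, the parameter values (20 samples, radius 20, 2 required matches, subsampling factor 16) and the jitter-based initialization are assumptions made for illustration; this application does not specify them, and a practical implementation would work on full camera frames with an array library.

```python
import random


class SimpleVibe:
    """Simplified Vibe-style background model for small grayscale frames (lists of rows)."""

    def __init__(self, first_frame, n_samples=20, radius=20, min_matches=2,
                 subsample=16, seed=0):
        self.radius = radius
        self.min_matches = min_matches
        self.subsample = subsample
        self.rng = random.Random(seed)
        self.h = len(first_frame)
        self.w = len(first_frame[0])
        # S1: every pixel keeps n_samples background samples, seeded from the
        # first frame with a small random jitter (an illustrative simplification).
        self.samples = [
            [[first_frame[y][x] + self.rng.randint(-5, 5) for _ in range(n_samples)]
             for x in range(self.w)]
            for y in range(self.h)
        ]

    def _replace_random_sample(self, y, x, value):
        slot = self.rng.randrange(len(self.samples[y][x]))
        self.samples[y][x][slot] = value

    def apply(self, frame):
        """S2: return a binary mask (255 = moving foreground, 0 = background)."""
        mask = [[0] * self.w for _ in range(self.h)]
        for y in range(self.h):
            for x in range(self.w):
                value = frame[y][x]
                matches = sum(abs(value - s) < self.radius for s in self.samples[y][x])
                if matches < self.min_matches:
                    mask[y][x] = 255
                    continue  # foreground pixels are never written into the model
                # Background pixel: occasionally refresh its own samples ...
                if self.rng.randrange(self.subsample) == 0:
                    self._replace_random_sample(y, x, value)
                # ... and occasionally propagate the value into a nearby pixel's samples.
                if self.rng.randrange(self.subsample) == 0:
                    ny = min(max(y + self.rng.choice((-1, 0, 1)), 0), self.h - 1)
                    nx = min(max(x + self.rng.choice((-1, 0, 1)), 0), self.w - 1)
                    self._replace_random_sample(ny, nx, value)
        return mask
```

A typical use would be `model = SimpleVibe(first_frame)` followed by `mask = model.apply(frame)` for each later frame; the random, sparse updates are what let the model absorb slow scene changes without absorbing a fast-moving object.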
[0080] S3: Create a foreground image set. Collect the current foreground image obtained in S2 and the foreground images obtained in the preceding preset M frames, and merge them into an image set. In a normal throwing event, the pixel area of the thrown object moves continuously downwards and does not remain in one area for long. Therefore, for each foreground image in the image set, subtract all foreground images in the set that precede it. Then perform an erosion operation and a dilation operation on each of the subtracted foreground images. In most cases, this step effectively removes motion interference such as leaves and swaying clothes while preserving the trajectory points of the throwing event.
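The subtraction and morphological clean-up in S3 can be sketched in pure Python as below. The function names and the 3x3 neighborhood are assumptions for illustration; the application does not state the structuring element, and an image-processing library would normally be used instead of nested lists.

```python
def subtract_earlier(masks, index):
    """Keep pixels that are foreground in masks[index] but in none of the earlier masks."""
    current = masks[index]
    h, w = len(current), len(current[0])
    return [
        [255 if current[y][x] and not any(m[y][x] for m in masks[:index]) else 0
         for x in range(w)]
        for y in range(h)
    ]


def _morph(mask, combine):
    """Apply `combine` (all = erosion, any = dilation) over a 3x3 window clipped to the image."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                mask[ny][nx]
                for ny in range(max(y - 1, 0), min(y + 2, h))
                for nx in range(max(x - 1, 0), min(x + 2, w))
            ]
            out[y][x] = 255 if combine(v > 0 for v in window) else 0
    return out


def erode(mask):
    return _morph(mask, all)


def dilate(mask):
    return _morph(mask, any)


def denoised_differences(masks):
    """For every mask in the set: subtract all earlier masks, then erode and dilate."""
    return [dilate(erode(subtract_earlier(masks, i))) for i in range(len(masks))]
```

A region that stays in one place (a swaying branch, for example) appears in earlier masks and is removed by the subtraction, while isolated single-pixel noise is removed by the erosion; a compact object that has moved to a new position survives both steps.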
[0081] S4: The difference images have been denoised in step S3. Any remaining foreground points are regarded as suspected moving points. Extract the connected regions of each denoised difference image and save the centroid coordinates of each connected region in a coordinate list: [[(x11, y11), (x12, y12), ...], [(x21, y21), ...], ..., [], ...]. If more than half of the images in the denoised difference image set contain foreground regions, these coordinate point sets are regarded as a suspected parabolic trajectory.
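A minimal sketch of S4 follows, assuming 8-connectivity and a breadth-first search for the connected regions; the function names are hypothetical and the connectivity is not specified in this application. Coordinates are stored as (x, y) to match the coordinate list above.

```python
from collections import deque


def centroids(mask):
    """Return the (x, y) centroid of every 8-connected foreground region in a binary mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    result = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first search collects all pixels of one connected region.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for ny in range(max(y - 1, 0), min(y + 2, h)):
                    for nx in range(max(x - 1, 0), min(x + 2, w)):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            cx = sum(p[0] for p in pixels) / len(pixels)
            cy = sum(p[1] for p in pixels) / len(pixels)
            result.append((cx, cy))
    return result


def suspected_trajectory(denoised_masks):
    """Return the per-image centroid lists if more than half of the images
    contain foreground; otherwise return None."""
    coords = [centroids(m) for m in denoised_masks]
    with_foreground = sum(1 for c in coords if c)
    if with_foreground * 2 > len(denoised_masks):
        return coords
    return None
```

The returned value has the same nested shape as the coordinate list in S4: one inner list per difference image, which is empty when that image has no foreground.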
[0082] S5: For a suspected parabolic trajectory, merge the coordinate points into a single coordinate point set and use the RANSAC algorithm to fit a quadratic curve to it. If a quadratic curve that conforms to the parabolic law can be extracted, it is determined that a parabolic trajectory exists. The fitting process also removes noise points that lie off the parabola. The fitting process is roughly as follows:
[0083] 1. Randomly select several coordinate points and calculate the quadratic fitting curve equation model M;
[0084] 2. For every coordinate point in the point set, calculate its projection error with respect to model M; if the error is less than the error threshold, add the point to the inlier set F;
[0085] 3. If the number of coordinate points in the inlier set F from step 2 is greater than that in the current optimal inlier set F_best, update F_best and increment the iteration count i;
[0086] 4. If the iteration count i is greater than the iteration threshold, stop iterating; the fitted curve corresponding to F_best is the fitted parabolic curve. Otherwise, continue iterating;
[0087] 5. If the number of elements in the inlier set F_best is greater than the threshold K1 (in principle, 0 < K1 < K), it is defined as the final parabolic trajectory, the corresponding inliers are defined as trajectory points, and the outliers are discarded (defined as noise); otherwise, discard the parabolic curve and all coordinate points.
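The five fitting steps above can be sketched as a small RANSAC loop. This is an illustrative sketch under several assumptions that the application does not state: the curve is modeled as y = a*x^2 + b*x + c, three points are sampled per iteration, the "projection error" is taken as the vertical residual, the iteration count advances on every pass, and the parameter `min_inliers` plays the role of K1. All function and parameter names are hypothetical.

```python
import random


def fit_quadratic(p1, p2, p3):
    """Return (a, b, c) of y = a*x^2 + b*x + c through three points, or None if degenerate."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    if denom == 0:  # two points share an x value: no unique quadratic in x
        return None
    a = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    b = (x3**2 * (y1 - y2) + x2**2 * (y3 - y1) + x1**2 * (y2 - y3)) / denom
    c = (x2 * x3 * (x2 - x3) * y1 + x3 * x1 * (x3 - x1) * y2
         + x1 * x2 * (x1 - x2) * y3) / denom
    return a, b, c


def ransac_parabola(points, error_threshold=1.0, max_iterations=200,
                    min_inliers=5, seed=0):
    """Return (model, inliers) for the best quadratic, or (None, []) if rejected."""
    if len(points) < 3:
        return None, []
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(max_iterations):                      # step 4: iteration limit
        model = fit_quadratic(*rng.sample(points, 3))    # step 1: random minimal sample
        if model is None:
            continue
        a, b, c = model
        inliers = [(x, y) for x, y in points             # step 2: collect inliers
                   if abs(a * x * x + b * x + c - y) < error_threshold]
        if len(inliers) > len(best_inliers):             # step 3: keep the best set
            best_model, best_inliers = model, inliers
    if len(best_inliers) > min_inliers:                  # step 5: accept or discard
        return best_model, best_inliers
    return None, []


# Ten points on y = 0.5*x^2 + 2 plus two noise points.
points = [(x, 0.5 * x * x + 2.0) for x in range(10)] + [(3, 40.0), (7, 1.0)]
model, inliers = ransac_parabola(points)
```

In this example the two noise points end up outside the inlier set, which corresponds to the noise removal mentioned in S5. An object dropped straight down has nearly constant x, so a practical system might fit against frame time rather than x; that variation is not covered by this sketch.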
[0088] S6: Before collecting video and audio of objects falling from high altitudes, first collect multiple sound recordings of objects landing. The types of objects should be as diverse as possible, and the sound acquisition device should be positioned at a suitable distance from the landing point to hear the sound. Collect a sufficient number of sound samples for analysis, using this one-dimensional audio data as a dataset. Train a convolutional neural network (CNN) model using this dataset, enabling the model to distinguish the sound of objects landing from other sounds. Using the pre-defined CNN model, analyze the collected sound segments in real time. If the sound is identified as an object landing, a projectile event signal is sent to the system. While the CNN model is analyzing the collected audio in real time, the system simultaneously initializes the Vibe background model using the first frame of the high-altitude object-falling monitoring video.
[0089] S7: If, within a short time frame, a parabolic trajectory is detected visually and an audio signal indicating a throwing event is received, this is treated as a confirmed object-throwing event, and an alarm is sent out of the system. Combining visual and audio detection in this way effectively filters out interference from downward-flying birds, insects, falling leaves, and other disturbances.
[0090] In summary, the system simultaneously acquires video and audio of the high-altitude object throwing area and extracts each frame of the video and audio. The first frame of the video is used to initialize the Vibe background model, and the Vibe background model is then used to obtain, in real time, the foreground binary map of moving objects in each video frame. Meanwhile, the system uses a pre-trained convolutional neural network model to detect whether each frame of the audio contains a parabolic sound. If the trajectory in the video conforms to the law of parabolic motion, the pre-trained convolutional neural network model detects the audio as a parabolic sound, and the two results are obtained within a similar time frame, then an object-throwing event is identified.
[0091] Furthermore, this application also provides a high-altitude object detection device 600 based on multi-source signal recognition. Figure 10 is a schematic diagram of the program modules of the high-altitude object throwing detection device based on multi-source signal recognition in this application embodiment. In this embodiment, the high-altitude object throwing detection device 600 based on multi-source signal recognition includes:
[0092] Acquisition module 601: used to acquire video and audio of the high-altitude parabola area, create a Vibe background model based on the first frame and detect whether the audio is a parabola sound through a preset convolutional neural network model, wherein the video includes the first frame and the current frame;
[0093] Update module 602: used to obtain the foreground binary image of the current frame in real time through the Vibe background model, and update the Vibe background model according to the foreground binary image of the current frame;
[0094] Suspect module 603: used to extract the foreground binary image of the current frame and the foreground binary images obtained before the current frame and create a set of binary images, and to obtain the suspected parabolic trajectory of the motion point with interference removed by subtracting all the foreground binary images obtained before the current frame from the foreground binary image of the current frame;
[0095] Early warning module 604: used to determine whether the suspected parabolic trajectory conforms to the parabolic law using the RANSAC algorithm. If the suspected parabolic trajectory conforms to the parabolic law and the preset convolutional neural network model detects that the audio is a parabolic sound, then a parabolic alarm is issued.
[0096] This application provides a high-altitude object detection device 600 based on multi-source signal recognition, which can: acquire video and audio of a high-altitude object throwing area; create a Vibe background model based on a first frame and detect whether the audio is a parabolic sound using a preset convolutional neural network model, wherein the video frame includes the first frame and the current frame; acquire the foreground binary map of the current frame in real time using the Vibe background model, and update the Vibe background model based on the foreground binary map of the current frame; extract the foreground binary map of the current frame and preset foreground binary maps acquired before the current frame and create a set of binary maps; subtract all foreground binary maps acquired before the current frame from the foreground binary map of the current frame to obtain a suspected parabolic trajectory of the moving point after removing interference factors; determine whether the suspected parabolic trajectory conforms to the parabolic law using the RANSAC algorithm; if the suspected parabolic trajectory conforms to the parabolic law and the preset convolutional neural network model detects that the audio is a parabolic sound, then a parabolic alarm is issued. The method provided by this invention uses the Vibe background model as the foreground detection algorithm, which has a good suppression effect on foreground noise caused by camera shake, etc. Furthermore, by detecting the visual parabolic trajectory and recognizing the audio, it can better filter out interference factors that are very similar to the parabolic trajectory, thus ensuring that false detections are reduced when the parabola is detected.
[0097] Furthermore, this application also provides a high-altitude object throwing detection device based on multi-source signal recognition, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements each step of the above-described high-altitude object throwing detection method based on multi-source signal recognition.
[0098] Furthermore, this application also provides a storage medium storing a computer program thereon, which, when executed by a processor, implements the various steps of the high-altitude object throwing detection method based on multi-source signal recognition as described above.
[0099] In the various embodiments of this invention, the functional modules can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it can be stored in a computer-readable storage medium.
[0100] Based on this understanding, the technical solutions of this invention, in essence, or the parts that contribute to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0101] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that the present invention is not limited to the described order of actions, as some steps can be performed in other orders or simultaneously according to the present invention. Secondly, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential to the present invention. In the above embodiments, the descriptions of each embodiment have their own emphasis; for parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.
[0102] For those skilled in the art, based on the ideas of the embodiments of this application, there will be changes in the specific implementation methods and application scope. Therefore, the content of this specification should not be construed as a limitation of the present invention.
Claims
1. A method for detecting objects thrown from high altitudes based on multi-source signal recognition, characterized in that the method includes: acquiring video and audio of the high-altitude projectile area, creating a Vibe background model based on the first frame, and detecting whether the audio is a projectile sound using a preset convolutional neural network model, wherein the video includes the first frame and the current frame; obtaining the foreground binary image of the current frame in real time through the Vibe background model, and updating the Vibe background model according to the foreground binary image of the current frame; extracting the foreground binary image of the current frame and the foreground binary images obtained before the current frame, and creating a set of binary images; subtracting all the foreground binary images obtained before the current frame from the foreground binary image of the current frame to obtain the suspected parabolic trajectory of the motion point after removing interference factors, wherein the subtraction yields multiple noise-reduced difference images, and if a noise-reduced difference image has a foreground point, the foreground point is regarded as a suspected moving point; determining, using the RANSAC algorithm, whether the suspected parabolic trajectory conforms to the law of parabolic motion; and if the suspected parabolic trajectory conforms to the law of parabolic motion and the preset convolutional neural network model detects that the audio is a parabolic sound, triggering a parabolic alarm.
2. The detection method according to claim 1, characterized in that obtaining the suspected parabolic trajectory of the motion point after removing interference factors specifically includes: extracting the connected components of each noise-reduced difference image, obtaining the coordinate values of the centroid of each connected component, and saving them in a coordinate list; and regarding the set of coordinate points stored in the coordinate list as the suspected parabolic trajectory, wherein the set of coordinate points includes multiple coordinate values.
3. The detection method according to claim 2, characterized in that determining whether the suspected parabolic trajectory conforms to the parabolic law using the RANSAC algorithm specifically includes: integrating the coordinate values into the same coordinate point set, and using the RANSAC algorithm to fit a quadratic curve to the coordinate point set; and if the fitted quadratic curve conforms to the parabolic law, determining that a parabolic trajectory exists.
4. The detection method according to claim 3, characterized in that the step of integrating the coordinate values into the same coordinate point set and using the RANSAC algorithm to fit a quadratic curve to the coordinate point set further includes: if the fitted quadratic curve does not conform to the parabolic law, obtaining the foreground binary image of the motion region of the current frame again through the Vibe background model, and updating the Vibe background model again based on the foreground binary image of the current frame, until the fitted quadratic curve conforms to the parabolic law.
5. The detection method according to claim 1, characterized in that detecting whether the audio is a parabolic sound using a preset convolutional neural network model specifically includes: analyzing the audio in real time based on the preset convolutional neural network model; and if the audio is determined to be the sound of a projectile landing, sending a projectile event signal to the system.
6. The detection method according to claim 1, characterized in that issuing a parabolic alarm if the suspected parabolic trajectory conforms to the law of parabolic motion and the preset convolutional neural network model detects that the audio is a parabolic sound specifically includes: issuing a parabolic alarm if the determination that the suspected parabolic trajectory conforms to the law of parabolic motion and the detection that the audio is a parabolic sound are both obtained within a preset time period.
7. A high-altitude object detection device based on multi-source signal recognition, characterized in that the device includes: an acquisition module, used to acquire video and audio of the high-altitude projectile area, create a Vibe background model based on the first frame, and detect whether the audio is a projectile sound through a preset convolutional neural network model, wherein the video includes the first frame and the current frame; an update module, used to obtain the foreground binary image of the current frame in real time through the Vibe background model, and update the Vibe background model according to the foreground binary image of the current frame; a suspect module, used to extract the foreground binary image of the current frame and the foreground binary images obtained before the current frame and create a set of binary images, and to obtain a suspected parabolic trajectory of the motion point with interference removed by subtracting all the foreground binary images obtained before the current frame from the foreground binary image of the current frame, wherein the subtraction yields multiple noise-reduced difference images, and if a noise-reduced difference image has a foreground point, the foreground point is regarded as a suspected moving point; and an early warning module, used to determine whether the suspected parabolic trajectory conforms to the parabolic law using the RANSAC algorithm, and to issue a parabolic alarm if the suspected parabolic trajectory conforms to the parabolic law and the preset convolutional neural network model detects that the audio is a parabolic sound.
8. A high-altitude object detection device based on multi-source signal recognition, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements each step of the high-altitude object detection method based on multi-source signal recognition as described in any one of claims 1-6.
9. A storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements each step of the high-altitude object detection method based on multi-source signal recognition as described in any one of claims 1-6.