Target detection method based on fusion sensor
By combining an event vision sensor with a CMOS image sensor, the challenge of target detection in low-light and fast-moving scenarios in ADAS is solved, achieving higher robustness and accuracy, and making it suitable for on-chip implementation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- OMNIVISION TECH (SHANGHAI) CO LTD
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-19
Figure CN122244762A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of detection technology, specifically relating to a target detection method based on fusion sensors.
Background Technology
[0002] Advanced Driver Assistance Systems (ADAS) face significant challenges in target detection under low-light and high-speed motion conditions, which severely impacts system performance and safety. In these scenarios, the targets monitored by ADAS systems exhibit extremely high risk relevance.
[0003] Most mainstream object detection algorithms rely on CMOS image sensor data and struggle to detect targets under severe motion blur. Event-based vision sensors offer microsecond-level temporal resolution and high dynamic range, providing complementary motion cues, but their performance is limited by the lack of large-scale labeled datasets, making it difficult to match CMOS image sensor-based methods. Furthermore, any single sensor has inherent limitations in specific scenarios.
Summary of the Invention
[0004] The purpose of this invention is to provide a target detection method based on a fusion sensor, combining the complementary advantages of event vision sensors and CMOS image sensors to improve the robustness and redundancy of target detection in low-light and fast-moving scenes, thereby increasing detection accuracy. The deblurring technique based on event vision sensors is more lightweight and better suited for on-chip implementation. This invention leverages a mature detection model from CMOS image sensors and requires neither large-scale data acquisition nor additional event dataset annotation.
[0005] This invention provides a target detection method based on fusion sensors, comprising:
[0006] S1. A CMOS image sensor acquires an image of the target scene, and an event vision sensor outputs an event stream within the image exposure time;
[0007] S2. The event stream is used to perform motion detection to determine whether there is large motion in the image; if it is determined that there is no large motion in the image, the image is directly sent to the object detection network for target detection; if it is determined that there is large motion in the image, the image is deblurred before being sent to the object detection network for target detection;
[0008] S3. The object detection network outputs the detection results.
[0009] Furthermore, motion detection using the event stream specifically includes:
[0010] S21. Perform polarity integration on the event stream within the exposure time to obtain a polarity-based event accumulation map;
[0011] S22. Select the pixels that have undergone large motion: pixels for which the absolute value of the event accumulation map is greater than a threshold N are judged as moving pixels and set to 1, and pixels for which the absolute value is less than or equal to the threshold N are judged as non-moving pixels and set to 0, yielding the event mask map.
[0012] S23. Calculate the proportion of moving pixels in the event mask image in the entire image, and determine whether there is large motion in the image based on the size of the proportion.
[0013] Furthermore, in step S23, if the ratio is greater than the upper threshold, it is considered that the image has undergone large motion; if the ratio is less than or equal to the lower threshold, it is considered that the image has not undergone large motion.
[0014] Furthermore, in step S23, if the ratio is greater than the lower threshold and less than or equal to the upper threshold, then the local sliding window method is used to determine whether large motion has occurred in the image.
[0015] Furthermore, the event mask map includes:
[0016] Case a: The object and the background move rapidly relative to the camera at the same time; a large number of event pixels are triggered. If the proportion is greater than the upper threshold, it is determined that there is large motion in the image, and deblurring is required.
[0017] Furthermore, the event mask map also includes:
[0018] Case b: The background is static, and the object is also static or moving slowly; the number of triggered event pixels is very small, and the total number of motion pixels in the entire image is very low, less than or equal to the lower threshold. In this case, it is determined that no large motion has occurred in the image, and no deblurring processing is required.
[0019] Furthermore, the event mask map also includes:
[0020] Case c: The background is static or moving slowly, while the object moves quickly relative to the camera; the triggered event pixels are more densely distributed on the foreground object, while there are almost no event pixels in the background area; the total number of motion pixels accounts for a low percentage of the entire image, less than or equal to the upper threshold and greater than the lower threshold. The density of motion pixels is determined by using a sliding window method, thereby inferring whether there is large motion in the image.
[0021] Furthermore, the event mask map also includes:
[0022] Case d: The background and the object move slowly relative to the camera at the same time; the event pixels that are triggered will be sparsely distributed on the image, and the total number of moving pixels will account for a low percentage of the entire image, which is less than or equal to the upper threshold and greater than the lower threshold. The sliding window method is used to determine the density of moving pixels and then infer whether there is large motion in the image.
[0023] Furthermore, using the sliding window to infer whether there is large motion in the image includes method one:
[0024] A sliding window is set in the area where the image is located. The length of the sliding window is less than the length of the image, and the width of the sliding window is less than the width of the image.
[0025] The sliding window is scanned in the image from left to right and from top to bottom with a preset step size;
[0026] Each scan step counts the proportion of event pixels falling within the current sliding window. If the proportion of event pixels is greater than or equal to a preset value, it is determined that there is large motion in the image, the scan is exited, and the method proceeds to deblur the image; otherwise, it is determined that there is no large motion in the current window, and the next scan step is performed.
[0027] Furthermore, using the sliding window to infer whether there is large motion in the image includes method two:
[0028] A sliding window is set in the area where the image is located. The length of the sliding window is less than the length of the image, and the width of the sliding window is less than the width of the image.
[0029] The sliding window is scanned in the image from left to right and from top to bottom with a preset step size;
[0030] Each scan step counts the proportion of event pixels falling within the sliding window. If the proportion of event pixels is greater than or equal to a preset value, the sliding window is considered valid, and its position is recorded. Otherwise, it is determined that there is no large movement, and the next scan operation continues.
[0031] After the entire image is scanned, all effective sliding windows are clustered and merged to obtain the regions of interest where large movements occur;
[0032] If no valid sliding window exists, no further deblurring is required, and the data is directly fed into the object detection network for target detection. If a valid sliding window exists, it is determined that a large motion has occurred, and deblurring is performed only on the obtained region of interest.
[0033] Compared with the prior art, the present invention has the following beneficial effects:
[0034] This invention provides a target detection method based on a fusion sensor, comprising: S1, a CMOS image sensor acquiring an image of a target scene, and an event vision sensor outputting an event stream within the image exposure time; S2, using the event stream to perform motion detection to determine whether large motion occurs in the image; if it is determined that there is no large motion in the image, the image is directly sent to an object detection network for target detection; if it is determined that there is large motion in the image, the image is deblurred before being sent to the object detection network for target detection; S3, the object detection network outputs the detection result.
[0035] This invention combines the complementary advantages of event vision sensors and CMOS image sensors to improve the robustness of target detection in low-light and fast-moving scenes. The high temporal resolution of event vision sensors enables the restoration of severe motion blur, thus improving image quality before inputting CIS images into the object detection network, thereby increasing detection accuracy. Compared with similar deep learning-based methods, the deblurring technique based on event vision sensors is more lightweight and better suited for on-chip implementation. Compared with target detection methods based solely on event vision sensors, this invention leverages the mature detection model of CMOS image sensors and eliminates the need for large-scale data acquisition and additional event dataset annotation.
Attached Figure Description
[0036] Figure 1 is a schematic diagram of the target detection method based on fusion sensors according to an embodiment of the present invention.
[0037] Figure 2 is the first part of a schematic diagram of event-based motion detection in the target detection method based on fusion sensors according to an embodiment of the present invention.
[0038] Figure 3 is the second part of a schematic diagram of event-based motion detection in the target detection method based on fusion sensors according to an embodiment of the present invention.
[0039] Figure 4 is a schematic diagram of case a of the event mask map generated in the target detection method based on fusion sensors according to an embodiment of the present invention.
[0040] Figure 5 is a schematic diagram of case b of the event mask map generated in the target detection method based on fusion sensors according to an embodiment of the present invention.
[0041] Figure 6 is a schematic diagram of case c of the event mask map generated in the target detection method based on fusion sensors according to an embodiment of the present invention.
[0042] Figure 7 is a schematic diagram of case d of the event mask map generated in the target detection method based on fusion sensors according to an embodiment of the present invention.
[0043] Figure 8 is a schematic diagram of image deblurring in the target detection method based on fusion sensors according to an embodiment of the present invention.
Detailed Implementation
[0044] The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. The advantages and features of the present invention will become clearer from the following description. It should be noted that the drawings are in a very simplified form and use non-precise proportions, and are only used to facilitate and clarify the illustration of the embodiments of the present invention.
[0045] For ease of description, some embodiments of this application may use spatially relative terms such as “above,” “below,” “top,” and “under” to describe the relationship between one element or component and another element or component (or several of them) as shown in the accompanying drawings. It should be understood that, in addition to the orientations shown in the drawings, spatially relative terms are also intended to cover different orientations of the device during use or operation. For example, if the device in the drawings is flipped, an element or component described as “below” or “under” other elements or components will then be positioned “above” or “on” those elements or components. The terms “first,” “second,” etc., used below distinguish between similar elements and do not necessarily describe a particular order or temporal sequence. It should be understood that these terms may be interchanged where appropriate.
[0046] This invention provides a target detection method based on fusion sensors. As shown in Figure 1, it includes:
[0047] Step S1: The CMOS image sensor (CIS) acquires an image of the target scene, and the event vision sensor (EVS) outputs an event stream within the image exposure time.
[0048] Step S2: Use event streams to perform motion detection to determine if there is significant motion in the image; if there is no significant motion in the image, the image is directly sent to the object detection network for target detection; if there is significant motion in the image, the image is deblurred before being sent to the object detection network for target detection.
[0049] Step S3: The object detection network outputs the detection results.
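The control flow of steps S1 to S3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `detect_with_fusion` and the three callables passed to it (`has_large_motion`, `deblur`, `detector`) are hypothetical placeholders, and the event stream is assumed to arrive as an array with one row (x, y, p, t) per event.

```python
from typing import Callable

import numpy as np


def detect_with_fusion(
    image: np.ndarray,
    events: np.ndarray,
    has_large_motion: Callable[[np.ndarray, tuple], bool],
    deblur: Callable[[np.ndarray, np.ndarray], np.ndarray],
    detector: Callable[[np.ndarray], list],
) -> list:
    """Gate deblurring on event-based motion detection (S2), then detect (S3).

    image  : CIS frame acquired in S1.
    events : event rows (x, y, p, t) recorded during the frame's exposure.
    The three callables are hypothetical stand-ins for the motion test,
    the deblurring step, and the object detection network.
    """
    if has_large_motion(events, image.shape[:2]):
        # Large motion implies blur, so restore the frame before detection.
        image = deblur(image, events)
    # Without large motion the frame goes to the detector unchanged.
    return detector(image)
```

Passing the stages in as callables keeps the sketch independent of any particular detector or deblurring method, which matches the statement that the detection network is not limited to a specific algorithm.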
[0050] The steps of the target detection method based on fusion sensors in this embodiment are described in detail below with reference to the accompanying drawings.
[0051] Step S1: The CMOS image sensor (CIS) acquires an image of the target scene, and the event vision sensor (EVS) outputs an event stream within the image exposure time. The CMOS image sensor chip and the event vision sensor (EVS) chip can be integrated on a chip within the camera, allowing the camera to capture and acquire images of the target scene. This invention does not limit the targets to be detected; some typical targets are exemplified. In low-light and fast-moving scenarios, targets of interest to the ADAS system have extremely high risk relevance. Examples of targets include: (i) core dynamic targets that directly threaten safety, such as motor vehicles, pedestrians, and non-motorized vehicle riders; and (ii) static / low-speed targets that constitute path obstacles, such as road construction facilities and disabled / stationary vehicles.
[0052] Step S2: Use the event stream to perform motion detection to determine whether there is significant motion in the image. If no significant motion is detected, the image is directly sent to the object detection network for target detection: the absence of significant motion usually indicates low image blur and relatively high image quality, so the detection results are barely affected and no additional preprocessing is required. If significant motion is detected, the image is deblurred before being sent to the object detection network for target detection.
[0053] Specifically, the detailed steps of event-based motion detection are as follows:
[0054] Step S21: As shown in Figure 2, polarity integration is performed on the event stream e(s) within the CIS exposure time T to obtain the polarity-based event accumulation map E(T). This step can be expressed by the following formula:
[0055]
[0056] e(s) contains information of (x,y,p,t), where (x,y) represents the position of the pixel in the image; p represents the polarity of the event; and t represents the timestamp of the event.
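The formula for step S21 is not reproduced in this text. One form consistent with the description, given here as an assumption rather than the patent's exact notation, sums the polarities of all events at each pixel over the exposure, with p_k taken as +1 or -1 for the k-th event:

```latex
E(T)(x, y) \;=\; \int_{0}^{T} e(s)\, ds
\;=\; \sum_{k \,:\, (x_k, y_k) = (x, y),\; 0 \le t_k \le T} p_k
```

Under this reading, events of opposite polarity at the same pixel cancel, so the absolute value of the map reflects the net brightness change during the exposure.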
[0057] Step S22: Select the pixels that have undergone large motion: pixels for which the absolute value of the event accumulation map is greater than the threshold N are judged as moving pixels and set to 1, and pixels for which the absolute value is less than or equal to the threshold N are judged as non-moving pixels and set to 0, yielding the event mask map B(x,y).
[0058] Step S23: As shown in Figure 3, calculate the proportion of moving pixels in the event mask map relative to the entire image, and use the magnitude of this proportion to determine whether significant motion has occurred in the image. If the proportion is greater than an upper threshold L1, significant motion is considered to have occurred in the image; if the proportion is less than or equal to a lower threshold, significant motion is considered not to have occurred. The proportion Rg can be expressed by the following formula:
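Steps S21 and S22 can be illustrated with a short NumPy sketch. It assumes the events have already been restricted to the exposure window and are stored as rows (x, y, p, t) with polarity p equal to +1 or -1; the function names and this array layout are illustrative choices, not part of the patent.

```python
import numpy as np


def event_accumulation_map(events: np.ndarray, height: int, width: int) -> np.ndarray:
    """S21: sum event polarities per pixel over the exposure.

    events: array of shape (num_events, 4) with rows (x, y, p, t),
    assumed to contain only events inside the exposure window.
    """
    acc = np.zeros((height, width), dtype=np.int32)
    x = events[:, 0].astype(np.intp)
    y = events[:, 1].astype(np.intp)
    p = events[:, 2].astype(np.int32)
    # np.add.at accumulates correctly when the same pixel appears many times.
    np.add.at(acc, (y, x), p)
    return acc


def event_mask(acc: np.ndarray, n_threshold: int) -> np.ndarray:
    """S22: 1 where |accumulated polarity| > N (moving pixel), else 0."""
    return (np.abs(acc) > n_threshold).astype(np.uint8)
```

For example, a pixel that receives three positive events accumulates to 3 and is marked as moving when N = 1, while a pixel that receives one positive and one negative event accumulates to 0 and is not.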
[0059]
[0060] If the ratio is greater than the lower threshold and less than or equal to the upper threshold L1, then the local sliding window method is used to determine whether large motion has occurred in the image.
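The three-way decision in step S23 can be sketched as below. The formula for Rg is not reproduced in this text, so the sketch assumes Rg is the number of 1-valued pixels in B(x,y) divided by the total pixel count W × H; the function name and the string labels it returns are hypothetical.

```python
import numpy as np


def global_motion_decision(mask: np.ndarray, upper: float, lower: float) -> str:
    """Classify a frame from the global moving-pixel ratio Rg.

    mask  : binary event mask B(x, y) with 1 for moving pixels.
    upper : upper threshold L1 (e.g. 0.5).
    lower : lower threshold (alpha * L2 in the embodiment).
    """
    # Assumed definition of Rg: moving pixels divided by all pixels.
    ratio = float(np.count_nonzero(mask)) / mask.size
    if ratio > upper:
        return "large_motion"      # deblur the frame
    if ratio <= lower:
        return "no_large_motion"   # send the frame straight to detection
    return "sliding_window"        # ambiguous: inspect local density
```

A ratio exactly equal to the upper threshold falls into the ambiguous band, matching the "less than or equal to the upper threshold" wording of the text.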
[0061] In this invention, the generated event mask image will have the following four cases:
[0062] Case a: As shown in Figure 4, the object and the background move rapidly relative to the camera at the same time. In this case, a large number of event pixels are usually triggered. If the proportion of moving pixels exceeds the upper threshold L1 (e.g., L1 = 50%), it is determined that there is significant motion in the image, and the image must be deblurred. The selection formula is as follows:
[0063]
[0064] The Event-based Double Integral (EDI) model can be used to deblur the image. EDI models how a blurred image is formed from a sharp latent image and the events triggered during the exposure, and then inverts that model to recover the sharp image. The state I(0) of the image at the initial exposure moment can be calculated from the CIS image B and the event stream e(s) during this exposure time:
[0065]
[0066] Where c represents the threshold required to trigger an event, and T represents the exposure time. Figure 8 shows the effect of deblurring a CIS image using EDI. The car, which was originally severely blurred, is restored to a certain extent and can be distinguished.
[0067] The event mask map also includes case b: As shown in Figure 5, the background is static, and the object is also static or moving slowly, so very few event pixels are triggered. The moving pixels account for a very low proportion of the entire image, less than or equal to the lower threshold αL2 (L2 is the proportion of the image occupied by the small sliding window in Figure 5; the sliding window is usually set to the minimum size of a detected object in the image). In this case, it is determined that there is no large motion in the image, and no deblurring operation is required.
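The deblurring formula itself is not reproduced in this text. The sketch below assumes EDI refers to the event-based double integral model, in which the blurred frame is the time average of a latent image that changes by a factor exp(c) per positive event, so that I(0) = B / ((1/T) ∫₀ᵀ exp(c · E(t)) dt) with E(t) the polarity sum up to time t. The discretization into `n_bins` time slices, the function name, and the (x, y, p, t) event layout are illustrative assumptions.

```python
import numpy as np


def edi_deblur(blurred: np.ndarray, events: np.ndarray,
               exposure: float, c: float, n_bins: int = 32) -> np.ndarray:
    """Estimate the sharp image at the start of the exposure (assumed EDI form).

    blurred : 2-D grayscale CIS frame B.
    events  : rows (x, y, p, t) with 0 <= t <= exposure and p in {+1, -1}.
    c       : contrast threshold that triggers an event.
    """
    blurred = blurred.astype(np.float64)
    h, w = blurred.shape
    edges = np.linspace(0.0, exposure, n_bins + 1)
    # Assign every event to a time slice of the exposure.
    bin_idx = np.searchsorted(edges, events[:, 3], side="right") - 1
    bin_idx = np.clip(bin_idx, 0, n_bins - 1)
    per_bin = np.zeros((n_bins, h, w), dtype=np.float64)
    ys = events[:, 1].astype(np.intp)
    xs = events[:, 0].astype(np.intp)
    np.add.at(per_bin, (bin_idx, ys, xs), events[:, 2])
    # Inner integral: running polarity sum E(t), sampled once per slice.
    # Events are treated as occurring at the start of their slice.
    running = np.cumsum(per_bin, axis=0)
    # Outer integral: time average of exp(c * E(t)) over the exposure.
    denom = np.exp(c * running).mean(axis=0)
    return blurred / denom
```

Pixels without events are returned unchanged because their denominator is 1, which is consistent with the idea that only regions with motion need restoration.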
[0068] The preset value α can be adjusted before the program runs but remains fixed during program execution, with an adjustment range of [0.5, 1]. A larger α sets a stricter condition for a valid (moving) window. For example, if α = 1, all pixels within the window must be moving pixels for the window to be considered valid and thus to indicate significant motion in the image. Conversely, a smaller α sets a more lenient condition. For example, if α = 0.7, at least 70% of the pixels in the window must be moving pixels for the window to be considered valid. Both the sliding window size and the preset value α are set before program execution and are not changed while the program runs. If the event ratio in the window is less than the preset value (70%), it is determined that there is no significant motion in the window, and the window slides to the next position for the same judgment. If the event ratio in the window is greater than or equal to the preset value (70%), it is determined that there is significant motion in the window, and therefore significant motion is present in the image. Note that the presence of event pixels does not necessarily indicate significant motion in the image. For small or slight movements, the event pixels are scattered across the image or form only thin contour lines. Significant motion, by contrast, inevitably produces a spatially dense distribution of event pixels. A sliding window is therefore used to measure the density of event pixels and thus infer the presence of significant motion. This method can also serve as a criterion for the degree of image blur, since significant motion and image blur are positively correlated. The degree of blur is an important input for some downstream applications; in this invention, it determines whether a further deblurring operation is needed, which helps save computational resources.
[0069] The event mask map also includes case c: As shown in Figure 6, the background is static or moving slowly, while the object moves rapidly relative to the camera. The event pixels triggered in this scenario are densely distributed on the foreground object, while the background area has almost no event pixels. The proportion of moving pixels in the entire image is low, less than or equal to the upper threshold L1 (e.g., L1 = 50%), but greater than the lower threshold αL2 (L2 is the proportion of the image occupied by the small sliding window described above). In this case, the proportion alone cannot determine whether large motion is occurring in the image. This invention proposes using a sliding window method to determine the density of moving pixels, thereby inferring whether large motion exists in the image.
[0070] The event mask map also includes case d: As shown in Figure 7, the background and the object both move slowly relative to the camera, and the event pixels are sparsely distributed across the image. The proportion of moving pixels is low, less than or equal to the upper threshold L1 (e.g., L1 = 50%), but greater than the lower threshold αL2 (L2 is the proportion of the image occupied by the small sliding window described above). As in case c, the proportion alone cannot determine whether there is large motion in the image. The sliding window method is needed to determine the density of moving pixels and then infer whether there is large motion in the image.
[0071] Specifically, using a sliding window to infer whether there is large motion in an image includes method one:
[0072] A sliding window is set in the area where the image is located. The length w of the sliding window is less than the length W of the image, and the width h of the sliding window is less than the width H of the image.
[0073] The sliding window is scanned in the image from left to right and from top to bottom with a preset step size s;
[0074] Each scan step counts the proportion of event pixels falling within the current sliding window. If the proportion of event pixels is greater than or equal to a preset value (e.g., α=90%), it is determined that there is large motion in the image, the scan is exited, and the method proceeds to deblur the entire image. Otherwise, it is determined that there is no large motion in the current window, and the next scan step is performed.
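Method one can be sketched as an early-exit scan over the event mask. The function name and argument names are illustrative; the sketch assumes the mask is a binary array and that windows which would extend past the image border are simply not visited.

```python
import numpy as np


def has_dense_window(mask: np.ndarray, win_h: int, win_w: int,
                     step: int, alpha: float) -> bool:
    """Return True as soon as one window has an event-pixel ratio >= alpha.

    mask         : binary event mask B(x, y), shape (H, W).
    win_h, win_w : sliding window height h and length w (smaller than H, W).
    step         : scan step size s in pixels.
    alpha        : preset ratio, e.g. 0.9.
    """
    height, width = mask.shape
    # Scan top to bottom (outer loop) and left to right (inner loop).
    for top in range(0, height - win_h + 1, step):
        for left in range(0, width - win_w + 1, step):
            window = mask[top:top + win_h, left:left + win_w]
            if window.mean() >= alpha:
                return True   # large motion found: stop scanning, deblur
    return False              # no window was dense enough
```

Exiting on the first dense window keeps the cost low when motion is present, at the price of not knowing where else in the frame motion occurs; method two addresses that by recording every valid window.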
[0075] Using a sliding window to infer whether there is large motion in an image also includes a second method: setting a sliding window in the region of the image, where the length w of the sliding window is less than the length W of the image, and the width h of the sliding window is less than the width H of the image;
[0076] The sliding window is scanned in the image from left to right and from top to bottom with a preset step size;
[0077] Each scan step counts the proportion of event pixels falling within the sliding window. If the proportion of event pixels is greater than or equal to a preset value (e.g., α=70%), the sliding window is considered valid, and its position is recorded. Otherwise, it is determined that there is no large movement, and the next scan operation continues.
[0078] After the entire image has been scanned, all valid sliding windows are clustered and merged to obtain the regions of interest where large movements occur.
[0079] If no valid sliding window exists, no further deblurring is required, and the data is directly fed into the object detection network for target detection. If a valid sliding window exists, it is determined that a large motion has occurred, and deblurring is only performed on the obtained region of interest.
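Method two can be sketched in two stages: collect every valid window, then merge them into regions of interest. The text does not specify the clustering algorithm, so the merge below is a stand-in that repeatedly unions windows which overlap or touch; the function names and the (top, left, bottom, right) box layout are assumptions.

```python
import numpy as np


def find_valid_windows(mask: np.ndarray, win_h: int, win_w: int,
                       step: int, alpha: float) -> list:
    """Record every window whose event-pixel ratio is >= alpha."""
    height, width = mask.shape
    valid = []
    for top in range(0, height - win_h + 1, step):
        for left in range(0, width - win_w + 1, step):
            if mask[top:top + win_h, left:left + win_w].mean() >= alpha:
                # Box as (top, left, bottom, right), bottom/right exclusive.
                valid.append((top, left, top + win_h, left + win_w))
    return valid


def merge_windows(boxes: list) -> list:
    """Merge overlapping or touching boxes into bounding regions of interest.

    Assumed stand-in for the unspecified clustering step.
    """
    merged = []
    for box in boxes:
        t, l, b, r = box
        absorbed = True
        while absorbed:
            absorbed = False
            for other in merged:
                ot, ol, ob, orr = other
                if t <= ob and ot <= b and l <= orr and ol <= r:
                    # Grow the current box to cover the neighbour.
                    t, l = min(t, ot), min(l, ol)
                    b, r = max(b, ob), max(r, orr)
                    merged.remove(other)
                    absorbed = True
                    break
        merged.append((t, l, b, r))
    return merged
```

An empty result from `find_valid_windows` corresponds to the branch where the frame is sent to the detector without deblurring; otherwise only the merged regions would be passed to the deblurring step.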
[0080] In the two methods described above, the event pixel ratio Rl_i of the i-th sliding window can be expressed by the following formula:
[0081]
[0082] Where m and n are the coordinates of the top left of the sliding window, K is the number of sliding windows, and i refers to the i-th sliding window.
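The formula for the local ratio is not reproduced in this text. A form consistent with the definitions above, offered as an assumption, divides the number of moving pixels inside the i-th window by the window area, with (m, n) the top-left corner of that window and w and h its length and width:

```latex
R_{l_i} \;=\; \frac{1}{w\,h} \sum_{x=m}^{m+w-1} \; \sum_{y=n}^{n+h-1} B(x, y),
\qquad i = 1, \dots, K
```

With this reading, a window is valid when R_{l_i} ≥ α, which matches the comparison used in both scanning methods.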
[0083] The object detection network of this invention can employ any object detection algorithm based on CMOS image sensors (such as YOLO, SSD, and Fast R-CNN), and the method yields some degree of performance improvement with each of them. One-stage detectors, such as YOLO and SSD, directly regress the bounding box and category on the image. The core idea of YOLO is to reframe object detection as a single regression problem, predicting the bounding box and category probability directly from image pixels in one step. Two-stage detectors, such as Fast R-CNN, first generate candidate regions and then classify and regress these regions. This invention enhances the added value of fusion sensors in object detection applications such as ADAS, robotics, and drones. It improves object detection accuracy in low-light and fast-moving scenarios; in fast-moving scenarios in particular, image deblurring can significantly improve object detection accuracy.
[0084] In summary, this invention provides a target detection method based on a fusion sensor, comprising: S1, a CMOS image sensor acquiring an image of a target scene, and an event vision sensor outputting an event stream within the image exposure time; S2, using the event stream to perform motion detection to determine whether large motion occurs in the image; if it is determined that there is no large motion in the image, the image is directly sent to an object detection network for target detection; if it is determined that there is large motion in the image, the image is deblurred before being sent to the object detection network for target detection; S3, the object detection network outputs the detection result.
[0085] This invention combines the complementary advantages of event vision sensors and CMOS image sensors to improve the robustness of target detection in low-light and fast-moving scenes. The high temporal resolution of event vision sensors enables the restoration of severe motion blur, thus improving image quality before inputting CIS images into the object detection network, thereby increasing detection accuracy. Compared with similar deep learning-based methods, the deblurring technique based on event vision sensors is more lightweight and better suited for on-chip implementation. Compared with target detection methods based solely on event vision sensors, this invention leverages the mature detection model of CMOS image sensors and eliminates the need for large-scale data acquisition and additional event dataset annotation.
[0086] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. The methods disclosed in the embodiments are described simply because they correspond to the devices disclosed in the embodiments; relevant details can be found in the method section.
[0087] The above description is merely a description of preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any person skilled in the art can make possible changes and modifications to the technical solutions of the present invention by utilizing the methods and techniques disclosed above without departing from the spirit and scope of the present invention. Therefore, any simple modifications, equivalent changes and alterations made to the above embodiments based on the technical essence of the present invention without departing from the content of the technical solutions of the present invention shall fall within the protection scope of the technical solutions of the present invention.
Claims
1. A target detection method based on fusion sensors, characterized in that it comprises: S1. A CMOS image sensor acquires an image of the target scene, and an event vision sensor outputs an event stream within the image exposure time; S2. The event stream is used to perform motion detection to determine whether there is large motion in the image; if it is determined that there is no large motion in the image, the image is directly sent to the object detection network for target detection; if it is determined that there is large motion in the image, the image is deblurred before being sent to the object detection network for target detection; S3. The object detection network outputs the detection results.
2. The target detection method based on fusion sensors as described in claim 1, characterized in that motion detection using the event stream specifically includes: S21. Perform polarity integration on the event stream within the exposure time to obtain a polarity-based event accumulation map; S22. Select the pixels that have undergone large motion, that is, determine the pixels for which the absolute value of the event accumulation map is greater than the threshold as moving pixels and set them to 1, and determine the pixels for which the absolute value of the event accumulation map is less than or equal to the threshold as non-moving pixels and set them to 0, to obtain the event mask map; S23. Calculate the proportion of moving pixels in the event mask map relative to the entire image, and determine whether there is large motion in the image based on the size of the proportion.
3. The target detection method based on fusion sensors as described in claim 2, characterized in that, in step S23, if the ratio is greater than the upper threshold, it is considered that the image has large motion; if the ratio is less than or equal to the lower threshold, it is considered that the image has no large motion.
4. The target detection method based on fusion sensors as described in claim 2, characterized in that, in step S23, if the ratio is greater than the lower threshold and less than or equal to the upper threshold, then the local sliding window method is used to determine whether large motion has occurred in the image.
5. The target detection method based on fusion sensors as described in claim 2, characterized in that the event mask map includes: Case a: The object and the background move rapidly relative to the camera at the same time; a large number of event pixels are triggered. If the proportion is greater than the upper threshold, it is determined that there is large motion in the image, and deblurring is required.
6. The target detection method based on fusion sensors as described in claim 2, characterized in that the event mask map also includes: Case b: The background is static, and the object is also static or moving slowly; the number of triggered event pixels is very small, and the proportion of moving pixels in the entire image is very low, less than or equal to the lower threshold. In this case, it is determined that no large motion has occurred in the image, and no deblurring processing is required.
7. The target detection method based on fusion sensors as described in claim 2, characterized in that the event mask map also includes: Case c: The background is static or moving slowly, while the object moves quickly relative to the camera; the triggered event pixels are more densely distributed on the foreground object, while there are almost no event pixels in the background area; the proportion of moving pixels in the entire image is low, less than or equal to the upper threshold and greater than the lower threshold. The density of moving pixels is determined by using a sliding window method, and then it is inferred whether there is large motion in the image.
8. The target detection method based on fusion sensors as described in claim 2, characterized in that the event mask map also includes: Case d: The background and the object move slowly relative to the camera at the same time; the triggered event pixels are sparsely distributed on the image, and the proportion of moving pixels in the entire image is low, less than or equal to the upper threshold and greater than the lower threshold. The sliding window method is used to determine the density of moving pixels and then infer whether there is large motion in the image.
9. The target detection method based on fusion sensors as described in claim 7 or 8, characterized in that using the sliding window to infer whether there is large motion in the image includes method one: A sliding window is set in the area where the image is located. The length of the sliding window is less than the length of the image, and the width of the sliding window is less than the width of the image. The sliding window is scanned in the image from left to right and from top to bottom with a preset step size; each scan step counts the proportion of event pixels falling within the current sliding window. If the proportion of event pixels is greater than or equal to a preset value, it is determined that there is large motion in the image, the scan is exited, and the method proceeds to deblur the image; otherwise, it is determined that there is no large motion in the current window, and the next scan step is performed.
10. The target detection method based on fusion sensors as described in claim 7 or 8, characterized in that using the sliding window to infer whether there is large motion in the image includes method two: A sliding window is set in the area where the image is located. The length of the sliding window is less than the length of the image, and the width of the sliding window is less than the width of the image. The sliding window is scanned in the image from left to right and from top to bottom with a preset step size; each scan step counts the proportion of event pixels falling within the sliding window. If the proportion of event pixels is greater than or equal to a preset value, the sliding window is considered valid, and its position is recorded; otherwise, it is determined that there is no large motion in the current window, and the next scan step is performed. After the entire image is scanned, all valid sliding windows are clustered and merged to obtain the regions of interest where large motion occurs; if no valid sliding window exists, no deblurring is required, and the image is directly fed into the object detection network for target detection. If a valid sliding window exists, it is determined that large motion has occurred, and deblurring is performed only on the obtained regions of interest.