AI object detection system, method, electronic device and storage medium based on lightweight motion detection

By optimizing the frame difference method with a lightweight motion detection system and combining image processing and connected component analysis, the detection accuracy and real-time performance issues of edge AI models in low-resolution images are solved, enabling accurate localization and precise recognition of moving objects.

CN115797870B · Active · Publication Date: 2026-06-12 · BOLIU INTELLIGENT TECH (SHANGHAI) CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: BOLIU INTELLIGENT TECH (SHANGHAI) CO LTD
Filing Date: 2022-12-20
Publication Date: 2026-06-12


Abstract

The application discloses an AI object detection system and method based on lightweight motion detection, an electronic device, and a storage medium. The AI object detection system comprises an image acquisition module, an image difference recognition module, a region marking module, and a moving object detection module. The image acquisition module acquires image data of a set region. The image difference recognition module identifies differences between the images acquired by the image acquisition module at different time points. The region marking module generates a boundary marker when the image difference recognition module identifies an image difference; the boundary marker contains the image difference portion that meets a set condition. The moving object detection module identifies the image within the boundary marker generated by the region marking module and acquires the position of a target object. The system, method, electronic device, and storage medium can improve the speed and accuracy of motion detection.


Description

Technical Field

[0001] This invention belongs to the field of image processing technology and relates to a motion detection system, and in particular to an AI object detection system, method, electronic device, and storage medium based on lightweight motion detection.

Background Technology

[0002] Imaging technology has been widely applied in home security systems, such as intelligent monitoring and recognition. In the past, human detection systems often used sensors like infrared detectors for identification, but these instruments were costly. With the decreasing cost of building image recognition systems, these systems can solve human detection problems more efficiently. Foreground motion detection is a prerequisite for video analysis technologies such as behavior analysis and action detection, and has significant application prospects. Foreground moving object detection algorithms mainly include frame difference, optical flow, and background subtraction. Frame difference can balance real-time performance and detection accuracy, making it the foundation of many complex detection algorithms. In practical applications, it is often combined with other algorithms to improve the overall detection effect. This invention uses a modified frame difference method for foreground motion detection and combines it with a human detector for human detection. Traditional human detection algorithms use low-resolution images after image scaling, thus failing to achieve precise human localization and unable to locate distant humans.

[0003] When performing target object detection on edge products, AI end-to-end models are usually used directly. However, AI models use low-resolution images, and real-world scenes are often complex due to factors such as lighting and angle, making it impossible for AI models to meet the requirements of high accuracy.

[0004] In previous frame difference techniques, morphological processing, including Gaussian filtering and image dilation, requires calculations on the entire image. Furthermore, the Gaussian filter template uses coefficients that are not powers of two, resulting in significant computational cost. Connectivity analysis employs Green's theorem and then uses second- and third-order image matrices to find contours in the binary image, leading to high algorithmic complexity that fails to meet real-time requirements.

[0005] In view of this, there is an urgent need to design a new motion detection method that overcomes at least some of the aforementioned shortcomings of existing motion detection methods.

Summary of the Invention

[0006] This invention provides an AI object detection system, method, electronic device, and storage medium based on lightweight motion detection, which can improve the speed and accuracy of motion detection.

[0007] To solve the above-mentioned technical problems, according to one aspect of the present invention, the following technical solution is adopted:

[0008] An AI object detection system based on lightweight motion detection, the AI object detection system comprising:

[0009] The image acquisition module is used to acquire image data of a designated area;

[0010] The image difference recognition module is used to identify differences between the images acquired by the image acquisition module at different set time points;

[0011] The region marking module is used to generate a boundary marker when the image difference recognition module detects an image difference, the boundary marker containing the image difference portion that meets set conditions; and

[0012] The moving object detection module is used to identify the image within the boundary marker generated by the region marking module and obtain the position of the target object.

[0013] In one embodiment of the present invention, the image difference recognition module includes:

[0014] The image processing unit is used to convert the RGB color image into a grayscale image;

[0015] The difference operation unit is used to calculate the difference between corresponding pixels in two adjacent frames and determine the absolute value of the grayscale value; pixels with an absolute value greater than a threshold T are considered foreground, otherwise they are considered background, as shown in the following formula:

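[0016] $$D(x,y) = \begin{cases} \text{foreground}, & |F_n(x,y) - F_{n-1}(x,y)| > T \\ \text{background}, & \text{otherwise} \end{cases}$$

[0017] Where $F_{n-1}(x,y)$ is the grayscale value of the pixel at (x, y) in the (n-1)th frame, $F_n(x,y)$ is the grayscale value of the pixel at (x, y) in the nth frame, x and y are the horizontal and vertical coordinates of the pixel, T is the set threshold, and $D(x,y)$ is the resulting binary value.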

[0018] In one embodiment of the present invention, the region marking module includes:

[0019] The moving-target pixel position acquisition unit is used to perform the binarization calculation on images $F_{n-1}$ and $F_n$ to obtain the pixel positions of the moving target;

[0020] The binarized image morphology processing unit is used to perform morphological processing on the binarized image, using salient point analysis, Gaussian filtering, and image dilation. It traverses all pixels in the image for morphological processing. First, salient point analysis is used to determine whether the four neighbors of the current pixel have a value. Then, filtering is performed only if there is a value. Finally, image dilation is performed on the current pixel to complete the morphological processing of this point.

[0021] The connected component analysis unit performs connected component analysis on the image to find bounding boxes containing complete moving targets. For a binary image, all pixels are traversed sequentially. If a point is foreground, its neighboring pixels within a specific range are assigned to the same category, and a bounding box is assigned to this category, with the bounding box coordinates recorded. Pixels within the same bounding box therefore belong to the same connected component. If a foreground point encountered later in the traversal already has a category, no new category is generated for that point. After all pixels in the binary image have been traversed, non-maximum suppression is performed to delete bounding boxes whose Intersection over Union (IoU) is greater than a threshold. Among the remaining bounding boxes, when the connected components are larger than a given threshold, a moving object is considered to have been detected. Based on the coordinates of the matching bounding boxes, a rectangle that contains all connected components is obtained, and this rectangle is used as the input for the subsequent AI human detection device.

[0022] According to another aspect of the present invention, the following technical solution is adopted: an AI object detection method based on lightweight motion detection, the AI object detection method comprising:

[0023] Image acquisition step: acquire image data of a designated area;

[0024] Image difference recognition step: identify differences between the images acquired at different time points in the image acquisition step;

[0025] Region marking step: when an image difference is detected in the image difference recognition step, generate a boundary marker, the boundary marker containing the image difference portion that meets the set conditions; and

[0026] Moving object detection step: based on the boundary marker generated in the region marking step, identify the image within the boundary marker to obtain the position of the target object.

[0027] As one embodiment of the present invention, the image difference recognition step includes:

[0028] Image processing step: convert the RGB color image into a grayscale image;

[0029] The differential calculation steps are as follows: Calculate the difference between the corresponding pixels in two adjacent frames, and determine the absolute value of the grayscale value; if the absolute value is greater than the threshold T, the pixel is considered foreground; otherwise, it is considered background. The corresponding formula is as follows:

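[0030] $$D(x,y) = \begin{cases} \text{foreground}, & |F_n(x,y) - F_{n-1}(x,y)| > T \\ \text{background}, & \text{otherwise} \end{cases}$$

[0031] Where $F_{n-1}(x,y)$ is the grayscale value of the pixel at (x, y) in the (n-1)th frame, $F_n(x,y)$ is the grayscale value of the pixel at (x, y) in the nth frame, x and y are the horizontal and vertical coordinates of the pixel, T is the set threshold, and $D(x,y)$ is the resulting binary value.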

[0032] As one embodiment of the present invention, the region marking step includes:

[0033] Moving-target pixel position acquisition step: perform the binarization calculation on images $F_{n-1}$ and $F_n$ to obtain the pixel positions of the moving target;

[0034] The steps for morphological processing of a binarized image are as follows: Morphological processing of the binarized image is performed using salient point analysis, Gaussian filtering, and image dilation; Morphological processing is performed on all pixels in the image by first using salient point analysis to determine whether the four neighbors of the current pixel have a value, and then filtering is performed only if there is a value; Finally, image dilation is performed on the current pixel to complete the morphological processing of this point.

[0035] Connected component analysis step: perform connected component analysis on the image to find bounding boxes containing complete moving targets; for a binary image, traverse all pixels sequentially, and if a point is foreground, assign its neighboring pixels within a specific range to the same category, assign a bounding box to this category, and record the bounding box coordinates, so that pixels within the same bounding box belong to the same connected component; if a foreground point encountered later in the traversal already has a category, no new category is generated for that point; after all pixels in the binary image have been traversed, perform non-maximum suppression to delete bounding boxes whose Intersection over Union (IoU) is greater than a threshold; among the remaining bounding boxes, when the connected components are larger than a given threshold, a moving object is considered detected; finally, based on the coordinates of the matching bounding boxes, obtain a rectangle containing all connected components, and use this rectangle as the input for the subsequent AI human detection system.

[0036] According to another aspect of the present invention, the following technical solution is adopted: an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the above method.

[0037] According to another aspect of the present invention, the following technical solution is adopted: a storage medium storing computer program instructions thereon, which, when executed by a processor, implement the steps of the above-described method.

[0038] The beneficial effects of the present invention are as follows: The AI object detection system, method, electronic device and storage medium based on lightweight motion detection proposed in the present invention can improve the speed and accuracy of motion detection.

[0039] The Moving Object Detector of this invention offers the following advantages:

[0040] This invention can accurately locate moving objects in an image, and remove the background area from the image, leaving only the foreground salient area as the input for human detection. Because the human detector only focuses on moving objects in the image, it can more accurately identify the position of human figures in the image.

[0041] This invention focuses only on salient regions, thus increasing the detection range; it can reduce interference from background clutter; and the optimized morphological processing and connectivity analysis used in the frame difference method can improve the inference speed.

Attached Figure Description

[0042] Figure 1 is a schematic diagram of the composition of an AI object detection system in one embodiment of the present invention.

[0043] Figure 2 is a flowchart of an AI object detection method in one embodiment of the present invention.

[0044] Figure 3 is another flowchart of an AI object detection method in one embodiment of the present invention.

[0045] Figure 4 is a schematic diagram of connected component analysis in one embodiment of the present invention.

[0046] Figure 5 is a schematic diagram illustrating the coefficients of a Gaussian filter in one embodiment of the present invention.

[0047] Figure 6 is a schematic diagram of a moving object detector assisted by a human figure detector in one embodiment of the present invention.

[0048] Figure 7 is a schematic diagram of the composition of an electronic device according to an embodiment of the present invention.

Detailed Implementation

[0049] The preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

[0050] To further understand the present invention, preferred embodiments of the present invention are described below in conjunction with examples. However, it should be understood that these descriptions are only for further illustrating the features and advantages of the present invention, and not for limiting the scope of the claims of the present invention.

[0051] The description in this section pertains to only a few typical embodiments, and the present invention is not limited to the scope of the embodiments described. Replacing some technical features of the embodiments with identical or similar prior-art means also falls within the scope of description and protection of this invention.

[0052] The steps described in the various embodiments of the specification are for illustrative purposes only, and the implementation of this application is not limited by the order of the steps. The term "connection" in the specification includes both direct and indirect connections.

[0053] This invention discloses an AI object detection system based on lightweight motion detection. Figure 1 is a schematic diagram of the composition of an AI object detection system in one embodiment of the present invention. Referring to Figure 1, the AI object detection system includes: an image acquisition module 1, an image difference recognition module 2, a region marking module 3, and a moving object detection module 4.

[0054] The image acquisition module 1 is used to acquire image data of a set area.

[0055] The image difference recognition module 2 is used to identify differences between the images acquired by the image acquisition module 1 at different set time points.

[0056] The region marking module 3 is used to generate a boundary marker when the image difference recognition module 2 recognizes an image difference. The boundary marker contains the image difference portion that meets the set conditions.

[0057] The moving object detection module 4 is used to identify the image within the boundary marker generated by the region marking module 3 and obtain the position of the target object.

[0058] In one embodiment of the present invention, the image difference recognition module 2 includes: an image processing unit 21 and a difference operation unit 22.

[0059] The image processing unit 21 is used to convert the RGB color image into a grayscale image. In one embodiment, the correspondence is as follows: Gray = 0.299*Red + 0.587*Green + 0.114*Blue, where Gray represents the grayscale value, and Red, Green, and Blue are the red, green, and blue components of the RGB color model, respectively.
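The grayscale correspondence above can be illustrated with a minimal Python sketch. The function names `rgb_to_gray` and `image_to_gray`, the nested-list image layout, and the rounding to integers are assumptions made for illustration; the patent text gives only the weighting formula.

```python
def rgb_to_gray(pixel):
    """Weighted sum of one (red, green, blue) pixel, using the stated coefficients."""
    red, green, blue = pixel
    return 0.299 * red + 0.587 * green + 0.114 * blue


def image_to_gray(rgb_image):
    """Convert a list of rows of RGB tuples into rows of rounded grayscale values.

    Rounding to the nearest integer is an assumption; the source does not say
    how fractional values are handled.
    """
    return [[round(rgb_to_gray(pixel)) for pixel in row] for row in rgb_image]
```

For example, `image_to_gray([[(255, 255, 255), (0, 0, 0)]])` maps a white pixel and a black pixel to the two ends of the grayscale range.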

[0060] The difference operation unit 22 is used to calculate the difference between corresponding pixels in two adjacent frames and determine the absolute value of the grayscale difference; pixels whose absolute difference is greater than a threshold T are foreground, and all other pixels are background, as shown in the following formula:

[0061] $$D(x,y) = \begin{cases} \text{foreground}, & |F_n(x,y) - F_{n-1}(x,y)| > T \\ \text{background}, & \text{otherwise} \end{cases}$$

[0062] Where $F_{n-1}(x,y)$ is the grayscale value of the pixel at (x, y) in the (n-1)th frame, $F_n(x,y)$ is the grayscale value of the pixel at (x, y) in the nth frame, x and y are the horizontal and vertical coordinates of the pixel, T is the set threshold, and $D(x,y)$ is the resulting binary value.
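A minimal sketch of the thresholded frame difference described above, in plain Python. The labels 255 for foreground and 0 for background, the function name `frame_difference`, and the nested-list image layout are assumptions chosen for illustration.

```python
FOREGROUND = 255  # assumed label for foreground pixels
BACKGROUND = 0    # assumed label for background pixels


def frame_difference(prev_gray, curr_gray, threshold):
    """Binary mask: foreground where |curr - prev| is strictly greater than threshold."""
    return [
        [FOREGROUND if abs(curr - prev) > threshold else BACKGROUND
         for prev, curr in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_gray, curr_gray)
    ]
```

A pixel whose absolute difference equals the threshold is treated as background here, following the "greater than T" wording.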

[0063] In one embodiment of the present invention, the region marking module 3 includes: a pixel position acquisition unit 31 for moving targets, a binarized image morphology processing unit 32, and a connected component analysis unit 33.

[0064] The moving-target pixel position acquisition unit 31 is used to perform the binarization calculation on images $F_{n-1}$ and $F_n$ to obtain the pixel positions of the moving target.

[0065] The binarized image morphological processing unit 32 is used to perform morphological processing on the binarized image, using salient point analysis, Gaussian filtering, and image dilation. It traverses all pixels in the image for morphological processing, first using salient point analysis to determine if the four neighbors of the current pixel have values, and then filtering is performed only if values are found. In one embodiment, the coefficients of the Gaussian filter template are powers of 2, and Gaussian filtering calculation can be completed using bitwise operations. Finally, image dilation is performed on the current pixel to complete the morphological processing for that point.
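To illustrate how power-of-two coefficients allow Gaussian filtering with bitwise operations, here is a small sketch. The 3x3 template with weights 1, 2, 1 / 2, 4, 2 / 1, 2, 1 is an assumption picked because every weight is a power of two; the coefficients actually used in the patent are those of Figure 5, which is not reproduced in this text.

```python
# Assumed 3x3 template expressed as shift amounts: a weight w = 2**s becomes "<< s".
KERNEL_SHIFTS = ((0, 1, 0),
                 (1, 2, 1),
                 (0, 1, 0))
NORMALIZE_SHIFT = 4  # the assumed weights sum to 16 = 2**4


def gaussian_at(image, x, y):
    """Filtered value at column x, row y; neighbors outside the image count as 0."""
    height, width = len(image), len(image[0])
    total = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                # Multiplication by the weight is replaced by a left shift.
                total += image[ny][nx] << KERNEL_SHIFTS[dy + 1][dx + 1]
    # Division by the weight sum is replaced by a right shift.
    return total >> NORMALIZE_SHIFT
```

With integer pixel values, the whole filter uses only additions and shifts, with no multiplication or division.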

[0066] The connected component analysis unit 33 is used to perform connected component analysis on the image to find the bounding boxes containing complete moving targets. For a binary image, all pixels are traversed sequentially. If a point is foreground, its neighboring pixels within a specific range are set to the same category, a bounding box is assigned to this category, and the bounding box coordinates are recorded. Pixels within the same bounding box therefore belong to the same connected component. If a foreground point encountered later in the traversal already has a category, no new category is generated for that point. As shown in Figure 4, with the boundary range set to 5, black dots representing foreground points, and gray dots representing background points, three rectangles (red, yellow, and green) are obtained. However, some connected components overlap. Therefore, after all pixels in the binary image have been traversed, non-maximum suppression is performed to remove bounding boxes with an Intersection over Union (IoU) greater than a threshold. Among the remaining bounding boxes, if their connected components are greater than a given threshold, a moving object is considered detected. Finally, based on the coordinates of the matching bounding boxes, a rectangle containing all connected components is obtained, and this rectangle is used as the input for the subsequent AI human detection system.
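The traversal described above could be sketched as follows. This is one possible reading, not the patented implementation: the square neighborhood of half-width `reach`, the function names `group_foreground` and `enclosing_rectangle`, and the inclusive box format (x_min, y_min, x_max, y_max) are all assumptions.

```python
def group_foreground(binary, reach=5):
    """Raster-scan grouping: returns {category: [x_min, y_min, x_max, y_max]}."""
    height, width = len(binary), len(binary[0])
    labels = [[0] * width for _ in range(height)]  # 0 means "no category yet"
    boxes = {}
    next_label = 1
    for y in range(height):
        for x in range(width):
            if not binary[y][x]:
                continue
            if labels[y][x] == 0:
                # Only foreground points without a category start a new one.
                labels[y][x] = next_label
                boxes[next_label] = [x, y, x, y]
                next_label += 1
            label = labels[y][x]
            box = boxes[label]
            # Pull unlabeled foreground neighbors within reach into this category.
            for ny in range(max(0, y - reach), min(height, y + reach + 1)):
                for nx in range(max(0, x - reach), min(width, x + reach + 1)):
                    if binary[ny][nx] and labels[ny][nx] == 0:
                        labels[ny][nx] = label
                        box[0], box[1] = min(box[0], nx), min(box[1], ny)
                        box[2], box[3] = max(box[2], nx), max(box[3], ny)
    return boxes


def enclosing_rectangle(boxes):
    """Smallest rectangle containing every box, or None if there are no boxes."""
    boxes = list(boxes)
    if not boxes:
        return None
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))
```

The result of `enclosing_rectangle` corresponds to the single rectangle that is passed on to the human detection stage.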

[0067] This invention uses a moving object detector for the first stage of target object detection. The moving object detector judges whether the image has changed based on the difference between the current frame and the previous frame according to a threshold. When a moving object is detected, a bounding box is used to include all the changed pixels in the image, and this bounding box is sent to the AI model for human detection. The moving object detector can accurately obtain the approximate position of the target object. This detection method is only sensitive to moving objects in the image, is not sensitive to changes in lighting, and is not affected by slow changes in the background. Therefore, it is suitable for use in dynamically changing environments and can reduce interference from background areas. Moreover, it has low computational requirements and fast detection speed, making it suitable for applications with high real-time requirements.

[0068] This invention, based on the characteristics of morphological operations, combines the operations that originally required three pixel traversals (Gaussian filtering, binarization, and dilation) into a single full pixel traversal. The factor 1/273 in Figure 3 is incorporated into the binarization threshold, reducing the number of division operations from one per pixel to a single division. Therefore, after the weighting operation in the Gaussian filter, the pixel can be directly set to 255 or 0 (binarization) by comparison with the threshold. If the pixel has a value, it is then dilated by a factor of n using a dilation operation.
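The folding of the normalization factor into the threshold can be written out as a short inequality. The symbols $w_{ij}$ (integer template coefficients summing to 273), $p$ (pixel values), and $T'$ (rescaled threshold) are notation introduced here for illustration and do not appear in the source.

```latex
\frac{1}{273}\sum_{i,j} w_{ij}\, p(x+i,\, y+j) > T
\quad\Longleftrightarrow\quad
\sum_{i,j} w_{ij}\, p(x+i,\, y+j) > T', \qquad T' = 273\, T
```

Under this reading, $T'$ is computed once before the traversal, so each pixel needs only the weighted sum and one comparison.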

[0069] This invention proposes a salient point analysis based on Gaussian filter coefficients. As shown in Figure 5, the Gaussian filter gives the highest weights to the nearest neighboring pixels (the red cross in the middle). Therefore, if the binarized values of all of these neighboring pixels are 0, the weighted value can be expected to be almost 0, so the Gaussian filter operation for that pixel can be skipped, which can significantly reduce the computation time.
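A hedged sketch of how the salient point check, the filtering, the binarization, and the dilation might be fused into one traversal. The 3x3 power-of-two template, the square dilation of half-width `radius`, and the names `has_salient_neighbor` and `morph_single_pass` are assumptions; the source specifies neither the template size nor the dilation shape.

```python
KERNEL_SHIFTS = ((0, 1, 0), (1, 2, 1), (0, 1, 0))  # assumed 3x3 template as shifts
KERNEL_SUM = 16                                    # sum of the assumed weights


def has_salient_neighbor(binary, x, y):
    """True if any of the four direct neighbors of (x, y) is non-zero."""
    height, width = len(binary), len(binary[0])
    for dx, dy in ((0, -1), (-1, 0), (1, 0), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and binary[ny][nx]:
            return True
    return False


def morph_single_pass(binary, threshold, radius=1):
    """Salient check, weighted sum, thresholding, and dilation in one traversal."""
    height, width = len(binary), len(binary[0])
    scaled_threshold = threshold * KERNEL_SUM  # normalization folded in once
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not has_salient_neighbor(binary, x, y):
                continue  # all four neighbors are 0: skip the filter entirely
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        total += binary[ny][nx] << KERNEL_SHIFTS[dy + 1][dx + 1]
            if total > scaled_threshold:
                # Dilation: mark the square neighborhood around this pixel.
                for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                        out[ny][nx] = 255
    return out
```

In this sketch an isolated foreground pixel is dropped, because its four neighbors are all 0 and the pixels around it do not accumulate enough weight to pass the threshold.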

[0070] This invention proposes a single-pass full-pixel traversal connectivity analysis. It sequentially traverses all pixels. If a pixel is foreground, its neighboring pixels within a specific range are classified as the same category, and a bounding box is assigned to this category, with its coordinates recorded. Therefore, pixels within the same bounding box belong to the same connected component. After traversing all pixels in the binary image, non-maximum suppression is performed to delete bounding boxes with an Intersection over Union (IoU) greater than a threshold. Within these bounding boxes, if their connected component is greater than a given threshold, a moving object is considered detected.
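The overlap removal step can be illustrated with a small IoU-based suppression sketch. Ranking boxes by area (largest first) and interpreting "connected component greater than a given threshold" as a minimum box area are assumptions; the source does not state how boxes are ordered or how component size is measured.

```python
def box_area(box):
    """Area of an inclusive pixel box (x_min, y_min, x_max, y_max)."""
    return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)


def iou(a, b):
    """Intersection over Union of two inclusive pixel boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0]) + 1
    inter_h = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    return inter / (box_area(a) + box_area(b) - inter)


def suppress_overlaps(boxes, iou_threshold):
    """Keep boxes largest-first; drop a box whose IoU with a kept box exceeds the threshold."""
    kept = []
    for box in sorted(boxes, key=box_area, reverse=True):
        if all(iou(box, other) <= iou_threshold for other in kept):
            kept.append(box)
    return kept


def moving_boxes(boxes, iou_threshold, min_area):
    """Boxes that survive suppression and are larger than min_area (assumed size test)."""
    return [b for b in suppress_overlaps(boxes, iou_threshold) if box_area(b) > min_area]
```

A non-empty result from `moving_boxes` would correspond to "a moving object is considered detected" in the paragraph above.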

[0071] Figure 6 illustrates the moving object detector working together with the human detector. Systems that rely only on a human detection algorithm use low-resolution images after scaling, thus failing to achieve precise human localization and potentially causing misidentification; furthermore, they cannot locate smaller human figures at a distance.

[0072] Using a moving object detector for the first stage of target detection can accurately locate moving objects in the image. By removing the background area from the image and leaving only the salient foreground region as input for the next stage of human detection, false positives caused by background clutter are reduced, and the position of human figures in the image is more accurately identified. In addition, the preserved salient foreground region can be magnified by image scaling, making previously small human figures in the distance appear larger, thus increasing the detection distance.
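As a rough illustration of keeping only the salient region and magnifying it, the following sketch crops a rectangle and enlarges it. Nearest-neighbor scaling by an integer `factor` and the function names `crop` and `upscale_nearest` are assumptions; the source does not say which scaling method is used.

```python
def crop(image, box):
    """Cut out the inclusive rectangle (x_min, y_min, x_max, y_max) from a row-major image."""
    x_min, y_min, x_max, y_max = box
    return [row[x_min:x_max + 1] for row in image[y_min:y_max + 1]]


def upscale_nearest(image, factor):
    """Enlarge an image by an integer factor by repeating each pixel (nearest neighbor)."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out
```

Cropping first and scaling afterwards means a distant figure occupies more of the detector's fixed-size input than it would in the scaled-down full frame.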

[0073] This invention also discloses an AI object detection method based on lightweight motion detection. Figure 2 is a flowchart of an AI object detection method in one embodiment of the present invention. Referring to Figure 2, the AI object detection method includes:

[0074] 【Step S1】Image acquisition step; acquire image data of the designated area;

[0075] 【Step S2】Image difference recognition step; Identify the image differences at different time points in the image acquisition step;

[0076] In one embodiment of the present invention, the image difference recognition step includes:

[0077] Image processing step: the RGB color image is converted into a grayscale image. In one embodiment, the correspondence is as follows: Gray = 0.299*Red + 0.587*Green + 0.114*Blue.

[0078] The differential calculation steps are as follows: Calculate the difference between the corresponding pixels in two adjacent frames, and determine the absolute value of the grayscale value; if the absolute value is greater than the threshold T, the pixel is considered foreground; otherwise, it is considered background. The corresponding formula is as follows:

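[0079] $$D(x,y) = \begin{cases} \text{foreground}, & |F_n(x,y) - F_{n-1}(x,y)| > T \\ \text{background}, & \text{otherwise} \end{cases}$$

[0080] Where $F_{n-1}(x,y)$ is the grayscale value of the pixel at (x, y) in the (n-1)th frame, $F_n(x,y)$ is the grayscale value of the pixel at (x, y) in the nth frame, x and y are the horizontal and vertical coordinates of the pixel, T is the set threshold, and $D(x,y)$ is the resulting binary value.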

[0081] 【Step S3】Region marking step; when an image difference is identified in the image difference recognition step, a boundary marker is generated, and the boundary marker contains the image difference portion that meets the set conditions;

[0082] In one embodiment of the present invention, the region marking step includes:

[0083] Moving-target pixel position acquisition step: the binarization calculation is performed on images $F_{n-1}$ and $F_n$ to obtain the pixel positions of the moving target.

[0084] The steps for morphological processing of a binarized image are as follows: Morphological processing is performed on the binarized image using salient point analysis, Gaussian filtering, and image dilation. Morphological processing is performed on all pixels in the image. First, salient point analysis is used to determine if the four neighbors of the current pixel have values. Filtering is only performed if values are found. In one embodiment, the coefficients of the Gaussian filter template are powers of 2. Gaussian filtering calculation can be completed using bitwise operations. Finally, image dilation is performed on the current pixel to complete the morphological processing for that point.

[0085] Connected component analysis step: perform connected component analysis on the image to find bounding boxes containing complete moving targets. For a binary image, traverse all pixels sequentially. If a point is foreground, assign its neighboring pixels within a specific range to the same category, assign a bounding box to this category, and record the bounding box coordinates. Pixels within the same bounding box therefore belong to the same connected component. If a foreground point encountered later in the traversal already has a category, no new category is generated for that point. As shown in Figure 4, with the boundary range set to 5, black dots as foreground points, and gray dots as background points, three rectangles (red, yellow, and green) are generated. However, some connected components overlap. Therefore, after all pixels in the binary image have been traversed, non-maximum suppression is performed to remove bounding boxes with an Intersection over Union (IoU) greater than a threshold. Among the remaining bounding boxes, if their connected components are greater than a given threshold, a moving object is considered detected. Finally, based on the coordinates of the matching bounding boxes, a rectangle containing all connected components is obtained, and this rectangle is used as the input for the subsequent AI human detection system.

[0086] 【Step S4】Moving object detection step: Based on the boundary marker generated in the region marking step, identify the image within the boundary marker to obtain the position of the target object.

[0087] Figure 3 is a flowchart of a moving object detection method according to another embodiment of the present invention, wherein the detection method uses the frame difference method. The frame difference method is a detection method based on the strong correlation between two adjacent frames in a motion image sequence. With the camera fixed, pixel-based temporal differencing is applied to two adjacent frames in a continuous image sequence, and stationary objects are removed by thresholding to extract the moving regions in the image. The method includes the following steps:

[0088] Step 1: Read in the current image of the video sequence, and use image preprocessing to convert the RGB color image into a grayscale image. The corresponding relationship is as follows:

[0089] Gray=0.299*Red+0.587*Green+0.114*Blue;

[0090] Step 2: Determine if the current frame is the first frame. If it is the first frame, store the processed data and return to the previous step. If it is not the first frame, perform a difference operation with the previous frame. Calculate the difference between the corresponding pixels of two adjacent frames and determine the absolute value of the grayscale value. Pixels with an absolute value greater than a threshold T are considered foreground; otherwise, they are considered background. The corresponding formula is as follows:

[0091] $$D(x,y) = \begin{cases} \text{foreground}, & |F_n(x,y) - F_{n-1}(x,y)| > T \\ \text{background}, & \text{otherwise} \end{cases}$$

[0092] Where $F_{n-1}(x,y)$ is the grayscale value of the pixel at (x, y) in the (n-1)th frame, $F_n(x,y)$ is the grayscale value of the pixel at (x, y) in the nth frame, x and y are the horizontal and vertical coordinates of the pixel, T is the set threshold, and $D(x,y)$ is the resulting binary value.

[0093] The binarization calculation is performed on images $F_{n-1}$ and $F_n$ to extract the pixel positions of the moving target.

[0094] Step 3: Morphological processing of the binarized image using salient point analysis, Gaussian filtering, and image dilation. Morphological processing is performed on all pixels in the image. First, salient point analysis is used to determine if the current pixel's four neighbors have values. Filtering is only performed if values are found. The Gaussian filter template used here has coefficients that are powers of 2. Gaussian filtering calculations can be completed using bitwise operations. Finally, image dilation is performed on the current pixel to complete the morphological processing for that point.

[0095] Step 4: Perform connected component analysis on the image to find bounding boxes containing complete moving targets. For a binary image, traverse all pixels sequentially. If a point is foreground, assign its neighboring pixels within a specific range to the same category, assign a bounding box to this category, and record the bounding box coordinates. Pixels within the same bounding box therefore belong to the same connected component. If a foreground point encountered later in the traversal already has a category, no new category is generated for that point. For example, as shown in Figure 4, with a boundary range of 5, black dots as foreground points, and gray dots as background points, three rectangles (red, yellow, and green) are generated. However, some connected components overlap. Therefore, after all pixels in the binary image have been traversed, non-maximum suppression is performed to remove bounding boxes with an Intersection over Union (IoU) greater than a threshold. Among the remaining bounding boxes, if their connected components are greater than a given threshold, a moving object is considered detected. Finally, based on the coordinates of the matching bounding boxes, a rectangle containing all connected components is obtained, and this rectangle is used as the input for the subsequent AI human detection system.
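The first-frame handling in Step 2 above can be sketched as a small stateful helper. The class name `FrameDifferencer`, the `update` method, and the 255/0 output labels are assumptions introduced for illustration.

```python
class FrameDifferencer:
    """Keeps the previous grayscale frame and differences each new frame against it."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.previous = None  # no frame stored yet

    def update(self, gray_frame):
        """Return None for the first frame, otherwise a 255/0 motion mask."""
        previous, self.previous = self.previous, gray_frame
        if previous is None:
            return None  # first frame: only store it and wait for the next one
        return [
            [255 if abs(curr - prev) > self.threshold else 0
             for prev, curr in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(previous, gray_frame)
        ]
```

Calling `update` once per captured frame yields no mask for the first frame and a mask for every later one; the mask would then go through the morphological processing and connected component analysis of Steps 3 and 4.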

[0096] This invention also discloses an electronic device. Figure 7 is a schematic diagram of the composition of an electronic device according to an embodiment of the present invention. Referring to Figure 7, at the hardware level, the electronic device includes a memory, a processor, and at least one network interface; the processor may be a microprocessor, and the memory may include main memory, such as random access memory (RAM), and may also include non-volatile memory. Of course, the electronic device may also include other hardware as needed.

[0097] The processor, network interface, and memory are interconnected via an internal bus, which can be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, or an EISA (Extended Industry Standard Architecture) bus, etc. The bus may include an address bus, a data bus, a control bus, etc. The memory stores programs (including operating system programs and application programs); the programs may include program code, which may include computer operation instructions. The memory may include main memory and non-volatile memory, and provides instructions and data to the processor.

[0098] In one embodiment, the processor can read the corresponding program from non-volatile memory into main memory and then run it; the processor can execute the program stored in memory and specifically perform the following operations (as shown in Figure 2):

[0099] 【Step S1】Image acquisition step: acquire image data of the designated area;

[0100] 【Step S2】Image difference recognition step: identify differences between the images acquired at different time points in the image acquisition step;

[0101] 【Step S3】Region marking step: when an image difference is identified in the image difference recognition step, a boundary marker is generated, and the boundary marker contains the image difference portion that meets the set conditions;

[0102] 【Step S4】Moving object detection step: based on the boundary marker generated in the region marking step, identify the image within the boundary marker to obtain the position of the target object.
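The four steps above can be sketched end to end as follows. This is a deliberately simplified illustration: Step S3 is reduced to a single rectangle around all differing pixels (the morphological processing and connected component analysis are omitted), the "set condition" is assumed to be a minimum number of differing pixels, and `detector` is a stand-in callback for the AI detection model. All names and default values are hypothetical.

```python
def detect_in_motion_region(prev, curr, detector, diff_threshold=25, min_pixels=4):
    """Run `detector` only on the changed region and return boxes in frame coordinates.

    prev, curr: grayscale frames from Step S1, as lists of rows (assumed layout).
    detector:   callable taking a cropped image and returning (x0, y0, x1, y1) boxes.
    """
    h, w = len(curr), len(curr[0])

    # S2: find pixels whose grayscale value changed by more than the threshold.
    changed = [(x, y) for y in range(h) for x in range(w)
               if abs(curr[y][x] - prev[y][x]) > diff_threshold]

    # S3: generate a boundary marker only if the difference meets the set condition.
    if len(changed) < min_pixels:
        return []
    x0 = min(x for x, _ in changed)
    x1 = max(x for x, _ in changed)
    y0 = min(y for _, y in changed)
    y1 = max(y for _, y in changed)

    # S4: detect inside the marked region, then map boxes back to the full frame.
    crop = [row[x0:x1 + 1] for row in curr[y0:y1 + 1]]
    return [(bx0 + x0, by0 + y0, bx1 + x0, by1 + y0)
            for (bx0, by0, bx1, by1) in detector(crop)]


def whole_crop(crop):
    # Dummy detector for the example: reports the entire crop as one object.
    return [(0, 0, len(crop[0]) - 1, len(crop) - 1)]


prev = [[0] * 6 for _ in range(4)]
curr = [row[:] for row in prev]
for y in (1, 2):
    for x in (3, 4):
        curr[y][x] = 200  # a small object appears

print(detect_in_motion_region(prev, curr, whole_crop))  # [(3, 1, 4, 2)]
print(detect_in_motion_region(prev, prev, whole_crop))  # []
```

When nothing changes between frames, the detector is never called, which is where the saving in inference work comes from in this sketch.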

[0103] This invention further discloses a storage medium storing computer program instructions which, when executed by a processor, implement the following steps of the method of this invention (as shown in Figure 2):

[0104] 【Step S1】Image acquisition step: acquire image data of the designated area;

[0105] 【Step S2】Image difference recognition step: identify differences between the images acquired at different time points in the image acquisition step;

[0106] 【Step S3】Region marking step: when an image difference is identified in the image difference recognition step, a boundary marker is generated, and the boundary marker contains the image difference portion that meets the set conditions;

[0107] 【Step S4】Moving object detection step: based on the boundary marker generated in the region marking step, identify the image within the boundary marker to obtain the position of the target object.

[0108] In summary, the beneficial effects of the present invention are as follows: the AI object detection system, method, electronic device and storage medium based on lightweight motion detection proposed in the present invention can improve the speed and accuracy of motion detection.

[0109] The moving object detector of this invention offers the following advantages:

[0110] This invention can accurately locate moving objects in an image, and remove the background area from the image, leaving only the foreground salient area as the input for human detection. Because the human detector only focuses on moving objects in the image, it can more accurately identify the position of human figures in the image.

[0111] Because this invention focuses only on salient regions, it increases the detection range and reduces interference from background clutter; in addition, the optimized morphological processing and connected component analysis used with the frame difference method improve inference speed.

[0112] It should be noted that this application can be implemented in software and/or a combination of software and hardware; for example, it can be implemented using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In some embodiments, the software program of this application can be executed by a processor to implement the steps or functions described above. Similarly, the software program of this application (including related data structures) can be stored in a computer-readable recording medium, for example RAM, magnetic or optical drives, floppy disks, and similar devices. In addition, some steps or functions of this application can be implemented in hardware, for example as circuitry that cooperates with a processor to perform the various steps or functions.

[0113] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0114] The description and application of the present invention herein are illustrative and not intended to limit the scope of the invention to the embodiments described above. Effects or advantages involved in the embodiments may not be apparent due to various factors, and the description of effects or advantages is not intended to limit the embodiments. Variations and modifications of the embodiments disclosed herein are possible, and various substitutions and equivalents of the components in the embodiments are well known to those skilled in the art. It should be apparent to those skilled in the art that the invention can be implemented in other forms, structures, arrangements, proportions, and with other components, materials, and parts without departing from the spirit or essential characteristics of the invention. Other variations and modifications can be made to the embodiments disclosed herein without departing from the scope and spirit of the invention.

Claims

1. An AI object detection system based on lightweight motion detection, characterized in that the AI object detection system includes: an image acquisition module, used to acquire images of a designated area; an image difference recognition module, used to identify differences between the images acquired by the image acquisition module at different set time points; a region marking module, used to generate a boundary marker when the image difference recognition module detects an image difference, the boundary marker containing the image difference portion that meets set conditions; and a moving object detection module, used to identify the image within the boundary marker generated by the region marking module, locate the moving object in the image, and remove the background area of the image, leaving only the foreground salient area as the detection input, so as to obtain the position of the target object. The image difference recognition module includes an image processing unit and a difference operation unit. The image processing unit is used to convert the RGB color image into a grayscale image.
The difference operation unit is used to calculate the difference between corresponding pixels in two adjacent frames and take the absolute value of the grayscale difference; pixels whose absolute difference is greater than a threshold T are considered foreground, and all others are considered background, as shown in the following formula: $$D(x,y) = \begin{cases} 1, & |F_n(x,y) - F_{n-1}(x,y)| > T \\ 0, & \text{otherwise} \end{cases}$$ where F_{n-1}(x,y) is the grayscale value of the pixel at (x, y) in the (n-1)th frame, F_n(x,y) is the grayscale value of the pixel at (x, y) in the nth frame, x and y are the x and y coordinates of the pixel, T is the set threshold, and D(x,y) is the calculated binary result. The region marking module includes: The pixel position acquisition unit of the moving target is used to perform an operation on the binarized images F_{n-1} and F_n to obtain the pixel positions of the moving target. The binarized image morphological processing unit is used to perform morphological processing on the binarized image using salient point analysis, Gaussian filtering, and image dilation. Morphological processing is performed on all pixels in the image: first, salient point analysis is used to determine whether the current pixel's four neighbors have values, and filtering is performed only if values are found; finally, image dilation is applied to the current pixel to complete the morphological processing. The Gaussian filter coefficients are incorporated into the binarization threshold, reducing the number of division operations from one per pixel to a single division. After the weighted operation of the Gaussian filter, each pixel is directly set to 255 or 0 according to the threshold. If a pixel has a value, it is dilated by a factor of n. The Gaussian filter assigns the highest weights to neighboring pixels; therefore, if the binarized values of the neighboring pixels are all 0, the weighted value can be expected to be 0, and the Gaussian filter operation for that pixel is skipped, significantly reducing computation time.
The connected component analysis unit performs connected component analysis on the image to find bounding boxes containing complete moving targets. For a binary image, it sequentially traverses all pixels. If a point is foreground, its neighboring pixels within a specific range are assigned to the same category, and a bounding box is assigned to this category, with the bounding box coordinates recorded. Pixels within the same bounding box belong to the same connected component. During the rest of the traversal, if a foreground point already has a category, no new category is generated for that point. After all pixels in the binary image have been traversed, non-maximum suppression is performed, and bounding boxes whose Intersection over Union (IoU) is greater than a threshold are deleted. If the connected components of the remaining bounding boxes are larger than a given threshold, a moving object is considered detected. Based on the coordinates of the qualifying bounding boxes, a rectangle that contains all connected components is obtained, and this rectangle is used as the input for the subsequent AI human detection device.

2. An AI object detection method based on lightweight motion detection, characterized in that the AI object detection method includes: an image acquisition step: acquiring image data of a designated area; an image difference recognition step: identifying differences between the images acquired at different time points in the image acquisition step; a region marking step: when an image difference is detected in the image difference recognition step, generating a boundary marker, the boundary marker containing the image difference portion that meets the set conditions; and a moving object detection step: based on the boundary marker generated in the region marking step, identifying the image within the boundary marker, locating the moving object in the image, removing the background area of the image so that only the foreground salient area is left as the detection input, and obtaining the position of the target object. The image difference recognition step includes: an image processing step: converting the RGB color image into a grayscale image; and a difference calculation step: calculating the difference between corresponding pixels in two adjacent frames and taking the absolute value of the grayscale difference; if the absolute difference is greater than the threshold T, the pixel is considered foreground; otherwise, it is considered background.
The corresponding formula is as follows: $$D(x,y) = \begin{cases} 1, & |F_n(x,y) - F_{n-1}(x,y)| > T \\ 0, & \text{otherwise} \end{cases}$$ where F_{n-1}(x,y) is the grayscale value of the pixel at (x, y) in the (n-1)th frame, F_n(x,y) is the grayscale value of the pixel at (x, y) in the nth frame, x and y are the x and y coordinates of the pixel, T is the set threshold, and D(x,y) is the calculated binary result. The region marking step includes: a step of obtaining the pixel positions of the moving target: performing an operation on the binarized images F_{n-1} and F_n to obtain the pixel positions of the moving target; and a binarized image morphological processing step: performing morphological processing on the binarized image using salient point analysis, Gaussian filtering, and image dilation. All pixels in the image are traversed for morphological processing: first, salient point analysis is used to determine whether the current pixel's four neighbors have values, and filtering is performed only if values are found; finally, image dilation is performed on the current pixel to complete the morphological processing. The Gaussian filter coefficients are incorporated into the binarization threshold, reducing the number of division operations from one per pixel to a single division. After the weighted Gaussian filtering, each pixel is directly set to 255 or 0 according to the threshold. If a pixel has a value, it is dilated by a factor of n. The Gaussian filter assigns the highest weights to neighboring pixels; therefore, if the binarized values of the neighboring pixels are all 0, the weighted value can be expected to be 0, and the Gaussian filter operation for that pixel is skipped, significantly reducing computation time.
The connected component analysis step includes: performing connected component analysis on the image to find bounding boxes containing complete moving targets; for a binary image, sequentially traversing all pixels and, if a point is foreground, assigning its neighboring pixels within a specific range to the same category and recording the bounding box coordinates, so that pixels within the same bounding box belong to the same connected component; during the rest of the traversal, if a foreground point already has a category, generating no new category for that point; after all pixels in the binary image have been traversed, performing non-maximum suppression and deleting bounding boxes whose Intersection over Union (IoU) is greater than a threshold; when the connected components of the remaining bounding boxes are larger than a given threshold, considering a moving object detected; and, based on the coordinates of the qualifying bounding boxes, obtaining a rectangle that contains all connected components and using this rectangle as the input for the subsequent AI human detection system.

3. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the steps of the method of claim 2.

4. A storage medium storing computer program instructions thereon, characterized in that, when the computer program instructions are executed by a processor, they implement the steps of the method of claim 2.