A method for object detection using cropped images

The method improves object detection in video streams by cropping uncertain regions from the full-resolution image and re-analyzing them with a neural network, recursively if needed, so that objects to be masked are identified accurately while real-time operation is maintained.

JP7880842B2 · Active · Publication Date: 2026-06-26 · AXIS

Patent Information

Authority / Receiving Office: JP
Patent Type: Patents
Current Assignee / Owner: AXIS
Filing Date: 2023-06-14
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing object detection methods in video streams suffer from reduced accuracy due to downscaling for neural networks, leading to potential exposure of masked objects when digitally zoomed in.

Method used

A method involving image cropping and recursive analysis using multiple neural networks to enhance object detection accuracy, where regions with uncertain detection are further cropped and analyzed until a predetermined condition is met.

Benefits of technology

Improves object detection accuracy so that objects to be masked are identified precisely, reducing the risk that an unmasked object is exposed by digital zoom, while maintaining real-time operation.

✦ Generated by Eureka AI based on patent content.


Abstract

To provide an improved method for object detection in an image, and a control unit.

SOLUTION: A method includes the steps of: acquiring a first resolution image collected by an image collection device; scaling the first resolution image to a second resolution image having resolution lower than the resolution of the first resolution image; analyzing the second resolution image in order to determine a first probability that a detected object of a prescribed type exists in the area of the first resolution image; cropping an area including the detected object in the first resolution image in the case that the first probability is lower than a first threshold and that it is higher than or equal to a second threshold; analyzing the cropped image in order to determine a second probability that the detected object is a prescribed type; and giving an instruction that the detected object is the prescribed type in the case that the second probability is higher than or equal to the first threshold.

SELECTED DRAWING: Figure 2

Description

Technical Field

[0001] The present invention generally relates to the field of camera surveillance, and more particularly, to a method and a control unit for object detection in a video stream captured using an image acquisition device, for example for the purpose of anonymizing objects in the video stream.

Background Art

[0002] In various camera surveillance applications, it may be necessary to mask objects in a video stream captured by a camera. An important reason for object masking is to guarantee the privacy of people present in the video stream and to protect other types of personal information that may be captured in the video stream.

[0003] Masking of people can be performed by extracting the image coordinates of the relevant parts of the image frame. Once the image coordinates are known, the relevant parts of the video stream can be masked, pixelated, blurred, or otherwise modified to obscure identifying features. However, before masking can be applied, the objects to be masked must be detected in the video stream. Object detection methods can be applied, for example, to detect people.

[0004] Object detection requires an algorithm, for example one implemented by a neural network, to detect and classify objects in a video stream. For the neural network to operate fast enough, it may have to operate at a lower resolution than the camera. This requires downscaling of the image, which results in data loss. Furthermore, the reduced resolution may make it impossible for the neural network to accurately detect the objects that should be masked, with the result that an operator may be able to view those objects by digitally zooming in on the image.

[0005] Therefore, there is room for improvement in object detection, particularly for anonymizing objects in video streams.

Summary of the Invention

[0006] In view of the aforementioned and other drawbacks of the prior art, an object of the present invention is to provide an improved method for detecting objects in an image that mitigates at least some of the drawbacks of the prior art.

[0007] According to a first aspect of the present invention, a method for detecting an object in an image is provided. The method includes the steps of: acquiring a first resolution image collected by an image acquisition device; scaling the first resolution image to a second resolution image having a lower resolution than the resolution of the first resolution image; analyzing the second resolution image to determine a first probability that a detected object of a predetermined type is present in a region of the first resolution image; cropping a region in the first resolution image containing the detected object if the first probability is below a first threshold and above or equal to a second threshold, wherein a probability above or equal to the first threshold indicates that the detected object is of a predetermined type, and a probability below the second threshold indicates that the detected object is not of a predetermined type; analyzing the cropped image to determine a second probability that the detected object is of a predetermined type; and indicating that the detected object is of a predetermined type if the second probability is above or equal to the first threshold.
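The two thresholds split the probability range into three bands. A minimal sketch of that decision, in Python, is given here purely as an illustration; the names `Verdict` and `triage` and the example threshold values used in the note after it are assumptions and do not appear in the patent.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical labels for the three probability bands."""
    IS_TYPE = "object is of the predetermined type"
    NOT_TYPE = "object is not of the predetermined type"
    UNCERTAIN = "crop the region and analyze again"


def triage(probability: float, first_threshold: float, second_threshold: float) -> Verdict:
    """Sort a detection probability into one of the three bands."""
    if second_threshold > first_threshold:
        raise ValueError("second_threshold must not exceed first_threshold")
    if probability >= first_threshold:
        # Above or equal to the first threshold: accept.
        return Verdict.IS_TYPE
    if probability < second_threshold:
        # Below the second threshold: reject.
        return Verdict.NOT_TYPE
    # In between: the detection is uncertain, so a crop is warranted.
    return Verdict.UNCERTAIN
```

With assumed thresholds of 0.8 and 0.3, a probability of 0.5 falls in the uncertain band and would trigger cropping, while 0.8 itself is accepted because the first threshold is inclusive.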

[0008] This invention is based on the recognition that image cropping can be used to improve the accuracy of object detection. More specifically, the inventors have found that when the first probability is such that it is impossible to know whether a region in a downscaled or scaled image contains an object of a given type, the region in the original image collected by the image acquisition device can be cropped, and the cropped image can be analyzed, or more specifically, passed back into a neural network for object detection. This results in more accurate object detection.

[0009] A video stream is generally a set of consecutive image frames captured over time. These consecutive image frames collectively form a video stream.

[0010] The present invention can be applied to various applications in which object detection is used. One such application is for privacy masking, which is understood as a function to protect the privacy of an individual by hiding or concealing a portion of an image frame that has a masked area. Generally, privacy masks can be static or dynamic, but the privacy masks most often described herein are dynamic masks. Examples of privacy masks include edge filters, solid masks, or blurs. Static masks can be applied uniformly across the entire or at least a portion of an image frame, while dynamic masks can be applied, for example, when a face or person is detected in a video stream.

[0011] Image scaling can be thought of as downscaling aimed at reducing the image resolution to a resolution suitable for the analysis step of object detection. For example, a first image may be downscaled to a second resolution suitable for the algorithm or neural network used for object detection.

[0012] According to one embodiment, the steps of analyzing a second resolution image and analyzing a cropped image can be performed within a neural network. The neural network can generally operate at a lower resolution than the sensor. The scaling step involves adapting the image resolution to the neural network's operating image resolution. In other words, the neural network can be adapted to the image resolution and size, anticipating that the input image will have that resolution and size. This is necessary to enable the neural network to operate in real time for object detection.

[0013] According to one embodiment, the step of analyzing a second resolution image may be performed in a first neural network, and the step of analyzing a cropped image may be performed in a second neural network. Using two or more neural networks has the advantage that the object detection method can be performed more quickly.

[0014] According to one embodiment, the method includes the steps of: (a) analyzing the cropped image to determine a further probability that a detected object of the predetermined type is present in a region of the cropped image; (b) if the further probability is below the first threshold and above or equal to the second threshold, (c) cropping the region of the detected object in the cropped image to form a further cropped image; and (d) analyzing the further cropped image to determine a still further probability that the detected object is of the predetermined type. If the still further probability is above or equal to the first threshold, an indication is given that the detected object is of the predetermined type. Steps (a) to (d) are performed recursively until a predetermined condition is met. In this way, if the analysis still cannot determine the type of object with confidence, further cropping is performed on a smaller region. Recursive analysis and cropping further improve the accuracy of object detection.

[0015] The recursion may continue until at least one of the following conditions is met: a predetermined number of iterations has been performed; the further probability in step (b) is below the second threshold or above or equal to the first threshold; or the resolution of the further cropped image is below a predetermined resolution. The predetermined resolution could be, for example, the resolution to which the neural network used to analyze the cropped image is adapted. If the further probability in step (b) is below the second threshold or above or equal to the first threshold, there is no "detection uncertainty," and therefore no need for further recursive iterations.
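The recursion and its stop conditions can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the detector is a hypothetical callable that takes a crop box and returns a probability together with a tighter box, the recursion is written as a loop, and the default values (0.8, 0.3, three iterations, 64 pixels) are invented for the example.

```python
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in full-resolution pixels
# Hypothetical detector: crop box in, (probability, tighter box) out.
Detector = Callable[[Box], Tuple[float, Box]]


def refine(
    box: Box,
    detect: Detector,
    first_threshold: float = 0.8,   # assumed value
    second_threshold: float = 0.3,  # assumed value
    max_iterations: int = 3,        # assumed iteration budget
    min_side: int = 64,             # assumed predetermined resolution
) -> Optional[bool]:
    """Return True or False once decided, or None if still uncertain at a stop condition."""
    for _ in range(max_iterations):
        left, top, right, bottom = box
        if min(right - left, bottom - top) < min_side:
            return None  # crop has fallen below the predetermined resolution
        probability, box = detect(box)  # analyze, then continue with the tighter region
        if probability >= first_threshold:
            return True   # no detection uncertainty left: object is of the type
        if probability < second_threshold:
            return False  # no detection uncertainty left: object is not of the type
    return None  # predetermined number of iterations reached
```

Returning `None` keeps the "still uncertain" outcome distinct from a rejection, so a caller could, for example, choose to mask such regions anyway; that policy is a design choice outside what the text specifies.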

[0016] According to one embodiment, the step of analyzing the second resolution image may include determining the probability that a detected object of the predetermined type is present in two or more regions, each of which is cropped from the first resolution image and analyzed. In this way, two or more regions are cropped in order to detect two or more objects.

[0017] According to one embodiment, the method may include the steps of: analyzing a set of second resolution images; detecting motion in the set of second resolution images; cropping a further region in a subsequent image frame if the detected motion is above or equal to a motion threshold; analyzing the cropped image; and indicating that the detected object is of a predetermined type if a second probability is above or equal to the first threshold. Advantageously, a new crop is performed only in regions where the detected motion is above or equal to the motion threshold, which reduces the time spent on cropping. The further region preferably includes the region of the moving detected object detected in the set of second resolution images. In one possible implementation, motion may be detected and quantified by monitoring pixel color changes in the image.

[0018] Furthermore, if the detected motion falls below the motion threshold in the first image frame, the results of the analysis step in the first frame can be reused on the same cropped region in subsequent image frames. In this way, instead of performing a new crop and re-analyzing regions where a new crop may not be necessary, i.e., regions where no motion is detected, the previous analysis results are reused.

[0019] According to one embodiment, if a first probability exceeds or is equal to a first threshold, an indication is given that the detected object is of a predetermined type. The indication may further include an indication that object masking is required.

[0020] Preferably, the resolution of the first resolution image is the resolution at the time of acquisition by the image acquisition device.

[0021] The resolution of the second resolution image advantageously depends on the size of the neural network used to analyze the second resolution image.

[0022] Preferably, the method is executed at a rate that substantially corresponds to the frame rate of the captured video stream including the first resolution image.

[0023] According to a second aspect of the present invention, a control unit for object detection in an image is provided. The control unit is configured to: acquire a first resolution image collected by an image collection device; scale the first resolution image to a second resolution image having a resolution lower than that of the first resolution image; analyze the second resolution image to determine a first probability that a detected object of a predetermined type exists in a region of the first resolution image; crop a region including the detected object in the first resolution image when the first probability is below a first threshold and above or equal to a second threshold, where a probability above or equal to the first threshold indicates that the detected object is of a predetermined type, and a probability below the second threshold indicates that the detected object is not of a predetermined type; analyze the cropped image to determine a second probability that the detected object is of a predetermined type; and, when the second probability is above or equal to the first threshold, give an indication that the detected object is of a predetermined type.

[0024] Further embodiments of this second aspect of the present invention, and the effects obtained by this second aspect, are very similar to the embodiments and effects described above for the first aspect of the present invention.

[0025] According to a third aspect of the present invention, a system is provided that includes a control unit according to the second aspect and an image collection device for capturing an image of a scene including an object.

[0026] The image collection device is preferably a video camera, such as a surveillance camera.

[0027] A further embodiment of this third aspect of the present invention, and the effects obtained by this third aspect, are largely similar to the embodiments and effects described above for the first and second aspects of the present invention.

[0028] According to a fourth aspect of the present invention, there is provided a computer program for object detection in an image, which, when executed on the processing circuit of a controller, causes the control unit to: obtain a first-resolution image collected by an image collection device; scale the first-resolution image to a second-resolution image having a resolution lower than that of the first-resolution image; analyze the second-resolution image to determine a first probability that a detected object of a predetermined type exists in a region of the first-resolution image; crop a region including the detected object in the first-resolution image when the first probability is below a first threshold and above or equal to a second threshold, where a probability above or equal to the first threshold indicates that the detected object is of the predetermined type, and a probability below the second threshold indicates that the detected object is not of the predetermined type; analyze the cropped image to determine a second probability that the detected object is of the predetermined type; and give an indication that the detected object is of the predetermined type when the second probability is above or equal to the first threshold. The computer program includes computer code for performing these operations.

[0029] A further embodiment of this fourth aspect of the present invention, and the effects obtained by this fourth aspect, are largely similar to the embodiments and effects described above for other aspects of the present invention.

[0030] Further features and advantages of the present invention will become apparent upon study of the appended claims and the following description. Those skilled in the art will recognize that different features of the present invention can be combined to create embodiments other than those described below without departing from the scope of the present invention.

[0031] Various aspects of the present invention, including specific features and advantages of the present invention, will be readily apparent from the following detailed description and accompanying drawings.

Brief Description of the Drawings

[0032]
[Figure 1] This figure conceptually illustrates exemplary applications of embodiments of the present invention.
[Figure 2] This is a flowchart of the method steps according to an embodiment of the present invention.
[Figure 3] This figure conceptually illustrates an image cropping process for object detection according to an embodiment of the present invention.
[Figure 4A] This figure conceptually illustrates a control unit for operating a neural network according to an embodiment of the present invention.
[Figure 4B] This figure conceptually illustrates a control unit for operating two neural networks according to an embodiment of the present invention.
[Figure 5] This figure conceptually illustrates an image cropping process for object detection according to an embodiment of the present invention.
[Figure 6] This is a flowchart of the method steps according to an embodiment of the present invention.
[Figure 7] This figure conceptually illustrates image cropping in the presence of a moving object, according to an embodiment of the present invention.
[Figure 8] This is a flowchart of the method steps according to an embodiment of the present invention.

Modes for Carrying Out the Invention

[0033] Next, the present invention will be described more thoroughly below with reference to the accompanying drawings illustrating currently preferred embodiments of the invention. The present invention may, however, be carried out in many different forms and should not be construed as being limited to the embodiments described herein, rather these embodiments are given for thoroughness and completeness and will adequately convey the scope of the invention to those skilled in the art. Throughout, similar reference numerals refer to similar elements.

[0034] Next, looking at the drawings, particularly Figure 1, we see a scene 1 being monitored by an image acquisition device 100, for example, a camera, or more specifically, a surveillance camera. In scene 1, there is a set of objects 104a-d, which could be a vehicle 104d in a parking lot, and several people 104a-c.

[0035] Camera 100 may be mounted on a building, on a pole, or in any other suitable location, depending on the specific application at hand. Furthermore, camera 100 may be a fixed camera, a movable camera such as a pan, tilt, and zoom camera, or even a mountable camera. Furthermore, camera 100 may be a visible light camera, an infrared (IR) sensing camera, or a thermal (long-wavelength infrared (LWIR)) camera. Additionally, an image acquisition device employing LiDAR and radar functions may also be conceivable.

[0036] Camera 100 continuously monitors Scene 1 by capturing image frames that form a video stream of Scene 1. Scene 1, which is within the field of view of Camera 100, is exemplified here as including a vehicle 104d and people 104a-c. Camera 100 may transmit the video stream to a client 116 or server 118 via a communication network 114, for example, a wired or wireless communication link 112 connected to the cloud.

[0037] The camera 100 comprises an image acquisition module 2 and a control unit 3 having one or more processors capable of operating an image processing pipeline and an encoder. The camera 100 further comprises an input / output interface 210 configured as a communication interface between the camera 100 and the network 114 via a wireless link 112.

[0038] During monitoring of Scene 1 using camera 100, control unit 3 is operable to detect objects in the scene and classify them as belonging to a certain type. Detecting objects in Scene 1 is important because it can be used for masking those objects and people in order to protect the privacy of those objects and people. To improve the ability of the camera and its associated control unit 3 to detect a given type of object, for example, an object that requires masking, the following methods are provided.

[0039] Figure 2 is a flowchart of the method steps according to an embodiment of the present invention, and Figure 3 conceptually illustrates the steps of this method.

[0040] In step S102, a first resolution image 110 is acquired by the image acquisition device 100. The first resolution image 110 may be a frame from a video stream, and in this example it includes a vehicle 104d and people 104a-c. The resolution of the first resolution image 110 is the resolution at the time it is captured by the image acquisition device 100.

[0041] In step S104, the control unit 3 scales the first resolution image 110 to a second resolution image 112 having a lower resolution than the first resolution image 110, that is, it downscales the first resolution image 110. The downscaling factor may be selected within the range of 4 to 25, such as 4, 9, or 16, where the factor refers to the reduction in the total number of pixels; a factor of 25 thus leaves 1 / 25 of the original pixels.
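The following sketch illustrates the arithmetic of the downscaling step. It assumes that the factors 4, 9, and 16 count total pixels, so that they correspond to per-axis factors of 2, 3, and 4; that reading, the block-averaging filter, the list-of-rows image representation, and the name `downscale` are all assumptions made for the example and are not taken from the patent.

```python
from typing import List

Image = List[List[float]]  # grayscale rows; a stand-in for a real frame buffer


def downscale(image: Image, k: int) -> Image:
    """Average each k x k block, reducing the pixel count by a factor of k * k."""
    height, width = len(image), len(image[0])
    if height % k or width % k:
        raise ValueError("image dimensions must be divisible by k")
    out: Image = []
    for y in range(0, height, k):
        row = []
        for x in range(0, width, k):
            # Collect the k * k source pixels that map onto one output pixel.
            block = [image[y + dy][x + dx] for dy in range(k) for dx in range(k)]
            row.append(sum(block) / (k * k))
        out.append(row)
    return out
```

For instance, a 4 x 4 image downscaled with `k=2` becomes a 2 x 2 image: 16 pixels are reduced to 4, a pixel-count factor of 4.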

[0042] In step S106, the second resolution image is analyzed by the control unit 3 to determine a first probability that a detected object of a predetermined type exists in region 111 of the first resolution image. In this way, the second resolution image 112 is analyzed to establish region 111' in the second resolution image 112 in which the detected object may exist. The corresponding region 111 is determined or found in the original first resolution image 110.
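Finding the region 111 that corresponds to region 111' amounts to scaling box coordinates back up by the downscaling factor. A minimal sketch, assuming axis-aligned boxes, a uniform per-axis factor `k`, and an optional safety margin (the function name, the margin, and the box layout are hypothetical):

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def to_first_resolution(box: Box, k: int, width: int, height: int, margin: int = 0) -> Box:
    """Map a box found in the second resolution image onto the first resolution image."""
    left, top, right, bottom = box
    return (
        max(0, left * k - margin),        # clamp to the left image edge
        max(0, top * k - margin),         # clamp to the top image edge
        min(width, right * k + margin),   # clamp to the right image edge
        min(height, bottom * k + margin)  # clamp to the bottom image edge
    )
```

The margin is a design choice for the sketch: a detection made at low resolution is only accurate to within about `k` pixels at full resolution, so padding the crop slightly may avoid cutting off part of the object.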

[0043] The analysis is preferably performed within a neural network of a predetermined size and resolution. The size and / or resolution of the neural network corresponds to the size and resolution of the image on which the neural network was trained. Furthermore, the resolution of the second image 112 depends on the size and / or resolution of the neural network used to analyze the second resolution image. For example, the second resolution should be kept higher than the resolution of the neural network. Since the size of the first resolution image is substantially larger than the size of the neural network, the neural network may have difficulty analyzing the image fast enough. To mitigate this, according to this disclosure, the first resolution image is downscaled to a smaller resolution, thereby enabling the neural network to operate faster to perform the analysis. In this way, the overall operation of the analysis can also be performed more efficiently by performing the downscaling and subsequent cropping described above.

[0044] If the first probability is below the first threshold and above or equal to the second threshold, in step S108, the region 111 containing the detected objects 104b-c in the first resolution image is cropped. The first and second thresholds are adapted such that a probability above or equal to the first threshold indicates that the detected object is of a predetermined type, and a probability below the second threshold indicates that the detected object is not of a predetermined type. For example, objects of a predetermined type may include objects to be masked, such as people 104a-c and license plates on vehicles 104d.

[0045] In some cases, the cropped image 111 may be downscaled, for example, if a recursive process is applied as described further below, similar to the downscaling in step S104. However, generally, it is not strictly necessary for the cropped image to be downscaled.

[0046] If the first probability exceeds or is equal to the first threshold, in step S109, the control unit 3 gives an indication that the detected object is of a predetermined type. When this method is applied to masking, if the detected object is an object that should be masked, such as a person or a vehicle's license plate, the subsequent step may be to perform privacy masking on the detected objects 104b-c in image 110.
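Once a region has been indicated as containing an object to be masked, a solid privacy mask is the simplest of the mask types mentioned earlier. The sketch below is illustrative only; the list-of-rows image, the box layout, and the name `apply_solid_mask` are assumptions.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def apply_solid_mask(image: List[List[int]], box: Box, fill: int = 0) -> List[List[int]]:
    """Overwrite every pixel inside the box with a constant value, in place."""
    left, top, right, bottom = box
    for y in range(top, bottom):
        for x in range(left, right):
            image[y][x] = fill
    return image
```

Because the mask is applied to the first resolution image, digital zoom on the masked frame cannot recover the covered pixels.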

[0047] In the subsequent step S110, the cropped image 111 is analyzed by the control unit 3 to determine a second probability that the detected object 104b is of a predetermined type. If the second probability exceeds or is equal to the first threshold, in step S112, the control unit 3 gives an indication that the detected object 104b is of a predetermined type.

[0048] As schematically shown in Figure 4A, the control unit 3, which receives or collects a video stream 4 from the image acquisition module 2, scales the first resolution image 110 to be analyzed and provides the scaled second resolution image 112 to the neural network 302. The steps of analyzing the second resolution image 112 and analyzing the cropped image 111 are performed in the neural network 302. The control unit 3 provides an indication, in the form of a data signal 304, that the detected object 104b is of a predetermined type.

[0049] Figure 4B conceptually shows an alternative to the embodiment in Figure 4A. In this alternative, the step of analyzing the second resolution image 112 is performed in the first neural network 302a, and the step of analyzing the cropped image 111 is performed in the second neural network 302b. In this way, the processing power is divided between the two networks 302a and 302b, so that the second resolution image can have a higher resolution than when only a single neural network is used.

[0050] Referring to Figure 5, based on the analysis of the second resolution image 112 in which corresponding regions 111a' and 111b' are detected, it is possible that objects 104c and 104b of a given type may be detected with a certain probability in two or more regions 111a and 111b of the first resolution image 110. In such cases, each of the two or more regions 111a and 111b is cropped in the first resolution image 110 and analyzed in a neural network or in two or more neural networks. However, where possible, as will be further described below, a single region containing both objects 104c and 104b may be cropped and processed in a recursive manner.
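Where a single region containing several detected objects is cropped, one straightforward way to obtain it is the smallest box that encloses all of the individual regions. The helper below is a hypothetical sketch of that computation, not a procedure prescribed by the patent.

```python
from typing import Iterable, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def enclosing_box(boxes: Iterable[Box]) -> Box:
    """Return the smallest axis-aligned box that contains every input box."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("at least one box is required")
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))
```

Whether one enclosing crop or several separate crops is preferable depends on how far apart the objects are: a large enclosing crop may itself need downscaling before analysis.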

[0051] According to some embodiments, the method is performed recursively, as illustrated by the flowchart in Figure 6, which is described in conjunction with Figure 3. The cropping in step S108 is preferably performed so that the cropped image includes all, or as many as possible, of the detected objects. In this example, the cropped image 111 includes the detected objects 104b and 104c. If the cropped image 111 is too large for the neural network, it is first downscaled to a reduced-resolution cropped image, as described for step S104. Then, in step S602, the cropped image 111 or the reduced-resolution cropped image is analyzed in the neural network, similarly to the second resolution image 112 in step S106 of Figure 2, to determine the second or further probability that a detected object 104c of the predetermined type is present in the region 115' of the cropped image 111. The analysis thereby establishes the region 115' in the cropped image 111 where the detected object 104c may be located.

[0052] If the further probability falls below the first threshold and exceeds or is equal to the second threshold, in step S604, the region 115' of the detected object 104c in the cropped image 111 is cropped to form a further cropped image 115.

[0053] If the predetermined condition is not met, the further cropped image is analyzed again, recursively, in step S602 to determine a still further probability that the detected object is of the predetermined type. If the still further probability is above or equal to the first threshold, an indication is given in step S109 that the detected object 104c is of the predetermined type. In such a case, the detected object 104c may be masked in the first resolution image 110.

[0054] Object 104b may already have been concluded to be of the predetermined type in the analysis of the cropped image 111, and is therefore not included in the further cropped image 115.

[0055] The steps in the flowchart of Figure 6 are performed recursively until at least one of the following predetermined conditions is met: the further probability is below the second threshold or above or equal to the first threshold; the number of iterations is above or equal to a predetermined number; or the resolution of the further cropped image is below a predetermined resolution.

[0056] For example, if the further probability exceeds or is equal to the first threshold, the neural network concludes that object 104c is detected with a sufficiently high probability, so no further analysis is needed, and an indication of this is given in step S109. Similarly, if the further probability falls below the second threshold, the neural network concludes that the object is not of the predetermined type, so no further analysis is needed. In this case, for example, it can be concluded that masking is not necessary.

[0057] Furthermore, the iteration may continue until the resolution of the further cropped image falls below a predetermined resolution set by the neural network's resolution.

[0058] If the specified conditions are met, this method terminates in step S610.

[0059] Figure 7 conceptually shows frames 110a and 110b of two first resolution images and the corresponding scaled second resolution images 112a and 112b, and Figure 8 is a flowchart of the method steps according to an embodiment of the present invention.

[0060] As explained with respect to Figure 2, in step S106, the second resolution image 112a and the second resolution image 112b are analyzed.

[0061] Subsequently, step S802 includes detecting motion in the second resolution images 112a-b. This can be done by comparing frame 112a of the second resolution image with frame 112b of the subsequent second resolution image and detecting differences between those frames that may indicate the motion of an object. For example, a pixel-level comparison may be performed to compare color changes between frame 112a and frame 112b that may indicate motion. Here, the vehicle 104d is moving, for example, between frame 110a / 112a and frame 110b / 112b, but the person 104a is not moving.
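A pixel-level comparison of this kind can be sketched as a mean absolute color difference over a region. The metric, the RGB-tuple frame representation, and the name `motion_score` are assumptions made for illustration; the text only states that color changes between frames may indicate motion.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Frame = List[List[Pixel]]      # rows of pixels
Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def motion_score(prev: Frame, curr: Frame, box: Box) -> float:
    """Mean absolute per-channel difference between two frames inside a box."""
    left, top, right, bottom = box
    pixels = (right - left) * (bottom - top)
    if pixels <= 0:
        raise ValueError("box must have positive area")
    total = 0
    for y in range(top, bottom):
        for x in range(left, right):
            # Sum the absolute change of each color channel for this pixel.
            total += sum(abs(a - b) for a, b in zip(prev[y][x], curr[y][x]))
    return total / (3 * pixels)
```

The resulting score would then be compared against the motion threshold; computing it on the second resolution frames keeps the comparison cheap.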

[0062] If the detected motion is above or equal to the motion threshold, a further region in the subsequent image frame is cropped in step S804. In this example, since motion was detected in the cropped region 111d' in frame 112a of the second resolution image, a further region 111e containing the moving object 104d is cropped in the second image frame 110b, as was done in the first image frame 110a. The further region 111e corresponds to the region 111e' of the moving object 104d in the second resolution image 112b of the second image frame 110b. Since a relatively strong degree of motion has been detected, it is important to crop the further region 111e and to repeat step S110 on it, that is, to analyze the cropped region 111e.

[0063] However, if the detected motion falls below the motion threshold in the first image frame 112a, repeating the cropping is unnecessary. Instead, the result of the analysis step performed in the first frame 112a on the cropped region is reused for the same region in the subsequent image frame 112b. This saves computational power because the cropping step S108 and the analysis step S110 do not need to be repeated unless a change is detected in the corresponding region of the scene 1. For example, no motion is detected in region 111c' between frame 112a and frame 112b of the second resolution image. Therefore, the object detection analysis performed in the previous step on the corresponding region 111c in image frame 110 is reused, for example, to conclude that person 104a is still present in image frame 110b.
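The choice between repeating the crop and analysis and reusing an earlier result can be sketched as a small cache keyed by region. This is only one possible arrangement under stated assumptions: the function name `probability_for_region`, the dictionary cache, and the `crop_and_analyze` callable (a stand-in for steps S108 and S110) are hypothetical.

```python
from typing import Callable, Dict


def probability_for_region(region_id: str, motion: float, motion_threshold: float,
                           cache: Dict[str, float],
                           crop_and_analyze: Callable[[], float]) -> float:
    """Re-run crop and analysis only when motion reaches the threshold.

    crop_and_analyze is a hypothetical stand-in for cropping the region in the
    first resolution image and analyzing the cropped image.
    """
    if motion >= motion_threshold or region_id not in cache:
        # Motion detected, or no earlier result exists: crop and analyze again.
        cache[region_id] = crop_and_analyze()
    # Otherwise the result from the earlier frame is reused unchanged.
    return cache[region_id]
```

For a static region such as 111c, repeated calls with a motion value below the threshold return the cached probability without invoking the analysis again.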

[0064] Preferably, the method is performed at a rate substantially corresponding to the frame rate of the captured video stream containing the first resolution image. The neural network or networks are configured such that object detection can occur on each frame with a delay of at most one frame.

[0065] The method described herein is a computer-implemented method.

[0066] The control unit includes a processing circuit, which is configured to cause the control unit to perform steps of the method described herein.

[0067] Furthermore, a computer program product is provided which includes a computer-readable storage medium storing a computer program. The computer-readable storage medium may be non-transitory, for example, and may be provided as a hard disk drive (HDD), solid-state drive (SSD), USB flash drive, SD card, CD / DVD, and / or any other storage medium capable of non-transitory storage of data.

[0068] The computer program includes computer code, which, when executed on the control unit's processing circuit, causes the control unit to: acquire a first resolution image collected by an image acquisition device; scale the first resolution image to a second resolution image having a lower resolution than the first resolution image; analyze the second resolution image to determine a first probability that a detected object of a predetermined type is present in a region of the first resolution image; crop the region containing the detected object in the first resolution image if the first probability is below a first threshold and above or equal to a second threshold, wherein a probability above or equal to the first threshold indicates that the detected object is of the predetermined type, and a probability below the second threshold indicates that the detected object is not of the predetermined type; analyze the cropped image to determine a second probability that the detected object is of the predetermined type; and if the second probability is above or equal to the first threshold, provide an indication that the detected object is of the predetermined type.
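The sequence of operations performed by the computer code, extended with the recursive cropping described earlier, can be sketched as follows. This is a simplified illustration, not the disclosed implementation: images are nested lists, `downscale` uses nearest-neighbor subsampling, and `Detector` is a hypothetical stand-in for a neural network that returns a probability and a region and, unlike a real network, accepts any input size. All threshold values and the `factor` parameter are assumptions.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]
Region = Tuple[int, int, int, int]  # x, y, width, height
# Hypothetical stand-in for a neural network: returns (probability, region).
Detector = Callable[[Image], Tuple[float, Region]]


def downscale(image: Image, factor: int) -> Image:
    """Nearest-neighbor downscaling by an integer factor."""
    return [row[::factor] for row in image[::factor]]


def crop(image: Image, region: Region) -> Image:
    """Cut the given region out of the image."""
    x, y, w, h = region
    return [row[x:x + w] for row in image[y:y + h]]


def detect(first_res: Image, detector: Detector, factor: int = 4,
           first_threshold: float = 0.8, second_threshold: float = 0.3,
           max_iterations: int = 3) -> bool:
    """Return True if the object is indicated to be of the predetermined type."""
    second_res = downscale(first_res, factor)
    probability, (x, y, w, h) = detector(second_res)
    # Map the region found in the second resolution image to the first resolution image.
    region = (x * factor, y * factor, w * factor, h * factor)
    image = first_res
    for _ in range(max_iterations):
        if probability >= first_threshold:
            return True   # confident: predetermined type
        if probability < second_threshold:
            return False  # confident: not the predetermined type
        # Uncertain band: crop at the finer resolution and analyze again.
        image = crop(image, region)
        probability, region = detector(image)
    return probability >= first_threshold
```

In this sketch, an object that remains uncertain after the iteration limit is treated as not indicated; an embodiment focused on masking might prefer the opposite default.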

[0069] The control unit includes a microprocessor, a microcontroller, a programmable digital signal processor, or another programmable device. The control unit may also, or instead, include an application-specific integrated circuit, a programmable gate array or programmable array logic, a programmable logic device, or a digital signal processor. If the control unit includes a programmable device, such as the microprocessor, microcontroller, or programmable digital signal processor described above, the processor may further include computer-executable code that controls the operation of the programmable device.

[0070] The control functions of the present disclosure may be implemented using an existing computer processor, by a dedicated computer processor for a suitable system incorporated for this purpose or another purpose, or by a hardwired system. Embodiments within the scope of the present disclosure include a program product comprising a machine-readable medium for carrying machine-executable instructions or data structures, or storing machine-executable instructions or data structures thereon. Such a machine-readable medium may be any available medium that can be accessed by a general-purpose or dedicated computer, or other machine having a processor. For example, such a machine-readable medium may comprise RAM, ROM, EPROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of machine-executable instructions or data structures, and can be accessed by a general-purpose or dedicated computer, or other machine having a processor. When information is transferred or provided to a machine via a network or another communication connection (either wired, wireless, or a combination of wired and wireless), the machine properly views the connection as a machine-readable medium. Thus, any such connection is properly referred to as a machine-readable medium. Combinations of the above are also included within the scope of a machine-readable medium. Machine-executable instructions include, for example, instructions and data that cause a general-purpose computer, a dedicated computer, or a dedicated processing machine to perform a certain function or a group of functions.

[0071] While diagrams may show sequences, the order of steps may differ from those shown. Furthermore, two or more steps may be executed simultaneously or partially simultaneously. Such variations depend on the selected software and hardware systems and the designer's selection. All such variations fall within the scope of this disclosure. Similarly, software implementations can be achieved using standard programming techniques having rule-based logic and other logic to accomplish various connection, processing, comparison, and decision steps. Moreover, while the invention has been described with reference to specific exemplary embodiments thereof, many different changes, modifications, etc., will become apparent to those skilled in the art.

[0072] Furthermore, variations to the disclosed embodiments can be understood and implemented by those skilled in the art in carrying out the claimed invention, based on the study of the drawings, this disclosure, and the appended claims. Moreover, in the claims, the word “including” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude the plural.

[Explanation of Symbols]

[0073]
2 Image acquisition module
3 Control unit
4 Video stream
10 System
100 Image acquisition device
104a Object
104b Object
104c Object
104d Object
110 First resolution image
110a Frame of the first resolution image
110b Frame of the first resolution image
111 Region
111 Cropped image
111' Region
111a Region
111a' Region
111b Region
111b' Region
111c Region
111c' Region
111d Region
111d' Region
111e Region
111e' Region
112 Communication link
112 Second resolution image
112a Frame of the second resolution image
112b Frame of the second resolution image
114 Communication network
115 Cropped image
115' Region
116 Client
118 Server
210 Input / output interface
302 Neural network
302a First neural network
302b Second neural network
304 Data signal

Claims

1. A computer-implemented method for detecting an object in an image collected by a camera of a surveillance system, the method comprising:
a step of acquiring a first resolution image collected by the camera;
a step of scaling the first resolution image to a second resolution image having a lower resolution than the first resolution image;
a step of analyzing the second resolution image in order to determine a first probability that a detected object of a predetermined type is present in a region of the first resolution image, wherein the second resolution image is analyzed such that the predetermined type is an object to be masked;
a step of cropping, if the first probability is below a first threshold and above or equal to a second threshold, the region in the first resolution image that includes the detected object, wherein a probability above or equal to the first threshold indicates that the detected object is of the predetermined type, and a probability below the second threshold indicates that the detected object is not of the predetermined type;
a step of analyzing the cropped image to determine a second probability that the detected object is of the predetermined type;
a step of indicating, if the second probability exceeds or is equal to the first threshold, that the detected object is of the predetermined type that should be masked; and
if the second probability is below the first threshold and exceeds or equals the second threshold:
(a) a step of analyzing the cropped image in order to determine a further probability that the detected object of the predetermined type is present in a region of the cropped image;
(b) a step of cropping, if the further probability is below the first threshold and above or equal to the second threshold, the region of the detected object in the cropped image in order to form a further cropped image;
(c) a step of analyzing the further cropped image in order to determine another probability that the detected object is of the predetermined type; and
(d) a step of indicating, if the further probability exceeds or is equal to the first threshold, that the detected object is of the predetermined type,
wherein steps (a) to (d) are performed recursively until a predetermined condition is met.

2. The method according to claim 1, wherein the step of analyzing the second resolution image and the step of analyzing the cropped image are performed in a neural network.

3. The method according to claim 1, wherein the step of analyzing the second resolution image is performed in a first neural network, and the step of analyzing the cropped image is performed in a second neural network.

4. The method according to claim 1, wherein steps (a) to (d) are performed recursively until at least one of the following conditions is met: the further probability in step (b) is below the second threshold, or above or equal to the first threshold; a predetermined number of repetitions has been reached; and the resolution of the further cropped image falls below a predetermined resolution.

5. The method according to any one of claims 1 to 4, wherein the step of analyzing the first resolution image comprises determining the probability that the detected object of a predetermined type is present in two or more regions, and each of the two or more regions is cropped and analyzed in the first resolution image.

6. The method according to any one of claims 1 to 4, comprising: a step of analyzing a set of second resolution images; a step of detecting motion in the set of second resolution images; a step of cropping, if the detected motion exceeds or is equal to a motion threshold, a further region in a subsequent image frame and analyzing the cropped image; and a step of indicating, if the second probability exceeds or is equal to the first threshold, that the detected object is of the predetermined type.

7. The method according to claim 6, wherein if the detected motion falls below the motion threshold in the first image frame, the result of the analysis step that determined the second probability in the first image frame is reused for the same cropped region in a subsequent image frame.

8. The method according to any one of claims 1 to 4, wherein if the first probability exceeds or is equal to the first threshold, an indication is given that the detected object is of the predetermined type.

9. The method according to any one of claims 1 to 4, wherein the resolution of the first resolution image is the resolution at the time it was captured by the image acquisition device.

10. The method according to any one of claims 1 to 4, wherein the resolution of the second resolution image depends on the size of the neural network used to analyze the second resolution image.

11. The method according to any one of claims 1 to 4, wherein the method is performed at a rate substantially corresponding to the frame rate of the captured video stream including the first resolution image.

12. A processing circuit comprising one or more processors and a memory storing instructions for object detection in an image collected by a camera, wherein the processing circuit is configured to:
acquire a first resolution image collected by the camera;
scale the first resolution image to a second resolution image having a lower resolution than the first resolution image;
analyze the second resolution image to determine a first probability that a detected object of a predetermined type is present in a region of the first resolution image, wherein the second resolution image is analyzed such that the predetermined type is an object to be masked;
crop, if the first probability is below a first threshold and above or equal to a second threshold, the region in the first resolution image that includes the detected object, wherein a probability above or equal to the first threshold indicates that the detected object is of the predetermined type, and a probability below the second threshold indicates that the detected object is not of the predetermined type;
analyze the cropped image to determine a second probability that the detected object is of the predetermined type;
indicate, if the second probability exceeds or is equal to the first threshold, that the detected object is of the predetermined type that should be masked; and
if the second probability is below the first threshold and exceeds or equals the second threshold:
(a) analyze the cropped image to determine a further probability that the detected object of the predetermined type is present in a region of the cropped image;
(b) crop, if the further probability is below the first threshold and above or equal to the second threshold, the region of the detected object in the cropped image in order to form a further cropped image;
(c) analyze the further cropped image in order to determine another probability that the detected object is of the predetermined type; and
(d) indicate, if the further probability exceeds or is equal to the first threshold, that the detected object is of the predetermined type,
wherein steps (a) to (d) are executed recursively until a predetermined condition is met.

13. A system comprising a camera for capturing an image of a scene including an object, and the processing circuit described in claim 12.

14. A non-transitory computer-readable storage medium storing a computer program for object detection in an image, wherein the computer program includes computer code that, when executed on a processing circuit comprising one or more processors, causes the processing circuit to:
acquire a first resolution image collected by a camera;
scale the first resolution image to a second resolution image having a lower resolution than the first resolution image;
analyze the second resolution image to determine a first probability that a detected object of a predetermined type is present in a region of the first resolution image, wherein the second resolution image is analyzed such that the predetermined type is an object to be masked;
crop, if the first probability is below a first threshold and above or equal to a second threshold, the region in the first resolution image that includes the detected object, wherein a probability above or equal to the first threshold indicates that the detected object is of the predetermined type, and a probability below the second threshold indicates that the detected object is not of the predetermined type;
analyze the cropped image to determine a second probability that the detected object is of the predetermined type;
indicate, if the second probability exceeds or is equal to the first threshold, that the detected object is of the predetermined type that should be masked; and
if the second probability is below the first threshold and exceeds or equals the second threshold:
(a) analyze the cropped image to determine a further probability that the detected object of the predetermined type is present in a region of the cropped image;
(b) crop, if the further probability is below the first threshold and above or equal to the second threshold, the region of the detected object in the cropped image in order to form a further cropped image;
(c) analyze the further cropped image in order to determine another probability that the detected object is of the predetermined type; and
(d) indicate, if the further probability exceeds or is equal to the first threshold, that the detected object is of the predetermined type,
wherein steps (a) to (d) are executed recursively until a predetermined condition is met.