Detection device, detection system, detection method, and computer program

The detection device uses a pre-trained image feature extraction model to separate and classify small, distant flying objects in captured images, improving detection accuracy by training on multiple images to distinguish moving objects from background features.

WO2026133992A1 · PCT designated stage · Publication Date: 2026-06-25 · NEC CORP

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: NEC CORP
Filing Date: 2025-12-05
Publication Date: 2026-06-25

AI Technical Summary

Technical Problem

Existing systems struggle to accurately detect small, distant flying objects such as drones in wide monitoring areas due to their small size in captured images, leading to difficulties in precise identification.

Method used

A detection device and method utilizing a pre-trained image feature extraction model that distinguishes between moving objects and background using image features, trained on images taken at different times to separate and classify moving objects based on their features in a feature space.

Benefits of technology

Enhances the accuracy of detecting small, distant flying objects by effectively distinguishing them from background, allowing for precise identification and tracking.

Figure: JP2025042422_25062026_PF_FP_ABST

Abstract

Provided is a technology that makes it possible to detect a detection-target movable body from a captured image without being negatively affected by detection of a movable body that does not need monitoring. This detection device receives captured images in which a monitoring area is captured, extracts image portions including a movable body from the captured images, classifies whether the movable body is any of predetermined detection targets or a non-detection target by using an image feature amount extracted from the image portions by using a pre-trained image feature extraction model, and outputs detection information indicating that a detection-target movable body is captured in the captured images. The image feature extraction model is a model which receives, as an input, the image portions including the movable body and outputs an image feature amount of the received image portions, and which is characterized by being trained such that, by using image portions including the movable body in the plurality of captured images captured at different time points and background images not including the movable body in the image portions as training image portions, background feature amounts extracted from the background images of the plurality of training image portions are close to each other in a feature amount space, movable body feature amounts extracted from images of the movable body in the training image portions are close to each other in the feature amount space, and the background feature amounts and the movable body feature amounts are far from each other in the feature amount space.

Description

Detection device, detection system, detection method, and computer program

[0001] The present disclosure relates to a detection device, a detection system, a detection method, and a computer program.

[0002] The utilization of unmanned aircraft such as drones and UAVs (Unmanned Aerial Vehicles) has increased, and there are concerns about the intrusion of unmanned aircraft into important facilities such as airports and nuclear power plants and the restricted airspaces around them. Therefore, there is an increasing demand for technologies to monitor such restricted airspaces and detect intruding unmanned aircraft.

[0003] The flying object monitoring system described in Patent Document 1 monitors flying objects that intrude into a predetermined monitoring area. The flying object monitoring system is configured to detect a flying object flying in the monitoring area by a monitoring radar and determine whether the flying object is a detection target using the detection result of the monitoring radar.

[0004] Patent Document 1: Japanese Patent Application Laid-Open No. 2017-169076

[0005] As described above, the flying object monitoring system described in Patent Document 1 determines whether a flying object is a detection target using the detection result of the radar. When comparing the detection of a detection target by radar and the detection of a detection target by a captured image, the detection of a detection target by a captured image can improve the detection accuracy by increasing the resolution of the image. For this reason, there is a need to find a detection target using an image rather than a radar. In response to this need, when trying to detect a flying object in a wide monitoring area (for example, an area including an airport and its surrounding restricted airspace) using an imaging device, a distant flying object is imaged small, and there is a problem that it is difficult to detect the distant flying object with high accuracy.

[0006] An object of the present disclosure is to provide a detection device, a detection system, a detection method, and a computer program that can appropriately determine whether a moving object that is unclear in a captured image due to its small size is a detection target.

[0007] To achieve the above object, a detection device according to one embodiment comprises: a receiving means for receiving a captured image of a monitoring area; an extraction means for extracting an image portion containing a moving object from the captured image; a classification means for classifying whether the moving object is one of predetermined detection targets or a non-detection target, using image features extracted from the image portion with a pre-trained image feature extraction model; and an output means for outputting detection information indicating that a detection-target moving object is visible in the captured image. The image feature extraction model is a model that takes an image portion containing a moving object as input and outputs image features of the input image portion. The image feature extraction model is trained using, as training image portions, an image portion containing a moving object and a background image that does not contain the moving object, from each of a plurality of captured images taken at different times, such that the background features extracted from the background images of the plurality of training image portions approach each other in the feature space, the moving object features extracted from the images of the moving object in the training image portions approach each other in the feature space, and the background features and the moving object features move apart in the feature space.

[0008] In another aspect for achieving the above object, a detection system according to one embodiment comprises the detection device according to the above embodiment, an imaging device, and a display device.

[0009] In yet another aspect for achieving the above object, in a detection method according to one embodiment, a computer receives a captured image of a monitoring area, extracts an image portion containing a moving object from the captured image, classifies whether the moving object is one of predetermined detection targets or a non-detection target using image features extracted from the image portion with a pre-trained image feature extraction model, and outputs detection information indicating that a detection-target moving object is visible in the captured image. The image feature extraction model is a model that takes an image portion containing a moving object as input and outputs image features of the input image portion. The image feature extraction model is trained using, as training image portions, an image portion containing a moving object and a background image that does not contain the moving object, from each of multiple captured images taken at different times, so that the background features extracted from the background images of the multiple training image portions approach each other in the feature space, the moving object features extracted from the moving object images of the training image portions approach each other in the feature space, and the background features and the moving object features move apart in the feature space.

[0010] In yet another aspect for achieving the above object, a computer program according to one embodiment causes a computer to perform: a receiving process of receiving a captured image of a monitoring area; an extraction process of extracting an image portion containing a moving object from the captured image; a classification process of classifying whether the moving object is one of predetermined detection targets or a non-detection target, using image features extracted from the image portion with a pre-trained image feature extraction model; and an output process of outputting detection information indicating that a detection-target moving object is visible in the captured image. The image feature extraction model is a model that takes an image portion containing a moving object as input and outputs image features of the input image portion. The image feature extraction model is trained using, as training image portions, an image portion containing a moving object and a background image that does not contain the moving object, from each of multiple captured images taken at different times, so that the background features extracted from the background images of the multiple training image portions are close to each other in the feature space, the moving object features extracted from the moving object images of the training image portions are close to each other in the feature space, and the background features and the moving object features are far apart in the feature space.

[0011] This disclosure provides a detection device, detection system, detection method, and computer program that can appropriately determine whether a moving object that is obscured in a captured image due to its minute size is a target for detection.

[0012] Figure 1 is a block diagram illustrating an example of the configuration of the detection system according to this disclosure. Figure 2 is a diagram illustrating an example of the detection system according to this disclosure. Figure 3 is a diagram illustrating an example of a captured image. Figure 4 is a diagram illustrating an example of a foreground region. Figure 5 is a diagram illustrating an example of an extracted image. Figure 6 is a diagram illustrating an example of the position predicted by the tracking unit 130 on the captured image according to this disclosure. Figure 7 is a diagram illustrating an example of the processing of the tracking unit 130. Figure 8 is a diagram illustrating an example of the process of training an image feature extraction model. Figure 9 is a diagram illustrating an example of the process of training an image feature extraction model. Figure 10 is a diagram illustrating an example of the display by the output unit 150. Figure 11 is a diagram illustrating an example of the display by the output unit 150. Figure 12 is a flowchart illustrating an example of the processing operation of the detection device 10 according to this disclosure. Figure 13 is a block diagram illustrating an example of the configuration of the detection device 200 according to this disclosure. Figure 14 is a flowchart illustrating an example of the processing operation of the detection device 200 according to this disclosure. Figure 15 is a diagram illustrating an example of the hardware configuration of the detection device.

[0013] Embodiments of this disclosure will be described with reference to the drawings. The reference numerals in the drawings are provided for convenience as examples to aid understanding and are not intended to limit this disclosure to the illustrated embodiments.

[0014] <First Embodiment> <Configuration in the First Embodiment> As shown in Figure 1, the detection system in the first embodiment comprises a detection device 10, a radio wave detection device 20, an imaging device 30, a drive unit 40, and a display device 50. The detection system detects the flying object to be detected. The flying object to be detected is not predetermined or limited by the system designer, but examples include unmanned aircraft such as drones, as well as airplanes, helicopters, flying cars, balloons, paragliders, and living creatures such as birds. However, unless otherwise specified, the description will assume that a drone 100 as shown in Figure 2 is the flying object to be detected.

[0015] The radio wave detection device 20 is connected to the detection device 10 either via a communication network or directly by cable. The radio wave detection device 20 is a device that has the function of detecting objects using radio waves and the function of calculating the position information of the detected object, and its configuration is not limited, but for example it is a passive radar. The radio wave detection device 20 is installed in a position where it can monitor the monitoring area and detects flying objects in the monitoring area. The monitoring area is a predetermined area, for example, an area set within or around the site of an airport or nuclear power plant facility, or an airspace where drones are prohibited from flying. The radio wave detection device 20 also transmits object detection information, which is information about the detected object, to the detection device 10. In the first embodiment, the object detection information includes information indicating that an object has been detected and the position information of the object. The radio wave detection device 20 transmits the object detection information to the detection device 10 in real time, for example.

[0016] The imaging device 30 and the drive unit 40 are connected to the detection device 10 either via a communication network or directly by cable. The imaging device 30 is, for example, a visible light camera with a magnification adjustment function. The imaging device 30 is connected to the drive unit 40. The drive unit 40 is a device that changes the imaging direction (the direction the optical axis is pointing) of the imaging device 30 and controls the pan angle and tilt angle of the imaging device 30. The operation of the drive unit 40 and the imaging device 30 is controlled by the detection device 10. The imaging device 30 is installed in a position where it can capture the monitoring area. The imaging device 30 captures the monitoring area as a video and transmits the captured video to the detection device 10 in real time. A video is a series of still images captured in sequence. In other words, a video consists of multiple captured images, which are multiple still images taken at different times.

[0017] The display device 50 is connected to the detection device 10 via a communication network or cable. For example, the display device 50 is a device used by users of the detection system. The display device 50 displays information such as characters and images based on the display control operation of the detection device 10.

[0018] As shown in Figure 1, the detection device 10 includes a receiving unit 110, a foreground extraction unit 120, a tracking unit 130, a classification unit 140, an output unit 150, an image capture control unit 160, and a storage unit 170.

[0019] The receiving unit 110 receives, from the radio wave detection device 20, object detection information indicating the detection of a flying object in the monitoring area. The receiving unit 110 also receives video from the imaging device 30.

[0020] The foreground extraction unit 120 processes the video received by the receiving unit 110. Specifically, the foreground extraction unit 120 extracts the foreground region from the images designated as the processing target among the captured images that make up the video. The images designated as the processing target are, for example, images designated by the designer of the detection system, taking into consideration the video shooting rate. The images to be processed may be all the images in the video, or they may be images extracted from a series of images in chronological order at predetermined intervals (for example, every five images). The foreground region is the region in the captured image in which a moving object is visible. In this embodiment, a moving object refers to an object or phenomenon whose position changes over time, and specific examples include airplanes, helicopters, drones, and birds, which are the objects to be detected, and moving objects that are not to be detected, such as moving waves and leaves swaying in the wind.

[0021] A moving object is extracted from a captured image, for example, using the difference between the captured image and a background model. The background model is information about the background image. One example of a method for extracting the foreground region using a background model is a method that uses an omnidirectional background model and a partial background model. The omnidirectional background model is a background model of the entire monitoring area that is generated in advance using images of the monitoring area and stored in the storage unit 170. The partial background model is a background model of the portion of the monitoring area that corresponds to the captured image, extracted from the omnidirectional background model using the pan angle and tilt angle of the imaging device 30 at the time of shooting.

[0022] The foreground extraction unit 120 reads a partial background model from the omnidirectional background model in the storage unit 170, for example, using the pan angle and tilt angle at the time of shooting. The foreground extraction unit 120 extracts the foreground region from the captured image using the partial background model. For example, the foreground extraction unit 120 extracts foreground region (A), foreground region (B), and foreground region (C) as shown in Figure 4 from a captured image to be processed, as shown in Figure 3, using the partial background model. The foreground extraction unit 120 also associates the extracted foreground region with an image identification number indicating the shooting order of the captured image from which the foreground region was extracted, and the position information of the foreground region in the captured image, and stores them in the storage unit 170.
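As a rough illustration of the background-difference extraction described in paragraphs [0021] and [0022], the following is a minimal sketch, not the method of this disclosure. The grayscale list-of-rows image format, the linear angle-to-pixel mapping (`px_per_deg`), the difference threshold, and all function names are assumptions introduced only for this example.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # assumed format: grayscale rows of pixel intensities


def crop_partial_background(omni: Image, pan_deg: float, tilt_deg: float,
                            width: int, height: int,
                            px_per_deg: float = 1.0) -> Image:
    """Cut the partial background model out of the omnidirectional model.

    Assumes a hypothetical linear mapping from pan/tilt angles to pixel offsets.
    """
    left = int(round(pan_deg * px_per_deg))
    top = int(round(tilt_deg * px_per_deg))
    return [row[left:left + width] for row in omni[top:top + height]]


def foreground_mask(frame: Image, background: Image,
                    threshold: int = 30) -> List[List[bool]]:
    """Mark pixels whose absolute difference from the background exceeds the threshold."""
    return [[abs(f - b) > threshold for f, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


def bounding_box(mask: List[List[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (x, y, w, h) of the rectangle covering all foreground pixels, or None."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, is_fg in enumerate(row) if is_fg]
    if not points:
        return None
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
```

This sketch returns a single rectangle for all foreground pixels; separating several regions such as (A), (B), and (C) would additionally need a connected-component step, which is omitted here.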

[0023] The tracking unit 130 tracks the foreground region in a series of images of multiple processing targets. In the following description, the image of a processing target processed by the tracking unit 130 will be referred to as the "focus image," and images of processing targets processed before the focus image will be referred to as "past images." The tracking unit 130 uses foreground region information extracted from past images, but there are cases where the foreground region information of past images is not stored in the storage unit 170, such as when receiving images from the imaging device 30. In such cases, the tracking unit 130 performs processing using a processing method that does not use foreground region information from past images. The processing method is not limited here, so a description of the processing method will be omitted. In the following description, it will be assumed that the foreground region information of past images is stored in the storage unit 170.

[0024] The tracking unit 130 has several functions, as described below. One function is to delete from the storage unit 170 the tracking results of any foreground region showing a moving object that the classification unit 140 has classified as a non-detection target, which stops the tracking of that foreground region. Stopping the tracking of non-detection targets makes it possible to concentrate on tracking detection targets. That is, the adverse effects (in other words, noise) caused by non-detection-target moving objects are reduced, and the detection-target moving object can be detected from the captured image without being adversely affected by the detection of moving objects that do not need to be monitored.
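The deletion of tracking results described in paragraph [0024] could be sketched as follows. The dictionary layout, the label string `"non_detection_target"`, and the function name are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Track = List[Tuple[float, float]]  # assumed: chronological foreground-region centers


def prune_tracks(tracks: Dict[int, Track],
                 classifications: Dict[int, str],
                 non_target_label: str = "non_detection_target") -> Dict[int, Track]:
    """Keep only tracks that have not been classified as non-detection targets.

    Tracks without a classification yet are kept so that tracking can continue.
    """
    return {track_id: track for track_id, track in tracks.items()
            if classifications.get(track_id) != non_target_label}
```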

[0025] As another function, the tracking unit 130 predicts the position, in the focus image, of the image portion showing a moving object whose foreground region was extracted from past images. The prediction process of the tracking unit 130 will be explained using a specific example. In this example, it is assumed that the tracking results of the foreground region in consecutive past images F1 to F5, as shown in Figure 6, are stored in the storage unit 170. In Figure 6, the tracking results corresponding to past images F1 to F5 are indicated by the symbols F1 to F5. The tracking results of the foreground region are the result of tracking the foreground region in which the same moving object is visible (hereinafter also referred to as the foreground region of the same moving object) across multiple consecutive captured images to be processed.

[0026] For example, the tracking unit 130 reads the tracking results of the foreground region in past images F1 to F5, as shown in Figure 6, from the storage unit 170. Using the read tracking results, the tracking unit 130 predicts the position of the foreground region of the same moving object in the focus image F6, which follows the past images F1 to F5. The position is predicted using a method selected from various known methods.
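The disclosure leaves the prediction method open ("various known methods"). Purely as one possible choice, the sketch below extrapolates the position in the next image from the average displacement over the past images (a constant-velocity assumption). The function name and the use of region centers as input are assumptions of this example.

```python
from typing import List, Tuple


def predict_next_position(track: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Extrapolate the next center from chronological centers (constant velocity).

    `track` holds the foreground-region centers in past images, oldest first.
    """
    if not track:
        raise ValueError("at least one past position is required")
    if len(track) == 1:
        return track[0]  # no motion estimate yet: reuse the last known position
    steps = len(track) - 1
    vx = (track[-1][0] - track[0][0]) / steps  # average displacement per image
    vy = (track[-1][1] - track[0][1]) / steps
    return (track[-1][0] + vx, track[-1][1] + vy)


# Centers in past images F1 to F5 (made-up values) give a prediction for F6.
predicted_f6 = predict_next_position([(0, 0), (2, 1), (4, 2), (6, 3), (8, 4)])
```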

[0027] As another function, the tracking unit 130 associates the tracking results in past images with the foreground region in the focus image. For example, the tracking unit 130 associates foreground regions between captured images using the foreground region in past images, the predicted position, a motion model that models the movement of the foreground region, and the classification result of the foreground region (moving object) by the classification unit 140. The motion model is, for example, a probabilistic model that represents the nonlinearity of time-series transitions using the position information of the foreground region in the captured image. This probabilistic model is, for example, a Laplace distribution. The motion model of each foreground region may be generated in advance and stored in the storage unit 170, or it may be generated when the foreground region is extracted. There are also cases in which the tracking unit 130 does not associate a tracking result with any foreground region in the focus image. For example, if there are two tracking results but only one foreground region is extracted from the focus image, the tracking unit 130 associates only one of the tracking results. Furthermore, as shown in Figure 6, when foreground region (B) and foreground region (C) are extracted from the focus image F6, the tracking results from past images F1 to F5 are not necessarily associated with them.

[0028] The following describes a specific example of the association process of the tracking unit 130. The description assumes that the tracking results of past images F1 to F5 shown in Figure 6 track the same drone, and that the classification unit 140 has classified the foreground region of those tracking results as a drone to be detected. Furthermore, in the focus image F6, foreground region (A) is the foreground region of the drone, foreground region (B) is the foreground region of a bird, and foreground region (C) is the foreground region of a wave.

[0029] The tracking unit 130 uses the predicted position shown in Figure 6 to tentatively associate the tracking results of the foreground regions of past images F1 to F5 with foreground region (A), which is closest to the predicted position. The tracking unit 130 also updates the motion model using the tracking results of the foreground regions of past images F1 to F5. For example, the tracking unit 130 uses the updated motion model of each foreground region to predict the position in the focus image F6 that corresponds to the tracking results of the foreground regions of past images F1 to F5, and calculates the position (D) according to the motion model, as shown in Figure 7. The tracking unit 130 uses the calculated position (D) to finalize the association between the foreground regions of past images F1 to F5 and foreground region (A) of the focus image F6. The tracking unit 130 stores the result of this association in the storage unit 170 as the tracking result. Note that a motion model is not available immediately for a moving object that has only just appeared in the captured images, so for some tracking results of past images the position according to the motion model has not been calculated. In that case, the foreground regions of past images F1 to F5 are associated with foreground region (A) of the focus image F6 using the predicted position alone, without a motion model. The tracking unit 130 may also update the motion model corresponding to the tracking results of the foreground regions of past images F1 to F5 using the classification results of the foreground regions by the classification unit 140. For example, if the classification unit 140 classifies the foreground regions of F1 to F5 as a drone, the tracking unit 130 may use a motion model corresponding to drones to predict the position in the focus image F6 that corresponds to the tracking results of the foreground regions of past images F1 to F5, and calculate the position (D) using that motion model, as shown in Figure 7.
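To make the association step of paragraphs [0027] to [0029] more concrete, the sketch below scores each candidate foreground region by its likelihood under a Laplace distribution centered on the predicted position and picks the best one. Treating x and y as independent, the scale value, the rejection threshold, and the function names are all assumptions of this example; the disclosure only states that the motion model may be a Laplace distribution.

```python
import math
from typing import List, Optional, Tuple


def laplace_log_density(x: float, mu: float, scale: float) -> float:
    """Log of the Laplace density with location mu and scale parameter `scale`."""
    return -math.log(2.0 * scale) - abs(x - mu) / scale


def associate(predicted: Tuple[float, float],
              candidates: List[Tuple[float, float]],
              scale: float = 5.0,
              min_log_likelihood: float = -10.0) -> Optional[int]:
    """Return the index of the most likely candidate center, or None.

    Candidates scoring at or below `min_log_likelihood` are rejected, which
    models the case where a tracking result is associated with no region.
    """
    best_index, best_score = None, min_log_likelihood
    for index, (cx, cy) in enumerate(candidates):
        score = (laplace_log_density(cx, predicted[0], scale)
                 + laplace_log_density(cy, predicted[1], scale))
        if score > best_score:
            best_index, best_score = index, score
    return best_index
```

With a predicted position of (10, 5) and candidate centers for regions (A), (B), and (C) at (11, 5), (40, 30), and (2, 50), the sketch selects index 0, that is, region (A).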

[0030] The classification unit 140 uses the extracted image and trajectory information to classify moving objects that appear in the foreground region extracted from the captured image. Specifically, as shown in Figure 1, the classification unit 140 comprises an image extraction unit 141, a trajectory extraction unit 142, and a determination unit 143.

[0031] The image extraction unit 141 extracts the image portion containing the moving object from the captured image. That is, the image extraction unit 141 extracts an extracted image, which is the image portion corresponding to the moving object in the foreground region, from the captured image to be processed. The image extraction unit 141 reads the position information of the foreground region in the captured image to be processed from the storage unit 170 using the image identification number of the captured image to be processed, and uses the read foreground region position information to extract an extracted image from the captured image to be processed, for example, as shown in Figure 5.

[0032] The trajectory extraction unit 142 extracts (generates) trajectory information of the foreground region. The trajectory extraction unit 142 uses the tracking results of the foreground region produced by the tracking unit 130. The tracking results of the foreground region are the result of tracking, in chronological order across multiple captured images to be processed, the foreground region in which the same moving object is captured, and they represent the movement of the moving object over time. Trajectory information based on such tracking results is, for example, information whose elements are the center coordinates (X, Y), width W, and height H of the rectangle surrounding the foreground region, arranged in chronological order.
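The trajectory information of paragraph [0032] can be pictured as a chronological list of (X, Y, W, H) tuples. The sketch below is one possible representation; the `TrackPoint` class, its field names, and the use of the image identification number as the sort key are assumptions of this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TrackPoint:
    """One tracked foreground region: image number plus bounding rectangle."""
    image_id: int  # image identification number (shooting order)
    x: float       # center X of the surrounding rectangle
    y: float       # center Y of the surrounding rectangle
    w: float       # rectangle width
    h: float       # rectangle height


def build_trajectory(points: List[TrackPoint]) -> List[Tuple[float, float, float, float]]:
    """Arrange (X, Y, W, H) elements in chronological (shooting) order."""
    ordered = sorted(points, key=lambda p: p.image_id)
    return [(p.x, p.y, p.w, p.h) for p in ordered]
```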

[0033] The determination unit 143 classifies whether a moving object is one of the predetermined detection targets or a non-detection target, using image features extracted from the extracted image with a pre-trained image feature extraction model. Image features are numerical representations of the characteristics of an image; in the first embodiment, they are extracted using the image feature extraction model. There may be one type of detection target or multiple types. Examples of detection targets include flying objects such as airplanes, helicopters, drones, and birds. Non-detection targets are moving objects other than detection targets, such as moving waves or leaves swaying in the wind.

[0034] The image feature extraction model is a model that takes an extracted image containing a moving object as input and outputs the image features of the input extracted image. The model is trained using, as training image portions, the image portion containing the moving object and the background image that does not contain the moving object, from each of multiple captured images taken at different times. Specifically, it is trained so that the background features extracted from the background images of the multiple training image portions are close to each other in the feature space, the moving object features extracted from the images of the moving object in the training image portions are close to each other in the feature space, and the background features and the moving object features are far apart in the feature space. A training image portion is an image used to train the image feature extraction model, and refers to an image of the moving object and a background image generated at the same position as the moving object. The background image is an image that does not contain the moving object, generated at the same position as the foreground region extracted from the captured image, using a background model or AI (Artificial Intelligence) technology. Known methods can be used to generate background images. For example, if the background model is a Gaussian mixture distribution, the color of the background image may be obtained, for each pixel of the captured image, as the sum of the products of each color modeled by the Gaussian mixture distribution and its probability. The background image may also be generated using prompts about various environmental conditions input to a large language model. Environmental conditions refer to the conditions of the surrounding area including the monitoring area, such as weather information, wind speed, tides, humidity, atmospheric pressure, sunshine duration, visibility, ultraviolet index, air pollution index, and climate classification. Training the image feature extraction model using background images generated with the large language model makes it possible to extract image features that give higher detection accuracy for the detection target. The classification process of the determination unit 143 uses AI technology such as a neural network. Moving objects in captured images move in ways specific to their type. That is, the trajectory (trajectory pattern) of a moving object differs depending on its type. The determination unit 143 uses this to classify the moving objects in the captured image. Because the classification uses trajectory patterns specific to each type of moving object, a trajectory of sufficient length is required for classification, and it may take, for example, several seconds from the time a moving object appears in the captured image until it is classified.
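The Gaussian-mixture background color mentioned in paragraph [0034] amounts to a probability-weighted average of the component colors at each pixel. The following is a minimal sketch for a single pixel; the `(weight, mean_color)` representation, the RGB ordering, and the function name are assumptions of this example.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]  # assumed (R, G, B) mean of one mixture component


def expected_background_color(components: List[Tuple[float, Color]]) -> Color:
    """Sum of (component probability x component mean color) for one pixel.

    Weights are normalized first so they behave as probabilities.
    """
    total_weight = sum(weight for weight, _ in components)
    if total_weight <= 0:
        raise ValueError("mixture weights must sum to a positive value")
    r, g, b = (sum(weight * mean[channel] for weight, mean in components) / total_weight
               for channel in range(3))
    return (r, g, b)


# Two-component example: 75 % gray sky, 25 % a differently colored mode.
pixel_color = expected_background_color([(0.75, (100.0, 100.0, 100.0)),
                                         (0.25, (200.0, 0.0, 100.0))])
```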

[0035] This section describes the specific training process for the image feature extraction model. In this example, it is assumed that the images of the moving object show the same drone at different times. As shown in Figure 8, for example, the training image portions input to the image feature extraction model are image D1 (the portion containing the moving object (drone)) and background image Bd1 at time t1, image D2 (the portion containing the moving object (drone)) and background image Bd2 at time t2, and image D3 (the portion containing the moving object (drone)) and background image Bd3 at time t3. Times t1, t2, and t3 are, for example, the times when images D1, D2, and D3 were taken, respectively, and are in chronological order. First, the training process using the training image portions at times t1 and t2 will be described. The image feature extraction model is trained so that the moving object features (d1, d2, and d3 in Figure 8) extracted from the input moving object images such as D1 and the background features (b1, b2, and b3) extracted from the background images such as Bd1 move far apart in the feature space, and so that moving object feature d1 and moving object feature d2 move closer together in the feature space. The training process using the training image portions at times t2 and t3 is the same as the training process using the training image portions at times t1 and t2. By repeating this process, the image feature extraction model is trained.

[0036] Next, we will explain the specific process for training an image feature extraction model using training image portions that contain different types of moving objects. In this example, we assume that the images of moving objects are images of the same drone taken at different times, and images of the same bird taken at different times. Note that the time when the drone was photographed and the time when the bird was photographed may or may not be the same.

[0037] As shown in Figure 9, for example, the training image portion at time t1 consists of image T1, which contains the moving object (bird), and background image Bt1. At time t2, the training image portion consists of image T2, which contains the moving object (bird), and background image Bt2. At times t1 and t2, the relationship between the images of the same moving object (bird) and the background images is learned using the same process as when the moving object is a drone, as described above. Furthermore, the image feature extraction model is generated by learning to separate, in the feature space, the moving object features d1 and d2 extracted from images D1 and D2 showing the drone from the moving object features a1 and a2 (shown in Figure 9) extracted from images T1 and T2 showing the bird. By repeating this learning process, the features of images showing the drone and the features of images showing the bird are learned in the feature space, the separation boundary between different types of moving objects becomes clearer, and the accuracy of classification improves.
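The separation between types of moving objects can be sketched as a loss over labeled features. This is a minimal illustration under assumptions: the disclosure does not specify the loss, and the function name type_separation_loss, the label strings, and the margin value are hypothetical.

```python
import math
from itertools import combinations

def type_separation_loss(samples, margin=1.0):
    """samples: list of (type_label, feature_vector) pairs.

    Features with the same label are pulled together; features with
    different labels are pushed at least `margin` apart (assumed form).
    """
    loss = 0.0
    for (label_a, feat_a), (label_b, feat_b) in combinations(samples, 2):
        dist = math.dist(feat_a, feat_b)
        if label_a == label_b:
            loss += dist ** 2                        # same type: pull
        else:
            loss += max(0.0, margin - dist) ** 2     # different type: push
    return loss

batch = [
    ("drone", [0.0, 0.0]),   # d1
    ("drone", [0.0, 0.2]),   # d2
    ("bird", [3.0, 0.0]),    # a1
    ("bird", [3.0, 0.2]),    # a2
]
print(type_separation_loss(batch))
```

When the drone and bird clusters are already far apart, only the small within-type distances contribute, which corresponds to the clear separation boundary described above.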

[0038] The output unit 150 outputs detection information indicating that a moving object to be detected is visible in the captured image. This detection information includes, for example, the tracking results of the moving object stored in the storage unit 170 or the classification results of the moving object by the classification unit 140. The output unit 150 also controls the display operation of the display device 50. The output unit 150 may output (display) the tracking results to the display device 50 each time the tracking results of the foreground area or the classification results of the moving object are stored, or it may output the tracking results of the foreground area to the display device 50 after a predetermined number of results have been collected. The predetermined number is, for example, five, and can be set as appropriate by the user using the display device 50. For example, when displaying five sets of tracking results together, the output unit 150 may output to the display device 50 a trajectory that connects the tracking results with straight lines, such as the thick line shown in Figure 10. In the example in Figure 10, the tracking results of the foreground area are superimposed on the captured image, which allows the user to confirm which moving object the output tracking results belong to. The classification results for each foreground region are also displayed, which allows the user to confirm the classification results of the output moving objects. In such cases, the output unit 150 displays the classification results as text, for example, using the classification results and the classification information set by the user.
Alternatively, the classification results may be displayed using a display method set in advance by the user; for example, the color of the trajectory or the use of solid or dashed lines may differ for each classification type. This makes it easier for the user to understand that multiple moving objects have been detected. The detection device 10 may further include an acquisition unit 180 that acquires textual information explaining the classification results from a large-scale language model, and the output unit 150 may output the acquired textual information to a display device. The large-scale language model may be included in the detection device 10; the tracking results from the tracking unit 130 and the classification results of the moving objects from the classification unit 140 may be input to the large-scale language model, converted into textual information, and stored in the storage unit 170. If the detection device 10 includes an acquisition unit 180, the acquisition unit 180 may acquire the textual information, and the output unit 150 may display the textual information output from the large-scale language model on the display device 50 as shown in Figure 11. By acquiring such textual information, the detection device 10 can be equipped with a report generation function. The report generation function generates an aircraft detection report using the textual information and the aircraft detection report format. Because the detection device 10 can then create reports itself, the burden of report creation on the user is reduced.
The output unit 150 uses the tracking results to monitor the change over time in the size of the moving object in the captured images, and outputs warning information if it determines, according to predetermined criteria, that this change calls for a warning. The predetermined criteria include, for example, whether the size of the latest extracted image is greater than 150% of the size of the first extracted image. If it is, the output unit 150 outputs a warning. For example, if the trajectory information on the captured image remains the same but the extracted image of the moving object grows to more than 150% of its initial size over time, the output unit 150 outputs a warning to the display device 50 indicating that the distance between the shooting device 30 and the moving object is decreasing.
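The 150% criterion can be written as a short check. This is a sketch only: the disclosure does not say how the size of an extracted image is measured, so the use of a single number (for example, pixel area) and the names should_warn and ratio_threshold are assumptions.

```python
def should_warn(first_size, latest_size, ratio_threshold=1.5):
    """Return True when the latest extracted image is more than 150%
    of the size of the first extracted image.

    first_size, latest_size: sizes in the same unit, e.g. pixel area
    (the unit is an assumption; the source does not specify it).
    """
    if first_size <= 0:
        raise ValueError("first_size must be positive")
    return latest_size > ratio_threshold * first_size

print(should_warn(100, 151))  # grew past 150%: warn
print(should_warn(100, 150))  # exactly 150%: not "greater than", no warning
```

Keeping the threshold as a parameter reflects that 150% is given in the text only as an example of a predetermined criterion.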

[0039] The shooting control unit 160 transmits control signals to the shooting device 30 and the drive device 40. For example, the shooting control unit 160 generates control signals for the drive device 40 using the position information of the drone 100 included in the aircraft detection information received by the receiving unit 110. The control signals for the drive device 40 are signals that control the pan angle and tilt angle of the shooting device 30. For example, if the shooting control unit 160 receives an instruction from the user of the detection device 10 to take a picture so that the drone 100 is in the center of the image, it generates control signals that control the pan angle and tilt angle of the drive device 40 so that the drone 100 is in the center of the image and can be tracked. The shooting control unit 160 transmits the generated control signals to the drive device 40. The control signals for the shooting device 30 are signals that control the magnification of the shooting device 30. For example, if the shooting control unit 160 receives an instruction from the user of the detection device 10 to take a picture so that the drone 100 is magnified, it generates a control signal for the shooting device 30 to control the magnification so that the drone 100 is zoomed in and transmits it to the shooting device 30. In the normal state (normal operation) when the entire monitoring area is detected, the shooting control unit 160 generates and transmits control signals to the shooting device 30 and the drive device 40 respectively to capture the entire monitoring area in a predetermined direction and at a predetermined magnification.
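As an illustration of how pan and tilt corrections for centering the drone might be computed, the sketch below assumes that the drone's position has already been mapped to pixel coordinates in the captured image and that the shooting device follows a simple pinhole model with a known field of view. None of these assumptions, nor the function name centering_offsets, come from the disclosure.

```python
import math

def centering_offsets(target_xy, image_size, fov_deg):
    """Return (pan, tilt) corrections in degrees that would bring the
    target to the image center under a pinhole camera model (assumed).

    target_xy:  (x, y) pixel position of the target, origin at top left
    image_size: (width, height) of the captured image in pixels
    fov_deg:    (horizontal, vertical) field of view in degrees
    """
    x, y = target_xy
    width, height = image_size
    hfov, vfov = fov_deg
    # Focal lengths in pixels derived from the field of view.
    fx = (width / 2) / math.tan(math.radians(hfov) / 2)
    fy = (height / 2) / math.tan(math.radians(vfov) / 2)
    pan = math.degrees(math.atan((x - width / 2) / fx))    # positive: turn right
    tilt = math.degrees(math.atan((height / 2 - y) / fy))  # positive: turn up
    return pan, tilt

# A target on the right edge of a 60-degree-wide view needs about 30 degrees of pan.
print(centering_offsets((1920, 540), (1920, 1080), (60.0, 40.0)))
```

The actual control signal format sent to the drive device 40 is not described in the text, so the sketch stops at the angle corrections.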

[0040] The storage unit 170 is a storage medium that stores information about the foreground region. This information includes, for example, the foreground region extracted by the foreground extraction unit 120, the background model used by the foreground extraction unit 120, the tracking results associated by the tracking unit 130, the motion model used by the tracking unit 130, the classification results of the foreground region classified by the classification unit 140, and the image feature extraction model used by the classification unit 140. The storage unit 170 may be included in the detection device 10 or may be provided outside the detection device 10.

[0041] <Example of Operation in the First Embodiment> Below, an example of the processing operation of the detection device 10 in the first embodiment will be described using Figure 12. Figure 12 is a flowchart illustrating an example of the processing operation of the detection device 10. It can also be said that Figure 12 illustrates the detection method by the detection device 10. The explanation assumes that the tracking results of the foreground area are stored in the storage unit 170.

[0042] First, the receiving unit 110 receives aircraft detection information from the radio wave detection device 20 and the captured image from the shooting device 30 (step S101). Then, the classification unit 140 extracts an extracted image including the foreground region from the captured image and also extracts trajectory information using the tracking results from the storage unit 170 (step S102). Then, the classification unit 140 uses the image features extracted from the image portion using a pre-trained image feature extraction model to classify whether the moving object is one of the predetermined detection targets or not (step S103). Then, the tracking unit 130 tracks the moving object using the tracking results of past images, the foreground region including the moving object extracted from the captured image, and the classification results from the classification unit 140 (step S104). Then, the output unit 150 moves on to the process of determining whether or not it is the output timing (step S105). If it is the output timing (Yes in step S105), the output unit 150 outputs detection information indicating that the moving object to be detected is visible in the captured image (step S106). The detection device 10 then determines whether or not it has received a detection termination instruction (step S107). If it has not received a termination instruction (No in step S107), the detection device 10 repeats the processing from step S101 onwards. Also, if it is not the output timing in step S105 (No in step S105), the detection device 10 repeats the processing from step S101 onwards. If it has received a termination instruction (Yes in step S107), the detection device 10 terminates processing.
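The control flow of steps S101 to S107 can be sketched as a loop. Each unit is represented by a callable passed in as an argument; these callables and their signatures are hypothetical placeholders, not interfaces defined in the disclosure.

```python
def run_detection_loop(receive, extract, classify, track,
                       is_output_timing, output, should_stop):
    """Illustrative control flow for steps S101 to S107."""
    while True:
        detection_info, image = receive()             # S101: receive inputs
        portions, trajectories = extract(image)       # S102: image portions and trajectories
        labels = classify(portions, trajectories)     # S103: classify moving objects
        tracks = track(image, labels)                 # S104: update tracking
        if not is_output_timing():                    # S105: No -> back to S101
            continue
        output(tracks, labels)                        # S106: output detection information
        if should_stop():                             # S107: Yes -> terminate
            break
```

As in the flowchart description, the termination check in step S107 is reached only after an output in step S106; when it is not yet the output timing, processing returns directly to step S101.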

[0043] <Effect of the First Embodiment> The detection device 10 of the present embodiment includes a receiving unit 110, a foreground extraction unit 120, a tracking unit 130, a classification unit 140, an output unit 150, and a shooting control unit 160, and the classification unit 140 includes a judgment unit 143. In the detection device 10 of the present embodiment, the judgment unit 143 classifies the foreground region using the trajectory information and the image features extracted from the image portion by the trained image feature extraction model. With this configuration, the detection device 10 uses a model trained so that the moving object features are separated in the feature space according to the type of moving object, so the separation boundary between the features of different types of moving objects becomes clear. As a result, features suitable for classification can be extracted, and it is possible to provide a detection device, a detection system, a detection method, and a computer program that can appropriately determine whether or not a moving object that is unclear in the captured image due to its small size is a detection target.

[0044] <Second Embodiment> The second embodiment of the present disclosure will be described with reference to the drawings.

[0045] <Configuration in the Second Embodiment> As shown in Figure 13, the detection device 200 includes a receiving means 210, an extraction means 220, a classification means 230, and an output means 240. The receiving unit 110, the image extraction unit 141, the classification unit 140, and the output unit 150 of the first embodiment are examples of the receiving means 210, the extraction means 220, the classification means 230, and the output means 240, respectively.

[0046] The receiving means 210 receives a captured image of the monitoring area.

[0047] The extraction means 220 extracts an image portion including a moving object from the captured image.

[0048] The classification means 230 classifies whether the moving object is one of the predetermined detection targets or a non-detection target, using the image features extracted from the image portion by a pre-trained image feature extraction model. The image feature extraction model is a model that takes an image portion including the moving object as input and outputs the image features of the input image portion. The image feature extraction model uses, as training image portions, an image portion including the moving object and a background image not including the moving object from each of a plurality of captured images taken at different times. It is characterized by being trained such that the background features extracted from the respective background images of the training image portions approach each other in the feature space, the moving object features extracted from the respective moving object images of the training image portions approach each other in the feature space, and the background features and the moving object features move apart from each other in the feature space.

[0049] The output means 240 outputs detection information indicating that a detection target moving object is shown in the captured image.

[0050] <Example of Operation in the Second Embodiment> An example of the processing operation of the detection device 200 in the second embodiment will be described using Figure 14. Figure 14 is a flowchart illustrating an example of the processing operation of the detection device 200. It can also be said that Figure 14 illustrates the detection method by the detection device 200.

[0051] First, the receiving means 210 receives a captured image of the monitoring area (step S401). Then, the extraction means 220 extracts an image portion including the moving object from the captured image (step S402). Then, the classification means 230 classifies whether the moving object is one of the predetermined detection targets or a non-detection target, using the image features extracted from the image portion by a pre-trained image feature extraction model (step S403). Then, the output means 240 outputs detection information indicating that a detection target moving object is shown in the captured image (step S404).
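One simple way to turn extracted image features into the classification of step S403 is a nearest-prototype rule, sketched below. This is an assumption made for illustration: the disclosure does not specify the classifier operating on the features in this embodiment, and the names classify_feature, prototypes, and max_distance are hypothetical.

```python
import math

def classify_feature(feature, prototypes, max_distance):
    """Assign the label of the nearest prototype in the feature space,
    or "non-target" when no prototype is within max_distance.

    prototypes:   dict mapping each detection target label to a
                  representative feature vector (assumed to be available)
    max_distance: hypothetical rejection threshold
    """
    best_label, best_dist = None, float("inf")
    for label, proto in prototypes.items():
        dist = math.dist(feature, proto)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_distance else "non-target"

prototypes = {"drone": [0.0, 0.0], "helicopter": [4.0, 0.0]}
print(classify_feature([0.2, 0.1], prototypes, max_distance=1.0))
print(classify_feature([10.0, 10.0], prototypes, max_distance=1.0))
```

A rule of this kind works well only when features of the same type lie close together and features of different types lie far apart, which is the property the training described in paragraph [0048] aims to produce.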

[0052] <Effects of the Second Embodiment> The detection device 200 of this embodiment includes a receiving means 210, an extraction means 220, a classification means 230, and an output means 240. By using image features extracted from an image portion using a pre-trained image feature extraction model, the device classifies whether a moving object is one of the predetermined detection targets or not, thereby providing a detection device, detection system, detection method, and computer program that can appropriately determine whether a moving object that is too small to be clearly visible in a captured image is a detection target.

[0053] The above embodiments may be implemented in combination as appropriate.

[0054] <Modification> In the first embodiment, the foreground extraction unit 120 uses an omnidirectional background model, but the foreground extraction unit 120 may extract the foreground region using a pre-generated partial background model instead of an omnidirectional background model.

[0055] In the first embodiment, the detection device 10 is shown to process the captured image (video) captured by the shooting device 30 in real time. However, even when the user checks the trajectory of the foreground area using the captured image stored in the storage unit 170, for example, the detection device 10 may perform the same processing on the captured image to be checked as described in the first embodiment.

[0056] The receiving unit 110 may also receive images from a separate shooting device 31, which is different from the shooting device 30 that transmits the images processed by the foreground extraction unit 120 and the tracking unit 130, and which photographs moving objects classified as detection targets by the classification unit 140. In this case, the output unit 150 also outputs the images received from the shooting device 31. The shooting device 31 is connected to the drive device 41. The drive device 41 is a device that changes the shooting direction (the direction in which the optical axis points) of the shooting device 31 and controls the pan angle and tilt angle of the shooting device 31. For example, when a moving object has been classified as a detection target by the classification unit 140 and the shooting control unit 160 receives an instruction from the user of the detection device 10 to photograph the moving object at magnification, the shooting control unit 160 generates a control signal that controls the magnification of the shooting device 31 so that it zooms in on the moving object, and transmits the signal to the shooting device 31. The output unit 150 outputs to the display device 50 the tracking results and an enlarged image, which is a captured image of the moving object taken by the shooting device 31 at a predetermined resolution. The predetermined resolution may be adjustable by the user. The output unit 150 may display the enlarged image superimposed on the captured image taken by the shooting device 30, or it may display the captured image and the enlarged image side by side.

[0057] Furthermore, the detection system is also used to monitor drone logistics hubs and is sometimes incorporated into drone operation management systems.

[0058] <Example Hardware Configuration> Next, the hardware of each device constituting the detection device of this disclosure will be described. Figure 15 is a diagram showing an example of the hardware configuration of the detection device. Here, the detection device refers to both the detection device 10 and the detection device 200.

[0059] The detection device can be configured using an information processing device (a so-called computer), and has the configuration illustrated in Figure 15. For example, the detection device 10 includes a processor 911, memory 912, an input / output interface 913, a communication interface 914, and so on. These components, such as the processor 911, are connected by an internal bus or the like so that they can communicate with each other.

[0060] However, the configuration shown in Figure 15 is not intended to limit the hardware configuration of the detection device. The detection device may include hardware not shown. Also, the number of processors 911 etc. included in the detection device is not intended to be limited to the example in Figure 15; for example, multiple processors 911 may be included in the detection device.

[0061] The processor 911 is a programmable device such as a CPU (Central Processing Unit), MPU (Micro Processing Unit), or DSP (Digital Signal Processor). Alternatively, the processor 911 may be a device such as an FPGA (Field Programmable Gate Array) or ASIC (Application Specific Integrated Circuit). The processor 911 executes various programs, including an operating system (OS).

[0062] Memory 912 includes RAM (Random Access Memory), ROM (Read Only Memory), HDD (Hard Disk Drive), SSD (Solid State Drive), etc. Memory 912 stores OS programs, application programs, and various data.

[0063] The input / output interface 913 is an interface to the display device 50 and an input device (not shown). The display device 50 is, for example, a liquid crystal display. The input device is, for example, a device that accepts user input such as a keyboard or mouse.

[0064] The communication interface 914 is a circuit, module, etc., that communicates with other devices. For example, the communication interface 914 includes a wireless communication circuit or a NIC (Network Interface Card), etc.

[0065] The functions of the detection device are realized by various processing modules. These processing modules are realized, for example, by the processor 911 executing a program stored in memory 912. The program can be recorded on a computer-readable storage medium. The storage medium can be a non-transitory medium such as semiconductor memory, hard disk, magnetic recording medium, or optical recording medium. In other words, this disclosure can also be embodied as a computer program product. Furthermore, the program can be downloaded via a network or updated using the storage medium on which the program is stored. Moreover, the processing module may be realized by a semiconductor chip.

[0066] The detection device is equipped with a computer, and its functions are realized by having the computer execute a program. Furthermore, the detection device executes its control method through this program.

[0067] Some or all of the above embodiments may also be described as follows, but are not limited to the following:

[0068] [Note 1] A detection device comprising: receiving means for receiving a captured image of a monitoring area; extraction means for extracting an image portion including a moving object from the captured image; classification means for classifying whether the moving object is one of the predetermined detection targets or not, using the image features extracted from the image portion using a pre-trained image feature extraction model; and output means for outputting detection information indicating that a detection target moving object is visible in the captured image, wherein the image feature extraction model is a model that takes the image portion including the moving object as input and outputs the image features of the input image portion, and the image feature extraction model is characterized in that it is trained, using an image portion containing a moving object and a background image that does not contain the moving object in each of a plurality of captured images taken at different times as training image portions, such that the background feature quantities extracted from each of the background images of the plurality of training image portions approach each other in the feature space, the moving object feature quantities extracted from each of the moving object images of the training image portions approach each other in the feature space, and the background feature quantities and the moving object feature quantities move apart in the feature space.

[0069] [Note 2] The detection device according to Note 1, characterized in that the output means outputs the result of classifying the moving object as detection information to a display device.

[0070] [Note 3] The detection device according to Note 1, further comprising tracking means for tracking the moving object in a plurality of captured images, wherein the output means outputs, as detection information, information indicating the trajectory of the moving object obtained by the tracking means.

[0071] [Note 4] The detection device according to Note 1, wherein the output means further includes a function to control the display operation of a display device that displays the detection information, and the display device displays the captured image and displays the detection information superimposed on the captured image.

[0072] [Note 5] The detection device according to Note 1, further comprising acquisition means for acquiring character information describing the detection information from a large-scale language model, wherein the output means further outputs the character information.

[0073] [Note 6] The detection device according to Note 1, characterized in that the image feature extraction model is further trained using separate training image portions each containing different types of moving objects, such that the moving object features extracted from each moving object image in the multiple training image portions approach each other in the feature space for each type of moving object, and the moving object features extracted from images of different types of moving objects move away from each other in the feature space.

[0074] [Note 7] The detection device according to Note 1, characterized in that the image feature extraction model is trained using background images of different environmental conditions generated by AI (Artificial Intelligence) technology as the background image.

[0075] [Note 8] A detection system comprising: the detection device according to Note 1; a shooting device that transmits captured images to the detection device; and a display device that displays the detection information of the detection device.

[0076] [Note 9] A detection method characterized in that a computer receives an image of a monitored area, extracts an image portion containing a moving object from the image, uses a pre-trained image feature extraction model to classify whether the moving object is one of the predetermined detection targets or not, outputs detection information indicating that a detection target moving object is visible in the image, the image feature extraction model is a model that takes the image portion containing the moving object as input and outputs the image feature of the input image portion, and the image feature extraction model is trained to use an image portion containing a moving object and a background image that does not contain a moving object in each of a plurality of images taken at different times as training image portions, such that the background feature quantities extracted from each of the background images of the plurality of training image portions approach each other in the feature space, the moving object feature quantities extracted from each of the moving object images of the training image portions approach each other in the feature space, and the background feature quantities and the moving object feature quantities move apart in the feature space.

[0077] [Note 10] A computer program characterized by causing a computer to execute: a receiving process to receive an image of the monitored area; an extraction process to extract an image portion containing a moving object from the image; a classification process to classify whether the moving object is one of the predetermined detection targets or not, using the image features extracted from the image portion using a pre-trained image feature extraction model; and an output process to output detection information indicating that a detection target moving object is visible in the image, wherein the image feature extraction model is a model that takes the image portion containing the moving object as input and outputs the image features of the input image portion, and the image feature extraction model is trained such that, using an image portion containing a moving object and a background image not containing the moving object in each of a plurality of captured images taken at different times as training image portions, the background feature quantities extracted from each of the background images in the plurality of training image portions approach each other in the feature space, the moving object feature quantities extracted from each of the moving object images in the training image portions approach each other in the feature space, and the background feature quantities and the moving object feature quantities move apart in the feature space.

[0078] This application claims priority based on Japanese Patent Application No. 2024-224814, filed on 20 December 2024, and incorporates all of its disclosures herein.

[0079]
10 Detection device
20 Radio wave detection device
30 Shooting device
31 Shooting device
40 Drive device
41 Drive device
50 Display device
100 Drone
110 Receiving unit
120 Foreground extraction unit
130 Tracking unit
140 Classification unit
141 Image extraction unit
142 Trajectory extraction unit
143 Judgment unit
150 Output unit
160 Shooting control unit
170 Storage unit
180 Acquisition unit
200 Detection device
210 Receiving means
220 Extraction means
230 Classification means
240 Output means
911 Processor
912 Memory
913 Input / Output interface
914 Communication interface

Claims

1. A detection device comprising: receiving means for receiving a captured image of a monitoring area; extraction means for extracting an image portion containing a moving object from the captured image; classification means for classifying whether the moving object is one of the predetermined detection targets or not, using image features extracted from the image portion using a pre-trained image feature extraction model; and output means for outputting detection information indicating that a detection target moving object is visible in the captured image, wherein the image feature extraction model is a model that takes the image portion containing the moving object as input and outputs image features of the input image portion, and the image feature extraction model is characterized in that it is trained, using an image portion containing a moving object and a background image that does not contain the moving object in each of a plurality of captured images taken at different times as training image portions, such that the background feature quantities extracted from each of the background images of the plurality of training image portions approach each other in the feature space, the moving object feature quantities extracted from each of the moving object images of the training image portions approach each other in the feature space, and the background feature quantities and the moving object feature quantities move apart in the feature space.

2. The detection device according to claim 1, characterized in that the output means outputs the result of classifying the moving object as detection information to a display device.

3. The detection device according to claim 1, further comprising tracking means for tracking the moving object in a plurality of captured images, wherein the output means outputs, as detection information, information indicating the trajectory of the moving object obtained by the tracking means.

4. The detection device according to claim 1, wherein the output means further includes a function to control the display operation of a display device that displays the detection information, and the display device displays the captured image and displays the detection information superimposed on the captured image.

5. The detection device according to claim 1, further comprising acquisition means for acquiring character information describing the detection information from a large-scale language model, wherein the output means further outputs the character information.

6. The detection device according to claim 1, characterized in that the image feature extraction model is further trained using separate training image portions each containing different types of moving objects, such that the moving object features extracted from each moving object image in the plurality of training image portions approach each other in the feature space for each type of moving object, and the moving object features extracted from images of different types of moving objects move away from each other in the feature space.

7. The detection device according to claim 1, characterized in that the image feature extraction model is trained using background images of different environmental conditions generated by AI (Artificial Intelligence) technology as the background images.

8. A detection system comprising: the detection device according to claim 1; an imaging device for transmitting the captured image to the detection device; and a display device for displaying the detection information output by the detection device.

9. A detection method executed by a computer, comprising: a receiving step of receiving a captured image of a monitoring area; an extraction step of extracting an image portion containing a moving object from the captured image; a classification step of classifying whether or not the moving object is one of predetermined detection targets, using image features extracted from the image portion by a pre-trained image feature extraction model; and an output step of outputting detection information indicating that a detection target moving object appears in the captured image, wherein the image feature extraction model is a model that takes the image portion containing the moving object as input and outputs image features of the input image portion, and the image feature extraction model is characterized in that it is trained, using as training image portions image portions that each contain a moving object and a background image not containing the moving object and that are taken from a plurality of captured images captured at different times, such that the background feature quantities extracted from the background images of the plurality of training image portions approach each other in the feature space, the moving object feature quantities extracted from the moving object images of the training image portions approach each other in the feature space, and the background feature quantities and the moving object feature quantities move apart from each other in the feature space.

10. The detection method according to claim 9, characterized in that, in the output step, the result of classifying the moving object is output to a display device as detection information.

11. The detection method according to claim 9, further comprising the step of tracking the moving object in a plurality of the captured images, wherein in the output step, information indicating the trajectory of the moving object as a result of the tracking step is output as detection information.

12. The detection method according to claim 9, wherein the output step further includes controlling the display operation of a display device that displays the detection information, and the display device displays the captured image and displays the detection information superimposed on the captured image.

13. The detection method according to claim 9, further comprising the step of obtaining character information describing the detection information from a large-scale language model, wherein the character information is further output in the output step.

14. The detection method according to claim 9, characterized in that the image feature extraction model is further trained using training image portions that each contain a moving object of a different type, such that the moving object feature quantities extracted from the moving object images in the plurality of training image portions approach each other in the feature space for each type of moving object, and the moving object feature quantities extracted from images of different types of moving objects move apart from each other in the feature space.

15. A computer program that causes a computer to execute: a receiving process of receiving a captured image of a monitoring area; an extraction process of extracting an image portion containing a moving object from the captured image; a classification process of classifying whether or not the moving object is one of predetermined detection targets, using image features extracted from the image portion by a pre-trained image feature extraction model; and an output process of outputting detection information indicating that a detection target moving object appears in the captured image, wherein the image feature extraction model is a model that takes the image portion containing the moving object as input and outputs image features of the input image portion, and the image feature extraction model is characterized in that it is trained, using as training image portions image portions that each contain a moving object and a background image not containing the moving object and that are taken from a plurality of captured images captured at different times, such that the background feature quantities extracted from the background images of the plurality of training image portions approach each other in the feature space, the moving object feature quantities extracted from the moving object images of the training image portions approach each other in the feature space, and the background feature quantities and the moving object feature quantities move apart from each other in the feature space.

16. The computer program according to claim 15, characterized in that the output process outputs, as the detection information, the result of classifying the moving object to a display device.

17. The computer program according to claim 15, further comprising a tracking process for tracking the moving object in a plurality of the captured images, wherein the output process outputs information indicating the trajectory of the moving object as a result of the tracking process as detection information.

18. The computer program according to claim 15, wherein the output process further includes a function to control the display operation of a display device that displays the detection information, and the display device displays the captured image and displays the detection information superimposed on the captured image.

19. The computer program according to claim 15, further comprising an acquisition process for acquiring character information describing the detection information from a large-scale language model, wherein the output process further outputs the character information.

20. The computer program according to claim 15, characterized in that the image feature extraction model is further trained using training image portions that each contain a moving object of a different type, such that the moving object feature quantities extracted from the moving object images in the plurality of training image portions approach each other in the feature space for each type of moving object, and the moving object feature quantities extracted from images of different types of moving objects move apart from each other in the feature space.
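
The training condition recited in the independent claims (background feature quantities approach each other, moving object feature quantities approach each other, and the two groups move apart), together with the per-type extension of claims 6, 14, and 20, can be illustrated as a scalar objective over feature vectors. The following is a minimal sketch only, not the applicant's implementation: the claims do not specify a loss function, so the squared-distance pull term, the hinge-style push term, the `margin` parameter, and all function names are assumptions introduced for illustration.

```python
import numpy as np


def pairwise_sq_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Squared Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff * diff, axis=-1)


def within_group_spread(x: np.ndarray) -> float:
    """Mean squared distance over distinct pairs of rows (0.0 for fewer than 2 rows)."""
    n = x.shape[0]
    if n < 2:
        return 0.0
    # The diagonal of the distance matrix is zero, so the sum covers distinct pairs only.
    return float(pairwise_sq_dists(x, x).sum() / (n * (n - 1)))


def separation_penalty(a: np.ndarray, b: np.ndarray, margin: float) -> float:
    """Hinge-style penalty (assumed form): positive while rows of a and b are closer than margin."""
    d = np.sqrt(pairwise_sq_dists(a, b))
    return float(np.mean(np.maximum(0.0, margin - d) ** 2))


def feature_space_loss(background: np.ndarray,
                       moving_by_type: dict,
                       margin: float = 1.0) -> float:
    """Hypothetical objective: small when each group is tight and groups are far apart.

    background     -- (n, d) feature quantities from background images
    moving_by_type -- maps a type label to an (m, d) array of moving object features;
                      a single entry corresponds to the independent claims, several
                      entries to the per-type variant
    """
    groups = list(moving_by_type.values())
    all_moving = np.concatenate(groups, axis=0)

    loss = within_group_spread(background)                      # backgrounds approach each other
    loss += sum(within_group_spread(g) for g in groups)         # each type approaches itself
    loss += separation_penalty(background, all_moving, margin)  # background vs. moving objects
    for i in range(len(groups)):                                # different types move apart
        for j in range(i + 1, len(groups)):
            loss += separation_penalty(groups[i], groups[j], margin)
    return loss
```

In this sketch the loss is zero when every group has collapsed to a point and all groups are at least `margin` apart. In practice the feature vectors would come from the image feature extraction model and the objective would be minimized with a gradient-based framework, a detail the claims leave open.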