Region detection device, region detection method, and program
The region detection device addresses the challenge of inaccurate object detection by employing region extraction and cleansing methods to ensure precise identification of target objects, improving detection accuracy and reducing data correction costs.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- NEC CORP
- Filing Date
- 2024-12-09
- Publication Date
- 2026-06-19
Abstract
Description
Technical Field
[0001] This disclosure relates to a technique for detecting an object region from an image.
Background Art
[0002] In recent years, object detection techniques for detecting objects shown in video have been proposed and utilized in various fields. For example, Patent Document 1 proposes a region extraction device capable of extracting an image region of a subject even under shooting conditions accompanied by rapid illumination changes.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] However, even with the technique of Patent Document 1, it is not always possible to accurately detect the region of the target object.
[0005] One object of this disclosure is to provide a region detection device capable of accurately detecting the region of a target object.
Means for Solving the Problems
[0006] In one aspect of this disclosure, a region detection device includes: region extraction means for extracting a foreground region representing a region that has changed between images observed at a fixed point at a plurality of times; object extraction means for extracting an object region representing the region of an object included in the images; and cleansing means for extracting, based on the degree of overlap between the foreground region and the object region, a region that satisfies a criterion for determining that the degree of overlap is high.
[0007] In another aspect of this disclosure, a region detection method is performed by a computer and includes: extracting a foreground region representing a region that has changed between images observed at a fixed point at a plurality of times; extracting an object region representing the region of an object included in the images; and extracting, based on the degree of overlap between the foreground region and the object region, a region that satisfies a criterion for determining that the degree of overlap is high.
[0008] In yet another aspect of this disclosure, a program causes a computer to execute processing of: extracting a foreground region representing a region that has changed between images observed at a fixed point at a plurality of times; extracting an object region representing the region of an object included in the images; and extracting, based on the degree of overlap between the foreground region and the object region, a region that satisfies a criterion for determining that the degree of overlap is high.
Effects of the Invention
[0009] According to this disclosure, it becomes possible to accurately detect the region of a target object.
Brief Description of the Drawings
[0010]
[Figure 1] A diagram showing the overall configuration of the object detection system.
[Figure 2] A block diagram showing the hardware configuration of the object region detection device according to this disclosure.
[Figure 3] A block diagram showing the functional configuration of the object region detection device according to this disclosure.
[Figure 4] An example of foreground object candidate extraction.
[Figure 5] An example of the cleansing process.
[Figure 6] An example of foreground object extraction.
[Figure 7] A flowchart of the processing performed by the object region detection device according to this disclosure.
[Figure 8] A block diagram showing the functional configuration of the region detection device according to this disclosure.
[Figure 9] A flowchart of the processing performed by the region detection device according to this disclosure.
Mode for Carrying Out the Invention
[0011] Hereinafter, preferred embodiments of the present disclosure will be described with reference to the drawings.
[0012] <First Embodiment> [Overall Configuration] FIG. 1 shows the overall configuration of an object detection system to which the region detection device according to the present disclosure is applied. The object detection system 1 includes a fixed-point camera 5 and an object region detection device 10. The fixed-point camera 5 and the object region detection device 10 can communicate with each other by wire or wirelessly. The object region detection device 10 is an example of a region detection device.
[0013] The fixed-point camera 5 is installed in a store and photographs the product shelves at predetermined time intervals. The fixed-point camera 5 then transmits the captured images of the product shelves, that is, images captured by an imaging device installed at a fixed location (hereinafter also referred to as "fixed-point observation images"), to the object region detection device 10. The foreground represents a region in which a difference occurs between the images at two time points. A foreground object is an object appearing in that region, or the region itself. Hereinafter, for convenience of explanation, the products displayed on the product shelves are also referred to as the "foreground" or "foreground objects", and the product shelves are also referred to as the "background".
[0014] The object region detection device 10 detects a changed region (e.g., a product that has entered or exited) based on the temporal change of the fixed-point observation images. Further, the object region detection device 10 assigns correct positions and sizes to the products in the fixed-point observation images and creates training data for an object detector. This makes it possible to create training data efficiently. In the product shelf example above, the training data can easily be updated when new products are introduced or products are rearranged in the store, so such changes can be handled quickly. Although details will be described later, the object region detection device 10 of the present embodiment performs the above-described processing by combining the background subtraction method and the region segmentation method.
[0015] [Hardware Configuration] FIG. 2 is a block diagram showing the hardware configuration of the object region detection device 10 according to the first embodiment. As shown in the figure, the object region detection device 10 includes an interface (I/F) 11, a processor 12, a memory 13, a recording medium 14, and a database (DB) 15.
[0016] The I/F 11 performs data input and output with external devices. Specifically, the I/F 11 acquires fixed-point observation images from the fixed-point camera 5.
[0017] The processor 12 is a computer such as a CPU (Central Processing Unit), and controls the entire object region detection device 10 by executing a program prepared in advance. Note that the processor 12 may be a GPU (Graphics Processing Unit), a DSP (Digital Signal Processor), an MPU (Micro Processing Unit), an FPU (Floating-Point Unit), a PPU (Physics Processing Unit), a TPU (Tensor Processing Unit), a quantum processor, a microcontroller, or a combination thereof. The processor 12 executes the training data creation process described later.
[0018] The memory 13 includes a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The memory 13 is also used as working memory while the processor 12 executes various processes.
[0019] The recording medium 14 is a non-volatile, non-transitory recording medium such as a disk-shaped recording medium or a semiconductor memory, and is configured to be detachable from the object region detection device 10. The recording medium 14 stores various programs that the processor 12 executes. When the object region detection device 10 performs various processes, the programs stored in the recording medium 14 are loaded into the memory 13 and executed by the processor 12. The DB 15 stores the fixed-point observation images received from the fixed-point camera 5. The DB 15 also stores the created training data.
[0020] In addition to the above, the object region detection device 10 may also be equipped with a display device such as a liquid crystal display, and an input device such as a keyboard or mouse. The display device and input device are used, for example, by the administrator of the object region detection device 10 to perform necessary management.
[0021] [Functional Configuration] Figure 3 is a block diagram showing the functional configuration of the object region detection device 10 of the first embodiment. Functionally, the object region detection device 10 comprises a foreground object candidate extraction unit 101, an object candidate extraction unit 102, a cleansing unit 103, and a foreground object extraction unit 104.
[0022] The object region detection device 10 receives fixed-point observation images from the fixed-point camera 5 via the I/F 11. The fixed-point observation images are stored in the DB 15.
[0023] The foreground object candidate extraction unit 101 acquires fixed-point observation images from the DB 15. The foreground object candidate extraction unit 101 compares two fixed-point observation images captured one after the other in time and extracts candidates for foreground objects (hereinafter also referred to as "foreground object candidates"). The foreground object candidate extraction unit 101 outputs the extraction results of the foreground object candidates to the cleansing unit 103. A foreground object candidate is an example of a foreground region.
[0024] Figure 4 shows an example of foreground object candidate extraction. Figure 4 includes fixed-point observation images at an arbitrary time t and at time t+i, i steps later. In the fixed-point observation image at time t+i, product 41 has been added. The foreground object candidate extraction unit 101 compares the fixed-point observation image at time t with the fixed-point observation image at time t+i using the background subtraction method and extracts the changed region as a foreground object candidate. For comparison, for example, differences in RGB values, changes in image features, and changes in depth can be used. In Figure 4, the foreground object candidate extraction unit 101 extracts foreground object candidate 42a at time t and foreground object candidate 42b at time t+i as foreground object candidates. The foreground object candidate extraction unit 101 outputs the time of the fixed-point observation image and the position and size information of the foreground object candidate corresponding to that time to the cleansing unit 103 as the foreground object candidate extraction result. In Figure 4, foreground object candidates are shown as rectangles, but the foreground object candidate extraction unit 101 may also mask the changed region and extract the masked region as a foreground object candidate.
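As a non-limiting illustration of the comparison described in paragraph [0024], the following is a minimal sketch of difference-based extraction, assuming grayscale images represented as nested lists of pixel values. The function name `changed_region`, the parameter `diff_threshold`, and the use of a single bounding rectangle around all changed pixels are assumptions made for this sketch; they are not specified in this disclosure.

```python
def changed_region(img_t, img_t_i, diff_threshold=30):
    """Return the rectangle (x, y, w, h) enclosing pixels that changed, or None.

    img_t and img_t_i are same-sized grayscale images (lists of rows).
    diff_threshold is a hypothetical value chosen only for illustration.
    """
    xs, ys = [], []
    for y, (row_a, row_b) in enumerate(zip(img_t, img_t_i)):
        for x, (a, b) in enumerate(zip(row_a, row_b)):
            # A pixel counts as "changed" when its absolute difference is large.
            if abs(a - b) > diff_threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no change between the two time points
    x0, y0 = min(xs), min(ys)
    return (x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1)


# Toy 6x4 "shelf": a product appears at time t+i in columns 2-4, rows 1-2.
shelf_t = [[10] * 6 for _ in range(4)]
shelf_t_i = [row[:] for row in shelf_t]
for y in (1, 2):
    for x in (2, 3, 4):
        shelf_t_i[y][x] = 200

candidate = changed_region(shelf_t, shelf_t_i)  # (2, 1, 3, 2)
```

In practice, differences in RGB values, image features, or depth could be substituted for the grayscale difference used here, and connected changed areas could be separated into individual rectangles or masks.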
[0025] The foreground object candidate extraction unit 101 performs the above processing from the start to the end of capturing fixed-point observation images and extracts foreground object candidates.
[0026] The object candidate extraction unit 102 acquires fixed-point observation images from the DB 15. The object candidate extraction unit 102 extracts candidates for objects (hereinafter also referred to as "object candidates") from the fixed-point observation images using the region segmentation method. The object candidate extraction unit 102 extracts object candidates from the fixed-point observation images regardless of whether they belong to the foreground or the background. The object candidate extraction unit 102 outputs the object candidate extraction results to the cleansing unit 103 and the foreground object extraction unit 104. An object candidate is an example of an object region.
[0027] Specifically, the object candidate extraction unit 102 inputs the fixed-point observation image and prompts into an existing segmentation model to perform foreground and background segmentation in the fixed-point observation image. Examples of segmentation models include the SAM (Segment Anything Model) published by Meta. The prompts include, for example, multiple points. The object candidate extraction unit 102 outputs the segmented fixed-point observation image to the cleansing unit 103 and the foreground object extraction unit 104.
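Paragraph [0027] states only that the prompts include, for example, multiple points. As one possible illustration, the sketch below generates a regular grid of point prompts over an image. The grid layout, the function name `point_grid`, and the parameter `points_per_side` are assumptions for this sketch; passing the points to an actual segmentation model is outside its scope.

```python
def point_grid(width, height, points_per_side):
    """Return (x, y) point prompts laid out on a regular grid over the image.

    Each point sits at the center of one grid cell. The grid layout is a
    hypothetical choice; the disclosure does not specify how points are chosen.
    """
    step_x = width / points_per_side
    step_y = height / points_per_side
    return [
        ((i + 0.5) * step_x, (j + 0.5) * step_y)
        for j in range(points_per_side)
        for i in range(points_per_side)
    ]


prompts = point_grid(640, 480, 4)  # 16 points for a 640x480 image
```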
[0028] The cleansing unit 103 performs a cleansing process on the foreground object candidates and outputs the cleansed foreground object candidates to the foreground object extraction unit 104.
[0029] Specifically, the cleansing unit 103 calculates the degree of overlap between the rectangle or mask of the foreground object candidate and the object candidate. The foreground object candidate and the object candidate are extracted from fixed-point observation images taken at the same time. The cleansing unit 103 then designates object candidates whose degree of overlap with the foreground object candidate is above a predetermined threshold as cleansed foreground object candidates. For example, IoU (Intersection over Union) can be used as an index to indicate the degree of image overlap.
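As a non-limiting illustration of the cleansing process in paragraph [0029], the following sketch computes IoU between rectangles given as (x, y, w, h) and keeps the object candidates whose IoU with the foreground object candidate is at or above a threshold. The function names `iou` and `cleanse`, the rectangle representation, and the threshold value 0.5 are assumptions for this sketch.

```python
def iou(box_a, box_b):
    """Intersection over Union of two rectangles given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the intersection; zero when the boxes do not overlap.
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def cleanse(foreground_box, object_boxes, iou_threshold=0.5):
    """Keep object candidates that overlap the foreground candidate strongly.

    iou_threshold is a hypothetical value; the disclosure only requires
    a predetermined threshold.
    """
    return [b for b in object_boxes if iou(foreground_box, b) >= iou_threshold]


foreground = (2, 1, 3, 2)      # foreground object candidate
objects = [
    (2, 1, 3, 3),              # overlaps strongly (IoU = 6/9)
    (0, 0, 2, 2),              # does not overlap (IoU = 0)
    (4, 1, 3, 2),              # overlaps slightly (IoU = 2/10)
]
cleansed = cleanse(foreground, objects)  # [(2, 1, 3, 3)]
```

Note that the output consists of object candidates rather than the original foreground rectangle, which matches the description that object candidates with a high degree of overlap are designated as cleansed foreground object candidates.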
[0030] Figure 5 shows an example of the cleansing process performed by the cleansing unit 103. In Figure 5, foreground object candidates input from the foreground object candidate extraction unit 101 are shown by dotted lines, and multiple object candidates input from the object candidate extraction unit 102 are shown by solid lines. The cleansing unit 103 calculates the degree of overlap between the foreground object candidate 42b and the multiple object candidates at time t+i, and selects object candidates 51 whose degree of overlap with foreground object candidate 42b is greater than or equal to a predetermined threshold as cleansed foreground object candidates. Note that at time t (before product addition), there are no object candidates whose degree of overlap with foreground object candidate 42a is greater than or equal to a predetermined threshold, so the cleansing unit 103 does not acquire cleansed foreground object candidates.
[0031] Foreground object candidates extracted by background subtraction may include background elements other than the foreground objects due to noise and other influences. The cleansing unit 103 can remove the background elements included in the foreground object candidates by performing a cleansing process on them. Furthermore, object candidates extracted by segmentation may not have the correct extraction units due to over-subdivision or merging. The cleansing unit 103 can obtain object candidates of the desired units by extracting object candidates whose degree of overlap with the foreground object candidates is above a predetermined threshold.
[0032] The cleansing unit 103 performs the above processing on foreground object candidates extracted between the start and end of fixed-point observation image capture. Note that the foreground object candidate extraction unit 101 does not extract foreground objects that do not change (i.e., products that do not enter or exit), so the cleansed foreground object candidates may not include all foreground objects in the fixed-point observation image. Therefore, the foreground object extraction unit 104, described later, performs processing to extract foreground objects that do not change.
[0033] The foreground object extraction unit 104 extracts foreground objects included in the fixed-point observation image. Then, the foreground object extraction unit 104 creates training data based on the foreground object extraction results and outputs it to the DB 15.
[0034] Specifically, the foreground object extraction unit 104 calculates the similarity between the cleansed foreground object candidates and the object candidates, and determines that object candidates whose similarity to the cleansed foreground object candidates is above a threshold are foreground objects. For example, the foreground object extraction unit 104 uses a pre-trained image recognition model or SAM to extract the feature vectors of the cleansed foreground object candidates and the feature vectors of the object candidates, respectively. Then, the foreground object extraction unit 104 calculates the cosine similarity between the feature vectors of the cleansed foreground object candidates and the feature vectors of the object candidates, and determines that object candidates whose cosine similarity is above a predetermined threshold are foreground objects.
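As a non-limiting illustration of the similarity determination in paragraph [0034], the following sketch computes the cosine similarity between feature vectors and selects the object candidates that are at or above a threshold. The feature vectors here are toy values standing in for the output of an image recognition model; the function names and the threshold value 0.9 are assumptions for this sketch.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def select_foreground_objects(reference_vec, candidate_vecs, sim_threshold=0.9):
    """Return indices of object candidates similar to the cleansed candidate.

    sim_threshold is a hypothetical value; the disclosure only requires
    a predetermined threshold.
    """
    return [
        i for i, v in enumerate(candidate_vecs)
        if cosine_similarity(reference_vec, v) >= sim_threshold
    ]


# Toy feature vectors (assumed for illustration only).
reference = [1.0, 0.0, 1.0]        # cleansed foreground object candidate
candidates = [
    [2.0, 0.0, 2.0],               # same direction as the reference
    [0.0, 1.0, 0.0],               # orthogonal to the reference
    [1.0, 0.1, 0.9],               # close to the reference
]
selected = select_foreground_objects(reference, candidates)  # [0, 2]
```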
[0035] The foreground object extraction unit 104 may also determine the foreground object based on the similarity obtained by feature point matching. The foreground object extraction unit 104 uses feature point detection algorithms such as AKAZE (Accelerated KAZE) or ORB (Oriented FAST and Rotated BRIEF) to detect the feature points of the cleansed foreground object candidate and the feature points of the object candidate, respectively. The foreground object extraction unit 104 then performs feature point matching to obtain the similarity between the cleansed foreground object candidate and the object candidate.
[0036] Figure 6 shows an example of foreground object extraction by the foreground object extraction unit 104. In Figure 6, the foreground object extraction unit 104 calculates the similarity between the cleansed foreground object candidate 61 and multiple object candidates included in the fixed-point observation image 62, and determines that object candidates 62a to 62d whose similarity to the cleansed foreground object candidate 61 is above a predetermined threshold are foreground objects. The foreground object extraction unit 104 then assigns correct labels to the foreground objects in the fixed-point observation image and creates training data. The correct labels are, for example, information indicating the position and size of the foreground objects included in the fixed-point observation image (such as bounding boxes).
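As a non-limiting illustration of the training data creation in paragraph [0036], the following sketch packages the bounding boxes of extracted foreground objects into a single labeled record. The record layout, the field names `image`, `annotations`, `bbox`, and `category`, and the file name are all hypothetical; this disclosure does not specify a storage format.

```python
def make_training_record(image_id, foreground_boxes, category=None):
    """Build one training record from foreground object rectangles (x, y, w, h).

    The field names are assumptions for this sketch. The optional category
    corresponds to the multi-class case, in which the correct label also
    carries the category of the foreground object.
    """
    annotations = []
    for x, y, w, h in foreground_boxes:
        annotation = {"bbox": [x, y, w, h]}
        if category is not None:
            annotation["category"] = category
        annotations.append(annotation)
    return {"image": image_id, "annotations": annotations}


record = make_training_record(
    "shelf_0001.png",                 # hypothetical image identifier
    [(2, 1, 3, 3), (10, 1, 3, 3)],    # foreground objects found in the image
)
```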
[0037] In object detection, creating a highly accurate object detector requires a large amount of training data, and preparing this data is costly. In particular, when a single image contains many objects, the cost of assigning correct labels increases accordingly. Also, when an object detector is used to monitor merchandise in a store, correct labels must be assigned every time new merchandise arrives. The foreground object extraction unit 104 automatically assigns correct labels to the products that appear in fixed-point observation images, thus enabling training data to be created at low cost.
[0038] As described above, the object region detection device 10 can accurately detect the region of a foreground object by combining the background subtraction method and the region segmentation method, and can also perform correct identification of foreground objects in fixed-point observation images.
[0039] In the above configuration, the foreground object candidate extraction unit 101 is an example of a region extraction means, the object candidate extraction unit 102 is an example of an object extraction means, the cleansing unit 103 is an example of a cleansing means, and the foreground object extraction unit 104 is an example of a foreground object extraction means.
[0040] [Object Region Detection Process] Next, the process for detecting the region of a foreground object as described above will be explained. Figure 7 is a flowchart of the object region detection process performed by the object region detection device 10. This process is realized by the processor 12 shown in Figure 2 executing a program prepared in advance and operating as each element shown in Figure 3.
[0041] First, the foreground object candidate extraction unit 101 acquires fixed-point observation images from the DB 15. The foreground object candidate extraction unit 101 compares two fixed-point observation images captured one after the other in time and extracts foreground object candidates (step S101). The foreground object candidate extraction unit 101 outputs the extraction results of the foreground object candidates to the cleansing unit 103.
[0042] Next, the object candidate extraction unit 102 acquires fixed-point observation images from the DB 15. The object candidate extraction unit 102 extracts object candidates from the fixed-point observation images using a segmentation model (step S102). The object candidate extraction unit 102 outputs the object candidate extraction results to the cleansing unit 103 and the foreground object extraction unit 104.
[0043] Next, the cleansing unit 103 calculates the degree of overlap between the foreground object candidates and the object candidates, and generates cleansed foreground object candidates based on the degree of overlap (step S103). The cleansing unit 103 outputs the cleansed foreground object candidates to the foreground object extraction unit 104.
[0044] Next, the foreground object extraction unit 104 calculates the similarity between the cleansed foreground object candidates and the object candidates, and extracts foreground objects included in the fixed-point observation image based on the similarity (step S104). The foreground object extraction unit 104 outputs the foreground object extraction results as ground truth data to the DB 15. Then the process ends.
[0045] [Modifications] Next, modifications of the first embodiment will be described. The following modifications can be combined as appropriate and applied to the first embodiment.
[0046] (Modification 1) The object region detection device 10 may output the cleansed foreground object candidates generated by the cleansing unit 103 to a display device or terminal device. This allows store employees to check the changed regions on the shelves (i.e., products that have entered or left) via the display device or terminal device.
[0047] (Modification 2) The object region detection device 10 may output all foreground objects extracted by the foreground object extraction unit 104 as analysis results of the fixed-point observation image to a display device or terminal device. This allows store employees to check the display status of products on the shelves via the display device or terminal device.
[0048] (Modification 3) In the above embodiment, the object region detection device 10 creates training data for object detection targeting one class, but it may also create training data for object detection targeting multiple classes. In this case, the correct label includes information indicating the position and size of the foreground object and information indicating the category of the foreground object. The object region detection device 10 can obtain information indicating the category of the foreground object by predicting the category using a classification model prepared in advance. An object detector trained with such training data can detect the products included in an image and the categories of those products.
[0049] [Application Examples] The region detection device of this disclosure can be used as a product monitoring device that monitors the entry and exit of products and the display status of products. The region detection device of this disclosure can also be used as a vehicle monitoring device. For example, it can monitor each vehicle based on images captured by fixed-point cameras installed on highways. Furthermore, the region detection device of this disclosure can also be used as a training data generation device that creates training data for object detectors.
[0050] <Second Embodiment> Figure 8 is a block diagram showing the functional configuration of the region detection device according to the second embodiment. The region detection device 20 comprises a region extraction means 201, an object extraction means 202, and a cleansing means 203.
[0051] Figure 9 is a flowchart of the processing performed by the region detection device of the second embodiment. The region extraction means 201 extracts foreground regions that represent regions that have changed between images observed at fixed points at multiple time points (step S201). The object extraction means 202 extracts object regions that represent regions of objects included in the image (step S202). The cleansing means 203 extracts regions where the degree of overlap between the foreground region and the object region satisfies the criteria for determining a high degree of overlap (step S203).
[0052] According to the region detection device of the second embodiment, it is possible to accurately detect the region of a target object.
[0053] Although the present disclosure has been described above with reference to embodiments and examples, the present disclosure is not limited to the above embodiments and examples. Various modifications that can be understood by those skilled in the art may be made to the structure and details of the present disclosure within the scope of the present disclosure.
Explanation of Symbols
[0054]
10 Object region detection device
15 Database (DB)
101 Foreground object candidate extraction unit
102 Object candidate extraction unit
103 Cleansing unit
104 Foreground object extraction unit
Claims
1. A region detection device comprising: region extraction means for extracting a foreground region representing a region that has changed between images observed at a fixed point at a plurality of times; object extraction means for extracting an object region representing the region of an object included in the images; and cleansing means for extracting a region where the degree of overlap between the foreground region and the object region satisfies a criterion for determining that the degree of overlap is high.
2. The region detection device according to claim 1, further comprising foreground object extraction means for extracting a foreground object based on the object region and the region extracted by the cleansing means.
3. The region detection device according to claim 2, wherein the foreground object extraction means calculates the similarity between the object region and the region extracted by the cleansing means, and extracts the object region whose similarity is equal to or greater than a predetermined threshold as a foreground object.
4. The region detection device according to claim 3, wherein the foreground object extraction means extracts a feature vector of the object region and a feature vector of the region extracted by the cleansing means, and calculates the similarity.
5. The region detection device according to claim 3, wherein the foreground object extraction means compares the feature points of the object region with the feature points of the region extracted by the cleansing means and calculates the similarity.
6. The region detection device according to claim 2, further comprising a learning data generation means for generating learning data based on the foreground objects extracted by the foreground object extraction means.
7. The region detection device according to claim 1, wherein the cleansing means uses IoU as an indicator of the degree of overlap.
8. A region detection method performed by a computer, the method comprising: extracting a foreground region representing a region that has changed between images observed at a fixed point at a plurality of times; extracting an object region representing the region of an object included in the images; and extracting a region where the degree of overlap between the foreground region and the object region satisfies a criterion for determining that the degree of overlap is high.
9. A program that causes a computer to execute processing of: extracting a foreground region representing a region that has changed between images observed at a fixed point at a plurality of times; extracting an object region representing the region of an object included in the images; and extracting a region where the degree of overlap between the foreground region and the object region satisfies a criterion for determining that the degree of overlap is high.