Image plane region detection method based on edge information
By extracting multi-level features using ResNet and FPN networks and performing resolution-adaptive fusion, the problem of inaccurate detection of small-sized planes is solved, thus improving the accuracy and effectiveness of image plane analysis tasks.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NANKAI UNIV
- Filing Date
- 2023-03-21
- Publication Date
- 2026-06-23
Figure CN116258943B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of computer vision technology, specifically relating to an image planar region detection method based on edge information.
Background Technology
[0002] Planar regions within a scene provide crucial information for a wide range of vision-based applications, including computer vision, stereo vision, and robot vision. Planar region detection, aiming to enable machines to perceive high-level scene structures like humans, is a significant challenge in fields such as computer vision and pattern recognition. After extracting all planes from a single image, users can select those planes of interest and design effective and engaging applications based on these planar regions. For example, users can decorate walls with their favorite textures, or advertisers can leverage less informative areas (such as tables, walls, and planks) in promotional videos to more effectively market their products. Furthermore, planar features are also key clues for autonomous robots to perceive their surroundings and build maps from camera views.
[0003] Unlike traditional object detection and segmentation, planar detection is more constrained and challenging. First, planes have no predefined classes, so planes of arbitrary class must be segmented; second, the boundaries of planar regions are difficult to define because they depend on high-level, abstract structural information within the scene, such as plane normals. With the rise of deep neural networks, analyzing planar regions has become possible thanks to the ability of convolutional neural networks to learn high-level feature representations of images. Two CVPR 2019 papers, "Single-Image Piece-wise Planar 3D Reconstruction via Associative Embedding" by Yu et al. (pp. 1029-1037) and "PlaneRCNN: 3D Plane Detection and Reconstruction from a Single Image" by Liu et al. (pp. 4450-4459), designed CNN-based architectures for analyzing planar regions. The former employs a two-stage bottom-up network architecture, while the latter uses Mask R-CNN to generate an arbitrary number of planes and adds a refinement module that integrates features from all planes to further refine the predictions. However, because planes are class-agnostic, the segmentation masks predicted by these methods are not accurate, and small planes are difficult to detect.
[0004] Recently, edge information has been shown to help models learn more discriminative features in salient object detection, scene segmentation, and parsing. The 2021 paper "Self-correction for human parsing" by Li et al., published in TPAMI, constructed an additional edge network to estimate object edges and proposed a self-correction strategy to remove incorrect labels. However, it only optimizes features in edge regions, which account for a very small percentage (approximately 1%) of the image.
Summary of the Invention
[0005] The purpose of this invention is to improve upon the shortcomings of existing planar region detection methods by providing an image planar region detection method based on edge information, which can enhance the performance of downstream image planar analysis tasks.
[0006] This invention is achieved through the following technical solution:
[0007] A method for detecting planar regions in an image based on edge information includes the following steps:
[0008] Step 1: Use a neural network to extract the basic skeleton network features of each layer of the image;
[0009] Step 2: Extract multi-level contextual features from the features obtained in Step 1 through upsampling and inter-layer links;
[0010] Step 3: Extract multi-level edge features of the image by iteratively optimizing the network layers using the intermediate-level features obtained in Step 1, and optimize the features of different levels to the expected feature dimensions.
[0011] Step 4: Perform layer-by-layer pairwise fusion of the multi-level context features obtained in Step 2 and the multi-level edge features obtained in Step 3, while keeping the resolution of the context features and edge features with different resolutions the same.
[0012] Step 5: Provide the fused features to the target model of the downstream image plane detection task to be trained, so as to improve the performance of the target model in the downstream image plane detection task.
[0013] In the above technical solution, step 1 uses a ResNet network structure, and the feature representation of the output of each layer is as follows:
[0014] In the above technical solution, in step 2, the FPN network is used to assist the ResNet neural network, and the basic skeleton network features of multiple layers are extracted by upsampling and inter-layer links.
[0015] Let Py_i(·) (i = 2, 3, 4, 5, 6) denote the function at level i of the FPN network. The output of each level of the FPN network is obtained as follows: the output of each layer of the ResNet neural network is processed to yield the contextual features of the image output by each level of the FPN network.
[0016]
[0017] Up represents upsampling.
[0018] In the above technical solution, step 3 selects the features of the three intermediate layers output by the ResNet network and extracts multi-level edge features of the image through an iterative optimization network layer. The features of these three intermediate layers are taken as input and fed into the iterative optimization network layer, which consists of multiple channel-smoothing reduction modules. The iterative optimization network layer smoothly reduces the number of channels of the input features by a factor of 2 at a time until the total number of channels of all input features is reduced to 256, thereby obtaining the edge features.
[0019] In the above technical solution, the multi-level context features obtained in step 2 and the multi-level edge features obtained in step 3 are sent to the resolution adaptive module for layer-by-layer pair fusion.
[0020] The resolution adaptive module is composed of a set of adaptive convolutional kernels, and the specific process is defined as follows:
[0021]
[0022]
[0023]
[0024] Tr_4 = Conv1(·)
[0025] Tr_i represents the pairwise fusion operation of the i-th level, which is implemented by combining convolutional layers; Conv3(·) and Conv1(·) represent convolutional operations with kernel sizes of 3 and 1, respectively; ∘ represents the combination operation of multiple convolutional layers.
[0026] In the above technical solution, in step 5, when training the target model, the hybrid loss of the target model needs to be calculated. The hybrid loss includes two items: one is the edge loss, and the other is the loss of the original image planar region detection model. The original image planar region detection model refers to the neural network used in step 1.
[0027] The present invention also provides a computer-readable storage medium storing a computer program that, when executed, implements the steps of the method described above.
[0028] The advantages and beneficial effects of this invention are as follows:
[0029] This invention first utilizes a multi-level convolutional neural network to extract edge and contextual features. The multi-level convolutional network extracts two types of planar features at different resolutions. The designed recurrent convolutional layers iteratively optimize the receptive field of the features according to different resolutions. Next, this invention performs feature fusion. Guided by an innovative resolution-adaptive fusion operation and planar edge supervision, edge and contextual features from different levels are aggregated into multi-level planar features and provided to downstream arbitrary image planar analysis models, thereby improving the performance of the target model in downstream image planar analysis tasks. Experimental results show that this invention can effectively improve model performance in numerous image planar analysis tasks.
Attached Figure Description
[0030] Figure 1 is a flowchart of the image planar region detection method based on edge information of the present invention.
[0031] Figure 2 visualizes the three discriminative features extracted when this invention uses the PlaneRCNN planar region detection model as the target model.
[0032] Figure 3 visualizes the prediction results of this invention in the downstream plane segmentation task of planar analysis.
[0033] Figure 4 visualizes the prediction results of this invention in the downstream depth estimation task of planar analysis.
[0034] Figure 5 visualizes the prediction results of this invention in the downstream 3D reconstruction task of planar analysis.
[0035] For those skilled in the art, other related figures can be obtained from the above figures without any creative effort.
Detailed Implementation
[0036] To enable those skilled in the art to better understand the present invention, the technical solution of the present invention will be further described below with reference to specific embodiments.
[0037] A method for detecting planar regions in an image based on edge information, shown in Figure 1, includes the following steps:
[0038] Step 1: Use ResNet neural network to extract basic skeleton network features at each level of the image.
[0039] ResNet is a common multi-scale neural network architecture containing 5 blocks, denoted block_i (i = 1, 2, ..., 5) according to their layer. Each block corresponds to a level, and the output feature of each block (i.e., each level) is represented as...
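As a rough illustration of the five levels, the sketch below lists the channel count and spatial size of each block output. It assumes a ResNet-50 layout (the text only says "ResNet" and does not state the depth), and the names `RESNET50_LEVELS` and `level_shapes` are hypothetical.

```python
# Assumption: ResNet-50 layout; the patent text does not state the depth.
# Each entry is (level index, output channels, total stride w.r.t. the input).
RESNET50_LEVELS = [
    (1, 64, 2),     # block_1: stem convolution
    (2, 256, 4),    # block_2
    (3, 512, 8),    # block_3
    (4, 1024, 16),  # block_4
    (5, 2048, 32),  # block_5
]


def level_shapes(height, width):
    """Return {level: (channels, h, w)} for an input of size height x width.

    Spatial sizes use ceiling division, which is what padded stride-2
    layers (kernel 3 with padding 1, or kernel 7 with padding 3) produce.
    """
    shapes = {}
    for level, channels, stride in RESNET50_LEVELS:
        h = -(-height // stride)  # ceiling division
        w = -(-width // stride)
        shapes[level] = (channels, h, w)
    return shapes


if __name__ == "__main__":
    for level, (c, h, w) in level_shapes(480, 640).items():
        print(f"block_{level}: channels={c}, size={h}x{w}")
```

Each level halves the resolution of the previous one, which is why the later steps need upsampling or strided convolutions before features from different levels can be combined.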
[0040] Step 2: Use FPN (Feature Pyramid Networks) to assist the ResNet neural network, extracting multi-level contextual features from the basic skeleton network features of multiple layers through upsampling and inter-layer connections.
[0041] Let Py_i(·) (i = 2, 3, 4, 5, 6) denote the function at level i of the FPN network. The output of each level of the FPN network is obtained as follows: the output of each layer of the ResNet neural network is processed to yield the contextual features of the image output by each level of the FPN network.
[0042]
[0043] Up represents upsampling.
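The upsampling and inter-layer links can be sketched as a top-down pass over the backbone levels. This is a minimal sketch that assumes the standard FPN scheme (each level adds the upsampled coarser level to the backbone feature, and level 6 is a stride-2 subsampling of level 5); the formula used by the invention is not reproduced in the text, so the exact form may differ. Feature maps are single-channel nested lists, the lateral and output convolutions are replaced by the identity, and all function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D map stored as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two 2D maps of identical size."""
    assert len(a) == len(b) and len(a[0]) == len(b[0])
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def subsample2x(fmap):
    """Keep every second row and column (stride-2 subsampling)."""
    return [row[::2] for row in fmap[::2]]


def fpn_top_down(backbone):
    """backbone: {level: 2D map} for levels 2..5, each half the size of the previous.

    Returns {level: 2D map} for levels 2..6.
    """
    pyramid = {5: backbone[5]}
    for level in (4, 3, 2):
        # Inter-layer link: backbone feature plus the upsampled coarser level.
        pyramid[level] = add_maps(backbone[level], upsample2x(pyramid[level + 1]))
    pyramid[6] = subsample2x(pyramid[5])
    return pyramid


def const_map(size, value):
    """Square map filled with a constant, used only for the demo."""
    return [[value] * size for _ in range(size)]


if __name__ == "__main__":
    feats = {2: const_map(16, 1.0), 3: const_map(8, 1.0),
             4: const_map(4, 1.0), 5: const_map(2, 1.0)}
    out = fpn_top_down(feats)
    for level in sorted(out):
        print(level, len(out[level]), "x", len(out[level][0]))
```

With constant inputs, the finest level accumulates one contribution from every coarser level, which makes the flow of context from coarse to fine easy to check by hand.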
[0044] Step 3: Extract multi-level edge features.
[0045] To reduce computational costs, multi-level edge feature extraction shares the basic skeleton network features extracted by ResNet at each level. Because the receptive field of the lower layers is small, while edge details may be severely lost in the higher layers, this invention selects the features of the three intermediate levels output by the ResNet network. Multi-level edge features of the image are extracted through an iterative optimization network layer, and features at different levels are optimized to the expected feature dimensions.
[0046] The features of the above three intermediate levels are taken as input and fed into an iterative optimization network layer consisting of multiple channel-smoothing reduction modules. The iterative optimization network layer smoothly reduces the number of channels of the input features by a factor of 2 at a time until the total number of channels of all input features is reduced to 256, thereby obtaining the edge features. This process is described as follows:
[0047]
[0048] Ed1 represents the iterative optimization network layer, which optimizes edge features multiple times until their feature dimensions meet the predefined feature dimensions (in this embodiment, the feature dimension is 256, but it can also be other dimensions).
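The channel reduction can be illustrated by listing the channel count after each halving step. This sketch only computes the schedule, not the convolutions themselves; the function name `channel_schedule` and the example input widths (256, 512 and 1024, as in the intermediate levels of a ResNet-50) are assumptions.

```python
def channel_schedule(in_channels, target=256):
    """Channel counts after each halving step, ending at `target`.

    Raises ValueError if `target` cannot be reached by repeated halving.
    """
    if in_channels < target:
        raise ValueError("input has fewer channels than the target")
    schedule = [in_channels]
    while schedule[-1] > target:
        if schedule[-1] % 2:
            raise ValueError("channel count is not divisible by 2")
        schedule.append(schedule[-1] // 2)
    if schedule[-1] != target:
        raise ValueError("target is not reachable by halving")
    return schedule


if __name__ == "__main__":
    # Assumed channel widths of the three intermediate levels.
    for channels in (256, 512, 1024):
        print(channels, "->", channel_schedule(channels))
```

A level that already has 256 channels needs no reduction module, while wider levels pass through one module per halving, so deeper levels are refined more times.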
[0049] Step 4: In order to detect image planes, especially small image planes, it is necessary to pay attention to both the edges and the main body region of the image plane. Since the context features obtained in Step 2 represent the main body region of the image plane and the edge features obtained in Step 3 represent the edge region of the image plane, this invention sends the multi-level context features obtained in Step 2 and the multi-level edge features obtained in Step 3 into the resolution adaptive module for layer-by-layer pair fusion, and keeps the resolution of the context features and edge features with different resolutions the same.
[0050] The resolution adaptive module is composed of a set of adaptive convolutional kernels, and the specific process is defined as follows:
[0051]
[0052]
[0053]
[0054] Tr_4 = Conv1(·)
[0055] Tr_i represents the pairwise fusion operation of the i-th level, which is implemented by combining convolutional layers; Conv3(·) and Conv1(·) represent convolutional operations with kernel sizes of 3 and 1, respectively; ∘ represents the combination operation of multiple convolutional layers.
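To see how a combination of convolutions can bring two features to the same resolution, the sketch below tracks only the spatial size through composed layers. The stride and padding values are assumptions (a 3×3 convolution with stride 2 and padding 1, and a 1×1 convolution with stride 1); the text gives only the kernel sizes, and all names here are hypothetical.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (standard formula, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv1(size):
    """1x1 convolution, stride 1: keeps the resolution (assumed setting)."""
    return conv_out_size(size, kernel=1)


def conv3_s2(size):
    """3x3 convolution, stride 2, padding 1: halves the resolution (assumed setting)."""
    return conv_out_size(size, kernel=3, stride=2, padding=1)


def compose(*ops):
    """Chain size-mapping ops, applied in the order given (left to right)."""
    def run(size):
        for op in ops:
            size = op(size)
        return size
    return run


if __name__ == "__main__":
    edge_size = 64  # hypothetical edge-feature resolution
    print("Conv1:", conv1(edge_size))
    print("Conv3 (stride 2) twice:", compose(conv3_s2, conv3_s2)(edge_size))
```

Under these assumptions, a level whose context feature already matches the edge feature needs only a 1×1 convolution, and each additional stride-2 3×3 convolution bridges one more factor of two in resolution.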
[0056] Step 5: Through the above steps, this invention fuses edge features and contextual features from different levels into five levels of planar features. Next, the fused planar features of each level are provided to the target model for the downstream image planar detection task to be trained, thereby improving the performance of the target model in the downstream image planar detection task. Since the dimension and size of each fused planar feature are the same as those generated by the target model to be trained (in this embodiment, the ResNet model), the newly fused features can be easily fed into the existing target model.
[0057] Figure 2 visualizes the three discriminative features extracted when this invention uses the PlaneRCNN planar region detection model as the target model.
[0058] The downstream image plane detection task can be plane segmentation, image depth estimation, image 3D plane reconstruction, etc. Figures 3 to 5 present the prediction results of plane segmentation, image depth estimation, and image 3D plane reconstruction, respectively, obtained with the method of this invention.
[0059] Furthermore, when using the method proposed in this invention, it is necessary to calculate the hybrid loss of the target model. The hybrid loss typically includes two terms: an edge loss that helps learn the model's boundary information and the loss of the original image planar region detection model (the original image planar region detection model refers to the network model used in step 1). Formally, this hybrid loss is defined as:
[0060]
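One form consistent with this two-term description is a weighted sum. The symbols $\mathcal{L}$, $\mathcal{L}_e$ and $\mathcal{L}_d$ are assumed names for the hybrid, edge and detection losses; only the weights $\lambda_e$ and $\lambda_d$ appear in the text:

```latex
\mathcal{L} = \lambda_e \, \mathcal{L}_e + \lambda_d \, \mathcal{L}_d
```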
[0061] Here, one term denotes the edge loss and the other denotes the planar region detection loss of the original model; the ratio between the two is adjusted by the hyperparameters λ_e and λ_d. For the edge loss, to improve the performance of downstream modules, it is desirable for the model to learn the edge information of planes. Specifically, this loss should explicitly preserve the edges of the ground-truth planar regions, and is defined as follows:
[0062]
[0063] Here, the prediction term denotes the value of the i-th pixel in the predicted edge map, N represents the number of pixels in the edge map, N+ represents the number of edge-region pixels in the ground-truth planar regions, and N- represents the number of non-edge-region pixels in the ground-truth planar regions.
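A class-balanced binary cross-entropy is one common loss built from exactly these quantities, and the sketch below uses it as a stand-in; the exact formula of the invention may differ. The function name `balanced_edge_loss`, the weighting N-/N for edge pixels and N+/N for non-edge pixels, and the clamping constant `eps` are assumptions.

```python
import math


def balanced_edge_loss(pred, target, eps=1e-7):
    """Class-balanced binary cross-entropy over a flattened edge map.

    pred:   predicted edge probabilities in [0, 1], one per pixel
    target: ground-truth labels, 1 for edge pixels and 0 otherwise
    """
    n = len(pred)                              # N
    n_pos = sum(1 for t in target if t == 1)   # N+
    n_neg = n - n_pos                          # N-
    loss = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)        # avoid log(0)
        if t == 1:
            # Edge pixels are weighted by N-/N, which is large when edges are rare.
            loss -= (n_neg / n) * math.log(p)
        else:
            # Non-edge pixels are weighted by N+/N.
            loss -= (n_pos / n) * math.log(1.0 - p)
    return loss


if __name__ == "__main__":
    pred = [0.9, 0.1, 0.2, 0.1]
    target = [1, 0, 0, 0]
    print(balanced_edge_loss(pred, target))
```

The weighting compensates for the imbalance noted in the background section, where edge regions make up only about 1% of the image: without it, a model predicting "no edge" everywhere would already achieve a low loss.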
[0064] The loss for planar region detection is calculated based on the specific planar region detection method used.
[0065] The present invention has been described above by way of example. It should be noted that any simple modifications, alterations or other equivalent substitutions that can be made by those skilled in the art without creative effort without departing from the core of the present invention fall within the protection scope of the present invention.
Claims
1. A method for detecting planar regions in an image based on edge information, characterized in that it includes the following steps: Step 1: use a ResNet neural network to extract the basic skeleton network features of each layer of the image, the feature output of each layer being denoted per layer; Step 2: extract multi-level contextual features from the features obtained in Step 1 through upsampling and inter-layer links, wherein an FPN network is used to assist the ResNet neural network, and multi-level contextual features are extracted from the basic skeleton network features of multiple layers through upsampling and inter-layer connections; Py_i(·) (i = 2, 3, 4, 5, 6) represents the function at different levels of the FPN network, which processes the output of each layer of the ResNet neural network to obtain the contextual features of the image output by each layer of the FPN network, and Up represents upsampling; Step 3: extract multi-level edge features of the image by iteratively optimizing the network layers using the intermediate-level features obtained in Step 1, and optimize the features of different levels to the expected feature dimensions; Step 4: fuse the multi-level context features obtained in Step 2 and the multi-level edge features obtained in Step 3 in pairs, and keep the resolution of the context features and edge features with different resolutions the same; Step 5: provide the fused features to the target model of the downstream image plane detection task to be trained, so as to improve the performance of the target model in the downstream image plane detection task.
2. The image planar region detection method based on edge information according to claim 1, characterized in that: in Step 3, the features of the three intermediate layers output by the ResNet network are selected, and multi-level edge features of the image are extracted through an iterative optimization network layer; the features of these three intermediate layers are taken as input and fed into the iterative optimization network layer, which consists of multiple channel-smoothing reduction modules; the iterative optimization network layer smoothly reduces the number of channels of the input features by a factor of 2 at a time until the total number of channels of all input features is reduced to 256, thus obtaining the edge features.
3. The image planar region detection method based on edge information according to claim 2, characterized in that: the multi-level context features obtained in Step 2 and the multi-level edge features obtained in Step 3 are fed into the resolution adaptive module for pairwise fusion at each level; the resolution adaptive module is composed of a set of adaptive convolutional kernels, in whose definition Tr_i represents the pairwise fusion operation of the i-th level, implemented using a combination of convolutional layers; Conv3(·) and Conv1(·) represent convolution operations with kernel sizes of 3 and 1, respectively; and ∘ represents a combined operation of multiple convolutional layers.
4. The image planar region detection method based on edge information according to claim 1, characterized in that: in Step 5, when training the target model, the hybrid loss of the target model needs to be calculated; the hybrid loss includes two terms: one is the edge loss, and the other is the loss of the original image planar region detection model; the original image planar region detection model refers to the neural network used in Step 1.
5. A computer-readable storage medium, characterized in that it stores a computer program that, when executed, implements the steps of the method as described in any one of claims 1 to 4.