A target detection method based on feature grafting

By combining and fusing features from Transform and CNN networks, the problem of lacking global or local information in existing technologies is solved, achieving more accurate target detection and localization.

CN117315229B · Active · Publication Date: 2026-06-26 · HUAIYIN INSTITUTE OF TECHNOLOGY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HUAIYIN INSTITUTE OF TECHNOLOGY
Filing Date
2023-09-22
Publication Date
2026-06-26

Abstract

The application discloses a target detection method based on feature grafting, which comprises the following steps: (1) using a camera to shoot an object to be detected to construct a data set; (2) constructing a feature grafting fusion network; (3) inputting an object image to be detected into a main network Transform and a sub-network CNN respectively to obtain two groups of feature maps; (4) inputting the obtained object image to be detected into a feature grafting module to obtain feature maps M1, M2, M3 and M4; (5) inputting the obtained feature maps M1, M2, M3 and M4 into a feature fusion module to obtain a feature map N8; and (6) inputting the obtained feature map N8 into a classification and positioning network to obtain target category and position information. The application effectively combines the rich global context information extracted by the Transform network with the edge detail information extracted by the CNN network, so that all targets in a picture can be positioned more accurately during final detection, and both mAP and AP are further improved.

Description

Technical Field

[0001] This invention relates to the field of image target recognition technology in computer vision processing, and specifically to a target detection method based on feature grafting.

Background Technology

[0002] Deep learning is a popular research area in computer vision. Object detection technology is a very important technique in computer vision, which aims to find all objects of interest in an image.

[0003] Object detection has wide applications and value in fields such as industrial inspection, medical diagnosis, and remote sensing analysis. There are two main technologies for existing object detection tasks. One is to extract features through a backbone network and then directly classify and locate the target based on the final extracted features. Although the extracted features have high semantic information, they lack more positional details, which can easily lead to inaccurate target location. The other is to extract features through a CNN backbone network or a Transform backbone network, and then fuse the features extracted from different layers of this backbone network. Finally, the fused features are used for classification and location. The extracted features either mainly contain local information and lack global information, or mainly contain global information and lack local details, which can also easily lead to inaccurate target classification and location.

Summary of the Invention

[0004] Purpose of the invention: The purpose of this invention is to provide a target detection method based on feature grafting. This method utilizes the global information extraction capability of the Transform network and the local edge detail extraction capability of the CNN network, as well as a feature fusion module and a target classification and localization module, to solve the problem of insufficient location details of defects in the later features.

[0005] Technical solution: The target detection method based on feature grafting described in this invention includes the following steps:

[0006] (1) Use a camera to photograph the object to be inspected and construct a dataset;

[0007] (2) Construct a feature grafting and fusion network;

[0008] (3) Input the image of the object to be detected into the main network Transform and the sub-network CNN respectively to obtain two sets of feature maps;

[0009] (4) Input the obtained image of the object to be detected into the feature grafting module to obtain feature maps M1, M2, M3 and M4;

[0010] (5) Input the obtained feature maps M1, M2, M3, and M4 into the feature fusion module to obtain feature map N8;

[0011] (6) Input the obtained feature map N8 into the classification and localization network to obtain the target category and location information.

[0012] Further, the specific steps of constructing the dataset in step (1) are as follows: the collected images are manually labeled using the software labelimg; for the target, a rectangular box label is used to mark its coordinate position, and a classification label is used to mark the target category. The labeled images are randomly divided into three groups according to the proportion: training set, validation set, and test set.
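The random split described above could be sketched as follows. This is a minimal illustration, not the patent's implementation: the proportion is not stated in the text, so the 8:1:1 ratio, the fixed seed, the function name `split_dataset`, and the file names are all assumptions.

```python
import random

def split_dataset(items, parts=(8, 1, 1), seed=0):
    """Randomly split items into training, validation, and test lists.

    The 8:1:1 default is an assumption; the text only says the labeled
    images are divided "according to the proportion".
    """
    total = sum(parts)
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = len(shuffled) * parts[0] // total
    n_val = len(shuffled) * parts[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

# Hypothetical file names standing in for labeled images.
images = [f"img_{i:03d}.jpg" for i in range(100)]
train_set, val_set, test_set = split_dataset(images)
```

Using integer parts rather than floating-point fractions keeps the group sizes exact, and sending the remainder to the last group guarantees that every image lands in exactly one set.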

[0013] Furthermore, step (2) includes: a feature grafting module, a feature fusion module, and a classification and localization module. The feature grafting module includes: a main network (Transform) and a sub-network (CNN); the feature fusion module includes: convolutional layers and softmax layers.

[0014] Further, step (3) is as follows: the resolution of the object image to be detected is magnified by two times and input into the main network Transform and the sub-network CNN respectively to obtain feature maps T1, T2, T3, T4 and feature maps C1, C2, C3, C4.
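The text does not say how the resolution is enlarged. As a rough sketch, assuming that "two times" means doubling both height and width and that nearest-neighbor interpolation is acceptable (both are assumptions), the enlargement of a single-channel image could look like this; `upsample2x_nearest` is a hypothetical helper name.

```python
def upsample2x_nearest(image):
    """Double the height and width of a 2D image (list of rows).

    Nearest-neighbor interpolation is an assumption; the source does not
    name the interpolation method.
    """
    out = []
    for row in image:
        wide = [px for px in row for _ in range(2)]  # repeat each pixel horizontally
        out.append(wide)
        out.append(list(wide))  # repeat the widened row vertically
    return out

small = [[1, 2],
         [3, 4]]
large = upsample2x_nearest(small)
```

The enlarged image would then be passed to both backbones, each returning four feature maps (T1 to T4 and C1 to C4).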

[0015] Further, step (4) specifically involves: adding the extracted feature maps T1, T2, T3, T4 and C1, C2, C3, C4 at the element level to obtain feature maps F1, F2, F3, F4; and multiplying the obtained feature maps F1, F2, F3, F4 with the obtained feature maps C1, C2, C3, C4 at the element level to obtain feature maps M1, M2, M3, M4.
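The grafting arithmetic can be illustrated with a small sketch. It assumes that each pair Ti and Ci already has the same shape (the text does not describe how the two backbones are aligned) and uses single-channel 2D lists for brevity; the names `graft` and `graft_all` are hypothetical.

```python
def graft(t_map, c_map):
    """Return M = (T + C) * C, computed element by element."""
    if len(t_map) != len(c_map) or any(
        len(t_row) != len(c_row) for t_row, c_row in zip(t_map, c_map)
    ):
        raise ValueError("T and C must have the same shape")
    return [
        [(t + c) * c for t, c in zip(t_row, c_row)]  # F = T + C, then M = F * C
        for t_row, c_row in zip(t_map, c_map)
    ]

def graft_all(t_maps, c_maps):
    """Apply the grafting step to each stage: (T1, C1) ... (T4, C4)."""
    return [graft(t, c) for t, c in zip(t_maps, c_maps)]

t1 = [[1.0, 2.0], [3.0, 4.0]]
c1 = [[0.5, 1.0], [0.0, 2.0]]
m1 = graft(t1, c1)  # [[0.75, 3.0], [0.0, 12.0]]
```

Note that wherever the CNN response is zero the grafted value is also zero, so the CNN map acts as a gate on the summed features.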

[0016] Furthermore, the implementation process of step (5) is as follows: the obtained feature maps M1, M2, M3, and M4 are respectively passed through a 1×1 convolution to obtain feature maps N1, N2, N3, and N4. Feature map N1 and feature map N2 are multiplied at the element level to obtain feature map N5. Feature map N5 is passed through a Softmax layer to obtain feature map N6. Feature map N6 and feature map N3 are multiplied at the element level to obtain feature map N7. Feature map N7 and feature map N4 are added at the element level to obtain feature map N8.
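The fusion sequence above could be sketched in plain Python as follows. Several points are assumptions rather than statements from the source: the four maps are taken to have identical shape (any resizing between scales is not described), the Softmax is applied across channels at each pixel, and the 1×1 convolution weights (learned in practice) are supplied by hand. Feature maps are stored as `[channels][pixels]` lists, and all function names are hypothetical.

```python
import math
from operator import add, mul

def conv1x1(x, weight):
    """1x1 convolution: a per-pixel linear mix of the input channels."""
    n_pix = len(x[0])
    return [
        [sum(w * x[c][p] for c, w in enumerate(w_row)) for p in range(n_pix)]
        for w_row in weight
    ]

def elementwise(a, b, op):
    """Apply op to matching elements of two equally shaped maps."""
    return [[op(u, v) for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def softmax_over_channels(x):
    """Softmax across channels at each pixel (the axis is an assumption)."""
    n_ch, n_pix = len(x), len(x[0])
    out = [[0.0] * n_pix for _ in range(n_ch)]
    for p in range(n_pix):
        col = [x[c][p] for c in range(n_ch)]
        peak = max(col)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in col]
        total = sum(exps)
        for c in range(n_ch):
            out[c][p] = exps[c] / total
    return out

def fuse(m_maps, weights):
    """N8 = softmax(N1 * N2) * N3 + N4, with Ni = conv1x1(Mi)."""
    n1, n2, n3, n4 = (conv1x1(m, w) for m, w in zip(m_maps, weights))
    n5 = elementwise(n1, n2, mul)
    n6 = softmax_over_channels(n5)
    n7 = elementwise(n6, n3, mul)
    return elementwise(n7, n4, add)

identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical weights: pass channels through
ones = [[1.0, 1.0], [1.0, 1.0]]
m3 = [[2.0, 4.0], [6.0, 8.0]]
n8 = fuse([ones, ones, m3, ones], [identity] * 4)
```

In this toy input N1 and N2 are uniform, so the Softmax yields 0.5 everywhere and N8 is simply half of N3 plus N4.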

[0017] Furthermore, the implementation process of step (6) is as follows: the obtained feature map N8 is input into ROIPooling to generate candidate regions, and finally the generated feature map is input into a category classification fully connected layer and a location information fully connected layer to obtain category information and location information.
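To make the ROI pooling step more concrete, here is a minimal sketch of max pooling one region of a single-channel feature map to a fixed grid. The source does not say how the regions are proposed or what output size is used, so the region coordinates, the 2×2 output size, and the function names are assumptions.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)

def roi_max_pool(feature, roi, out_h=2, out_w=2):
    """Max-pool the region roi = (x0, y0, x1, y1) to an out_h x out_w grid.

    Coordinates are in feature-map cells, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    if h < 1 or w < 1:
        raise ValueError("region must cover at least one cell")
    pooled = []
    for i in range(out_h):
        ys = y0 + (i * h) // out_h            # bin start (floor)
        ye = y0 + ceil_div((i + 1) * h, out_h)  # bin end (ceil), always > ys
        row = []
        for j in range(out_w):
            xs = x0 + (j * w) // out_w
            xe = x0 + ceil_div((j + 1) * w, out_w)
            row.append(max(feature[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled

feature = [[0, 1, 2, 3],
           [4, 5, 6, 7],
           [8, 9, 10, 11],
           [12, 13, 14, 15]]
whole = roi_max_pool(feature, (0, 0, 4, 4))    # [[5, 7], [13, 15]]
corner = roi_max_pool(feature, (1, 1, 4, 4))   # [[10, 11], [14, 15]]
```

Whatever the size of the region, the output grid has a fixed size, which is what allows the result to be fed into fully connected layers.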

[0018] The target detection system based on feature grafting described in this invention includes:

[0019] Dataset building module: Used to capture images of the objects to be inspected using a camera and build a dataset;

[0020] Neural network building module: used to construct the feature grafting and fusion network;

[0021] Initial feature map module: used to input the image of the object to be detected into the main network Transform and the sub-network CNN respectively, to obtain two sets of feature maps;

[0022] Feature map grafting module: used to input the acquired image of the object to be detected into the feature grafting module to obtain feature maps M1, M2, M3, and M4;

[0023] Feature map fusion module: Used to input the obtained feature maps M1, M2, M3, and M4 into the feature fusion module to obtain feature map N8;

[0024] Target recognition module: used to input the obtained feature map N8 into the classification and localization network to obtain target category and location information.
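The data flow between the modules listed above might be wired together roughly as in the sketch below. It only shows the order of calls; the class name `GraftingDetector`, its field names, and the stand-in callables are all hypothetical, and real backbones, fusion, and heads would replace the toy lambdas.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class GraftingDetector:
    transform_backbone: Callable  # image -> [T1, T2, T3, T4]
    cnn_backbone: Callable        # image -> [C1, C2, C3, C4]
    graft: Callable               # (Ti, Ci) -> Mi
    fuse: Callable                # [M1, M2, M3, M4] -> N8
    head: Callable                # N8 -> (category, location)

    def detect(self, image):
        t_maps = self.transform_backbone(image)
        c_maps = self.cnn_backbone(image)
        m_maps = [self.graft(t, c) for t, c in zip(t_maps, c_maps)]
        n8 = self.fuse(m_maps)
        return self.head(n8)

# Toy stand-ins that operate on plain numbers, used only to show the flow.
detector = GraftingDetector(
    transform_backbone=lambda img: [img + 1] * 4,
    cnn_backbone=lambda img: [img] * 4,
    graft=lambda t, c: (t + c) * c,
    fuse=sum,
    head=lambda n8: ("object", n8),
)
result = detector.detect(1)
```

Passing the stages in as callables keeps each module replaceable, which mirrors the way the system is described as a set of separate modules.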

[0025] The device of the present invention includes a memory, a processor, and a program stored in the memory and executable on the processor, characterized in that the processor executes the program to implement the steps of any of the feature-grafting-based target detection methods described in the present invention.

[0026] The present invention provides a storage medium storing a computer program, characterized in that the computer program is designed to implement the steps of any of the feature-grafting-based target detection methods described in the present invention when it is run.

[0027] Beneficial effects: Compared with the prior art, the present invention has the following significant advantages: It effectively combines the rich global context information extracted by the Transform network and the edge detail information extracted by the CNN network, so that the feature map contains both global context information and edge detail information. This makes it possible to more accurately locate all targets in the image during the final detection, and further improves both mAP and AP.

Attached Figure Description

[0028] Figure 1 is a system schematic diagram of the present invention;

[0029] Figure 2 is a schematic diagram of the grafting module of the present invention;

[0030] Figure 3 is a schematic diagram of the fusion module of the present invention.

Detailed Implementation

[0031] The technical solution of the present invention will be further described below with reference to the accompanying drawings.

[0032] As shown in Figures 1-3, this embodiment of the invention provides a target detection method based on feature grafting, including the following steps:

[0033] (1) Use a camera to photograph the object to be inspected and construct a dataset; specifically: use the software labelimg to manually label the collected images; for the target, use a rectangular box label to mark its coordinate position and a classification label to mark the target category. Randomly divide the labeled images into three groups according to the proportion: training set, validation set, and test set.
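As an illustration of how such rectangular box and category labels might be read back, the sketch below parses a Pascal VOC style XML annotation, which is one of the formats labelImg can save. The source does not state which format is used, so the format choice, the sample file name, the "scratch" class, and the helper `read_boxes` are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC layout.
SAMPLE_XML = """
<annotation>
  <filename>part_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>scratch</name>
    <bndbox><xmin>48</xmin><ymin>30</ymin><xmax>120</xmax><ymax>96</ymax></bndbox>
  </object>
</annotation>
""".strip()

def read_boxes(xml_text):
    """Return a list of (category, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = tuple(
            int(bndbox.findtext(key)) for key in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, coords))
    return boxes

annotations = read_boxes(SAMPLE_XML)
```

Each tuple pairs the classification label with the box coordinates, which are the two pieces of information the labeling step records for every target.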

[0034] (2) Construct a feature grafting and fusion network, including: a feature grafting module, a feature fusion module, and a classification and localization module. The feature grafting module includes: a main network Transform and a sub-network CNN; the feature fusion module includes: convolutional layers and softmax layers.

[0035] (3) Input the image of the object to be detected into the main network Transform and the sub-network CNN respectively to obtain two sets of feature maps; specifically as follows: enlarge the resolution of the image of the object to be detected by two times and input it into the main network Transform and the sub-network CNN respectively to obtain feature maps T1, T2, T3, T4 and feature maps C1, C2, C3, C4.

[0036] (4) Input the obtained image of the object to be detected into the feature grafting module to obtain feature maps M1, M2, M3, and M4; specifically: add the extracted feature maps T1, T2, T3, T4 and C1, C2, C3, and C4 at the element level to obtain feature maps F1, F2, F3, and F4; multiply the obtained feature maps F1, F2, F3, and F4 with the obtained feature maps C1, C2, C3, and C4 at the element level to obtain feature maps M1, M2, M3, and M4.
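Written compactly, the grafting step in this paragraph amounts to the following, where the symbol ⊙ for element-wise multiplication is notation introduced here and is not used in the source:

```latex
F_i = T_i + C_i, \qquad
M_i = F_i \odot C_i = (T_i + C_i) \odot C_i, \qquad i = 1, 2, 3, 4
```

Both operations are element-wise, so they presuppose that T_i and C_i have the same shape at each stage.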

[0037] (5) Input the obtained feature maps M1, M2, M3, and M4 into the feature fusion module to obtain feature map N8. The implementation process is as follows: The obtained feature maps M1, M2, M3, and M4 are respectively passed through a 1×1 convolution to obtain feature maps N1, N2, N3, and N4. Feature map N1 and feature map N2 are multiplied at the element level to obtain feature map N5. Feature map N5 is passed through a Softmax layer to obtain feature map N6. Feature map N6 and feature map N3 are multiplied at the element level to obtain feature map N7. Feature map N7 and feature map N4 are added at the element level to obtain feature map N8.
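The fusion sequence in this paragraph can be summarized as the chain below. The symbol ⊙ (element-wise multiplication) and the operator name Conv with subscript 1×1 are notation introduced here; the axis over which the Softmax is taken is not specified in the source.

```latex
\begin{aligned}
N_i &= \mathrm{Conv}_{1\times 1}(M_i), \quad i = 1, 2, 3, 4 \\
N_5 &= N_1 \odot N_2 \\
N_6 &= \mathrm{softmax}(N_5) \\
N_7 &= N_6 \odot N_3 \\
N_8 &= N_7 + N_4 = \mathrm{softmax}(N_1 \odot N_2) \odot N_3 + N_4
\end{aligned}
```

Read this way, the Softmax of the product of N_1 and N_2 acts as a weighting applied to N_3, and N_4 is added afterwards as a residual-style term.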

[0038] (6) Input the obtained feature map N8 into the classification and localization network to obtain the target category and location information. The implementation process is as follows: Input the obtained feature map N8 into ROIPooling to generate candidate regions. Finally, input the generated feature map into a category classification fully connected layer and a location information fully connected layer to obtain category information and location information.
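The two fully connected layers at the end of this step could be sketched as below: one layer produces a score per category and the other produces location values. The layer sizes, the hand-written weights, the choice of four location outputs, and the names `linear` and `detection_heads` are assumptions for illustration; in practice the weights are learned.

```python
def linear(x, weight, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def detection_heads(pooled, cls_w, cls_b, box_w, box_b):
    """Return (best category index, location values) for one pooled region."""
    flat = [v for row in pooled for v in row]  # flatten the pooled grid
    scores = linear(flat, cls_w, cls_b)        # category classification layer
    box = linear(flat, box_w, box_b)           # location information layer
    return scores.index(max(scores)), box

# Hypothetical 2x2 pooled region and toy weights.
pooled = [[1.0, 2.0], [3.0, 4.0]]
cls_w = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
cls_b = [0.0, 0.0]
box_w = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
box_b = [0.0, 0.0, 0.0, 0.0]
category, location = detection_heads(pooled, cls_w, cls_b, box_w, box_b)
```

Both heads read the same flattened region features, so classification and localization share everything up to the final layers.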

[0039] This invention also provides a target detection system based on feature grafting, comprising:

[0040] Dataset building module: Used to capture images of the objects to be inspected using a camera and build a dataset;

[0041] Neural network building module: used to construct the feature grafting and fusion network;

[0042] Initial feature map module: used to input the image of the object to be detected into the main network Transform and the sub-network CNN respectively, to obtain two sets of feature maps;

[0043] Feature map grafting module: used to input the acquired image of the object to be detected into the feature grafting module to obtain feature maps M1, M2, M3, and M4;

[0044] Feature map fusion module: Used to input the obtained feature maps M1, M2, M3, and M4 into the feature fusion module to obtain feature map N8;

[0045] Target recognition module: used to input the obtained feature map N8 into the classification and localization network to obtain target category and location information.

[0046] This invention also provides a device, including a memory, a processor, and a program stored in the memory and executable on the processor, characterized in that the processor executes the program to implement the steps of any of the feature-grafting-based target detection methods described in the invention.

[0047] This invention also provides a storage medium storing a computer program, characterized in that the computer program is designed to implement the steps of any of the feature-grafting-based target detection methods described in the present invention when it is run.

Claims

1. A target detection method based on feature grafting, characterized in that it includes the following steps:

(1) Use a camera to photograph the object to be inspected and construct a dataset;

(2) Construct a feature grafting and fusion network; the grafted neural network includes: a feature grafting module, a feature fusion module, and a classification and localization module; wherein, the feature grafting module includes: a main network Transform and a sub-network CNN; the feature fusion module includes: convolutional layers and softmax layers;

(3) Input the image of the object to be detected into the main network Transform and the sub-network CNN respectively to obtain two sets of feature maps; specifically as follows: enlarge the resolution of the image of the object to be detected by two times and input it into the main network Transform and the sub-network CNN respectively to obtain feature maps T1, T2, T3, T4 and feature maps C1, C2, C3, C4;

(4) Input the obtained image of the object to be detected into the feature grafting module to obtain feature maps M1, M2, M3, and M4; specifically: add the extracted feature maps T1, T2, T3, T4 and C1, C2, C3, and C4 at the element level to obtain feature maps F1, F2, F3, and F4; multiply the obtained feature maps F1, F2, F3, and F4 with the obtained feature maps C1, C2, C3, and C4 at the element level to obtain feature maps M1, M2, M3, and M4;

(5) Input the obtained feature maps M1, M2, M3, and M4 into the feature fusion module to obtain feature map N8; the implementation process is as follows: the obtained feature maps M1, M2, M3, and M4 are respectively passed through a 1×1 convolution to obtain feature maps N1, N2, N3, and N4; feature map N1 and feature map N2 are multiplied at the element level to obtain feature map N5; feature map N5 is passed through a Softmax layer to obtain feature map N6; feature map N6 and feature map N3 are multiplied at the element level to obtain feature map N7; feature map N7 and feature map N4 are added at the element level to obtain feature map N8;

(6) Input the obtained feature map N8 into the classification and localization network to obtain the target category and location information; the implementation process is as follows: input the obtained feature map N8 into ROI Pooling to generate candidate regions, and finally input the generated feature map into a category classification fully connected layer and a location information fully connected layer to obtain category information and location information.

2. The target detection method based on feature grafting according to claim 1, characterized in that the dataset in step (1) is constructed as follows: use the software labelimg to manually label the collected images; for the target, use a rectangular box label to mark its coordinate position and a classification label to mark the target category; randomly divide the labeled images into three groups according to the proportion: training set, validation set, and test set.

3. A target detection system based on feature grafting, characterized in that it is used to implement the method described in any one of claims 1-2 and comprises: Dataset building module: used to capture images of the objects to be inspected using a camera and build a dataset; Neural network building module: used to build the feature grafting and fusion network; Initial feature map module: used to input the image of the object to be detected into the main network Transform and the sub-network CNN respectively, to obtain two sets of feature maps; Feature map grafting module: used to input the acquired image of the object to be detected into the feature grafting module to obtain feature maps M1, M2, M3, and M4; Feature map fusion module: used to input the obtained feature maps M1, M2, M3, and M4 into the feature fusion module to obtain feature map N8; Target recognition module: used to input the obtained feature map N8 into the classification and localization network to obtain target category and location information.

4. An apparatus comprising a memory, a processor, and a program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the steps in the target detection method based on feature grafting as described in any one of claims 1-2.

5. A storage medium storing a computer program, characterized in that the computer program is designed to implement the steps of the feature grafting-based target detection method according to any one of claims 1-2 at runtime.