Text detection method and system based on feature pyramid and attention fusion
A text detection technology based on a feature pyramid, applied in the field of text detection. It addresses problems such as high time cost and achieves improved detection accuracy and enhanced feature expressiveness.
Examples
Embodiment 1
[0048] As shown in Figure 1, this embodiment provides a text detection method based on feature pyramid and attention fusion. The method uses ResNet50 as the backbone network and introduces a position attention network, which applies a self-attention mechanism to capture the spatial dependence between any two positions of the feature map in order to improve detection accuracy for curved text. The specific steps are as follows:
[0049] Step 1: Obtain the image to be detected.
[0050] Step 2: Input the image to be detected into the text detection model to obtain the text position in the image.
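The position attention described in [0048] can be illustrated with a minimal sketch. The excerpt does not give the network's formulas, so the code below assumes a generic self-attention over spatial positions: identity query, key, and value projections, a softmax over pairwise dot products, and a residual connection scaled by a hypothetical weight `gamma`. The function name and the flattened channels-by-positions layout are illustrative choices, not the patent's implementation.

```python
import math


def position_attention(feat, gamma=1.0):
    """Self-attention over spatial positions (illustrative sketch).

    feat  : list of C channels, each a list of N flattened spatial positions.
    gamma : hypothetical residual scale; a learned scalar in typical designs.
    """
    channels = len(feat)
    n = len(feat[0])
    out = [[0.0] * n for _ in range(channels)]
    for i in range(n):
        # Similarity between position i and every position j
        # (dot product of their channel vectors).
        energy = [
            sum(feat[c][i] * feat[c][j] for c in range(channels))
            for j in range(n)
        ]
        # Numerically stable softmax over all positions j.
        peak = max(energy)
        exps = [math.exp(e - peak) for e in energy]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output position is a weighted sum over all positions,
        # added back to the original feature (residual connection).
        for c in range(channels):
            attended = sum(weights[j] * feat[c][j] for j in range(n))
            out[c][i] = gamma * attended + feat[c][i]
    return out
```

Because every position attends to every other position, the cost grows quadratically with the number of positions, so a real model would normally apply such a block to a downsampled feature map rather than to the full image.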
[0051] The text detection model used in Step 2 must first be trained on the training set.
[0052] As an implementation manner, a data set with text position calibration is obtained, and the data set is divided into a training set and a test set.
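The split in [0052] might be sketched as follows. The split ratio, the random seed, and the function name `split_dataset` are hypothetical; the excerpt does not state how the data set is divided, and a benchmark's own published split would normally take precedence.

```python
import random


def split_dataset(samples, test_fraction=0.2, seed=0):
    """Shuffle annotated samples and split them into (train, test).

    test_fraction and seed are illustrative assumptions, not values
    taken from the patent.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

For example, `train, test = split_dataset(range(10))` yields 8 training samples and 2 test samples with no overlap.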
[0053] As an implementation, the Total-Text dataset is used, which is a word-level English curved-text...
Embodiment approach
[0060] As an implementation, the backbone network ResNet50 comprises five convolutional stages, from bottom to top: the first layer conv1, the second layer conv2_x, the third layer conv3_x, the fourth layer conv4_x, and the fifth layer conv5_x. The first convolutional layer conv1 has size 7*7*64 (in the standard ResNet50, a 7*7 kernel with 64 output channels). The sizes of conv2_x through conv5_x are 288*512*256, 144*256*512, 72*128*1024, and 36*64*2048, respectively.
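The sizes listed in [0060] are consistent with the standard ResNet50 strides of 4, 8, 16, and 32 for conv2_x through conv5_x, applied to a 1152*2048 input. The input resolution is an inference from those numbers, not something the excerpt states, and the function name below is hypothetical. A short sketch of the arithmetic:

```python
def resnet50_stage_shapes(height, width):
    """Return (height, width, channels) of each ResNet50 stage output.

    Strides and channel counts are those of the standard ResNet50;
    the input size passed in is an assumption made by the caller.
    """
    stages = [
        ("conv2_x", 4, 256),
        ("conv3_x", 8, 512),
        ("conv4_x", 16, 1024),
        ("conv5_x", 32, 2048),
    ]
    return {
        name: (height // stride, width // stride, channels)
        for name, stride, channels in stages
    }


shapes = resnet50_stage_shapes(1152, 2048)
```

With the assumed 1152*2048 input, `shapes` reproduces the four sizes given above, each stage halving the spatial resolution and doubling the channel count of the previous one.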
[0061] The first layer of convolutional network performs convolution processing on the image and then inputs it into the second layer of convolutional network to obtain the first output feature; after the second layer of convolutional network pools the first output feature, it sequentially inputs a double convolution channel and two single convolution chann...
Embodiment 2
[0089] As shown in Figure 2, this embodiment provides a text detection system based on feature pyramid and attention fusion, which includes the following modules:
[0090] An image acquisition module configured to: acquire an image to be detected;
[0091] A text detection module configured to: input the image to be detected into the text detection model to obtain the text position in the image;
[0092] The text detection model includes a feature extraction network and a feature fusion network. The backbone of the feature extraction network consists of multiple convolutional layers of different structures connected in sequence, and a position attention network is introduced at the output of the second-layer convolutional network. The feature fusion network fuses the output features of the convolutional network and the position attention network to obtain the final features.
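The excerpt does not specify how the feature fusion network in [0092] combines its inputs. As a rough illustration only, the sketch below shows a generic feature-pyramid-style top-down merge: each coarser map is upsampled by a factor of two (nearest neighbour) and added element-wise to the next finer map. It assumes single-channel maps already reduced to a common channel count, with the finest level standing in for the attention-refined feature; all function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized 2-D maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_fuse(levels):
    """Fuse maps ordered fine to coarse, each half the size of the previous.

    Returns the fused maps in the same fine-to-coarse order.
    """
    fused = [levels[-1]]  # the coarsest level is passed through unchanged
    for lateral in reversed(levels[:-1]):
        fused.append(add_maps(lateral, upsample2x(fused[-1])))
    return fused[::-1]
```

In this arrangement the finest fused map carries information from every coarser level, which is the usual motivation for a top-down pyramid.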
[0093] It should be noted here that each modul...