Text detection method and system based on feature pyramid and attention fusion

A text detection technology based on a feature pyramid, applied in the field of text detection, that can solve problems such as large time costs and achieve the effects of improving detection accuracy and enhancing feature expressive ability.

Pending Publication Date: 2022-01-07
SHANDONG NORMAL UNIV

AI Technical Summary

Problems solved by technology

Most segmentation-based methods require complex post-processing to group pixel-level predictions into detected text instances, resulting in considerable time cost during inference.



Examples


Embodiment 1

[0048] As shown in Figure 1, this embodiment provides a text detection method based on feature pyramid and attention fusion. It uses ResNet50 as the backbone network and introduces a position attention network, which applies a self-attention mechanism to capture the spatial dependence between any two positions of the feature map, in order to improve the detection accuracy for curved text. The specific steps are as follows:

[0049] Step 1: Obtain the image to be detected.

[0050] Step 2: Input the image to be detected into the text detection model to obtain the text position in the image.

[0051] In step 2, the text detection model needs to be trained through the training set.

[0052] As an implementation manner, a data set with text position calibration is obtained, and the data set is divided into a training set and a test set.

[0053] As an implementation, the Total-Text dataset is used, which is a word-level English curved text...

Embodiment approach

[0060] As an implementation, the backbone network includes a five-layer convolutional network. From bottom to top, the backbone network ResNet50 consists of the first-layer convolutional network conv1, the second-layer convolutional network conv2_x, the third-layer convolutional network conv3_x, the fourth-layer convolutional network conv4_x, and the fifth-layer convolutional network conv5_x. The size of the first convolutional layer conv1 is 7*7*64, and the sizes of conv2_x through conv5_x are 288*512*256, 144*256*512, 72*128*1024, and 36*64*2048, respectively.
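The sizes listed for conv2_x through conv5_x are consistent with the usual ResNet50 downsampling factors of 4, 8, 16, and 32. The following sketch checks that arithmetic. It assumes an input image of 1152*2048 (height*width); this input size is not stated in the text and is used here only because it reproduces the listed sizes. The name `stage_shapes` is hypothetical.

```python
# Sketch: derive per-stage feature-map sizes for a ResNet50-style backbone.
# Assumption (not stated in the source): the input image is 1152 x 2048 (H x W).
# Strides and channel counts are those of the standard ResNet50 stages.

RESNET50_STAGES = [
    ("conv2_x", 4, 256),
    ("conv3_x", 8, 512),
    ("conv4_x", 16, 1024),
    ("conv5_x", 32, 2048),
]


def stage_shapes(height, width):
    """Return {stage_name: (H, W, C)} for the given input height and width.

    Floor division is exact only when both dimensions are divisible by 32.
    """
    shapes = {}
    for name, stride, channels in RESNET50_STAGES:
        shapes[name] = (height // stride, width // stride, channels)
    return shapes


if __name__ == "__main__":
    for name, shape in stage_shapes(1152, 2048).items():
        print(name, shape)
```

Under the assumed input size, each stage halves the spatial resolution of the previous one while doubling the channel count, which matches the sizes given in the paragraph above.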

[0061] The first layer of convolutional network performs convolution processing on the image and then inputs it into the second layer of convolutional network to obtain the first output feature; after the second layer of convolutional network pools the first output feature, it sequentially inputs a double convolution channel and two single convolution chann...

Embodiment 2

[0089] As shown in Figure 2, this embodiment provides a text detection system based on feature pyramid and attention fusion, which specifically includes the following modules:

[0090] An image acquisition module configured to: acquire an image to be detected;

[0091] A text detection module configured to: input the image to be detected into the text detection model to obtain the text position in the image;

[0092] The text detection model includes a feature extraction network and a feature fusion network. The backbone network of the feature extraction network consists of multiple layers of convolutional networks with different structures connected in sequence, and a position attention network is introduced at the output of the second-layer convolutional network. The feature fusion network fuses the output features of the convolutional network and the position attention network to obtain the final features.
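As a rough illustration of how a position attention network can relate any two positions of a feature map, the sketch below applies self-attention over the spatial positions of a small feature map and adds the result back to the input. It is a simplified sketch under stated assumptions, not the network described in this application: the learned query, key, and value projections are omitted (identity projections are assumed), and the names `position_attention` and `gamma` are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def position_attention(features, gamma=1.0):
    """Self-attention over spatial positions (simplified, no learned weights).

    features: list of N position vectors (each a list of C floats), i.e. an
              H x W x C feature map flattened to N = H * W positions.
    gamma:    hypothetical scale on the attention branch (assumption).
    Returns N vectors: gamma * attended + original (a residual-style fusion).
    """
    outputs = []
    for query in features:
        # Dot-product similarity between this position and every position.
        scores = [sum(q * k for q, k in zip(query, key)) for key in features]
        weights = softmax(scores)
        # Attention-weighted sum of all position vectors, channel by channel.
        attended = [
            sum(w * vec[c] for w, vec in zip(weights, features))
            for c in range(len(query))
        ]
        outputs.append([gamma * a + x for a, x in zip(attended, query)])
    return outputs


if __name__ == "__main__":
    # A 2 x 2 feature map with 2 channels, flattened to 4 positions.
    fmap = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    for vec in position_attention(fmap, gamma=0.5):
        print([round(v, 3) for v in vec])
```

Because every position attends to every other position, the cost grows with the square of the number of positions. That is one plausible reason to apply such a module at a single stage rather than at every stage.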

[0093] It should be noted here that each modul...


Abstract

The invention belongs to the technical field of text detection, and provides a text detection method and system based on feature pyramid and attention fusion. The method comprises the following steps: first obtaining a to-be-detected image, and then inputting the to-be-detected image into the text detection model to obtain the text position in the image. The text detection model comprises a feature extraction network and a feature fusion network. The backbone network of the feature extraction network consists of a plurality of layers of convolutional networks which are connected in sequence and have different structures, and a position attention network is introduced at the output of the second layer of convolutional network. The feature fusion network is used for fusing the output features of the convolutional network and the position attention network to obtain the final features. The representation capability of the local features is enhanced, so that the accuracy of detecting curved text is improved.

Description

Technical field

[0001] The invention belongs to the technical field of text detection, and in particular relates to a text detection method and system based on feature pyramid and attention fusion.

Background technique

[0002] The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.

[0003] Scene text detection has attracted increasing attention from computer vision researchers in recent years due to its wide range of applications, such as image and video retrieval, autonomous driving, and scene text translation.

[0004] Scene text detection, as a key component of scene text reading, aims to detect text regions in complex backgrounds and annotate them with bounding boxes. Despite remarkable achievements in object detection, accurately detecting scene text is still challenging because scene text usually has various scales and shapes, including h...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06V20/62, G06V30/19, G06V10/80, G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/045, G06F18/253
Inventor: 万洪林, 王嘉鑫, 赵莹莹, 王晓敏
Owner: SHANDONG NORMAL UNIV