Scene text detection method based on end-to-end full convolutional neural network

Convolutional neural network and text detection technology, applied to scene text detection based on an end-to-end fully convolutional neural network. It addresses the inability of earlier methods to accurately express the geometric characteristics of text, and offers good application value.

Active Publication Date: 2018-07-17

ZHEJIANG UNIV

4 Cites, 38 Cited by

AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology

Problems solved by technology

Traditional methods generally use a fixed receptive field to extract text features and ignore the diversity of the spatial structure of text targets. Although these methods are innovative in some respects, they cannot accurately express the geometric characteristics of the text, which is very important in this task.

Embodiment

[0062] This embodiment is implemented as described above, so the specific steps are not repeated here. Only the results on the case data are shown below. The present invention is evaluated on two data sets with ground-truth labels, namely:

[0063] MSRA-TD500 dataset: This dataset contains 300 training images and 200 testing images.

[0064] ICDAR 2015 dataset: This dataset contains 1000 training images and 500 testing images.

[0065] In this embodiment, experiments are carried out on each data set. Example images from the data sets are shown in Figure 2.

[0066] The main process of text detection is as follows:

[0067] 1) Extract multi-scale feature maps of the image with the base fully convolutional network;

[0068] 2) Fuse the feature maps at three scales to obtain the initial features;

[0069] 3) Use one convolution layer to predict the affine transformation matrix of each sample point on the feature map, and ...
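The idea in step 3), sampling features on a grid deformed by a per-point affine transformation, can be illustrated with a minimal sketch. This is not the patent's implementation: the 3x3 grid size, the 2x3 matrix layout [[a, b, tx], [c, d, ty]], the bilinear interpolation, and the function names `affine_grid`, `bilinear` and `sample_features` are all assumptions made for illustration. In the method itself the matrix would be predicted by a convolution layer, whereas here it is passed in by hand.

```python
import math


def affine_grid(cx, cy, theta, k=3):
    """Return k*k sampling positions around (cx, cy), deformed by theta.

    theta is an assumed 2x3 affine matrix [[a, b, tx], [c, d, ty]] applied to
    the regular grid offsets (dx, dy) in {-(k // 2), ..., k // 2}.
    """
    (a, b, tx), (c, d, ty) = theta
    r = k // 2
    points = []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            points.append((cx + a * dx + b * dy + tx,
                           cy + c * dx + d * dy + ty))
    return points


def bilinear(fmap, x, y):
    """Bilinearly interpolate fmap[row][col] at (x, y); zero outside the map."""
    h, w = len(fmap), len(fmap[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < w and 0 <= yi < h:
                weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
                value += weight * fmap[yi][xi]
    return value


def sample_features(fmap, cx, cy, theta, k=3):
    """Collect the k*k feature values on the affine-deformed grid."""
    return [bilinear(fmap, x, y) for x, y in affine_grid(cx, cy, theta, k)]


# A 5x5 single-channel feature map whose value encodes its position.
fmap = [[row * 5 + col for col in range(5)] for row in range(5)]
identity = [[1, 0, 0], [0, 1, 0]]       # regular, undeformed grid
stretch_x = [[2, 0, 0], [0, 1, 0]]      # grid stretched horizontally
regular = sample_features(fmap, 2, 2, identity)
stretched = sample_features(fmap, 2, 2, stretch_x)
```

With the identity matrix the grid reduces to an ordinary 3x3 neighbourhood. A non-trivial matrix rotates, scales or shears the neighbourhood so that it can follow the orientation and shape of a text instance.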

Abstract

The present invention discloses a scene text detection method based on an end-to-end full convolutional neural network, which addresses the problem of locating multi-directional text in images of natural scenes. The method comprises the following steps: obtaining a plurality of image data sets for training scene text detection, and defining the algorithm target; carrying out feature learning on the image by using a fully convolutional feature extraction network; predicting an instance-level affine transformation matrix for each sample point on the feature map, and deforming the sampling grid according to the predicted affine transformation to express the features of the text; classifying the feature vectors of candidate text, and carrying out coordinate regression and affine transformation regression to jointly optimize the model; using the learned framework to detect the precise position of the text; and carrying out non-maximum suppression on the set of bounding boxes output by the network to obtain the final text detection result. The method is used for scene text detection on real image data, and is effective and robust in complicated situations such as multi-directional, multi-scale, multi-lingual and shape-distorted text.
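The final non-maximum suppression step can be sketched as follows. This is a simplified illustration, not the patent's procedure: it assumes axis-aligned boxes given as (x1, y1, x2, y2), whereas multi-directional text boxes would need a polygon overlap measure. The 0.5 overlap threshold and the function names `iou` and `nms` are also assumptions.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping candidates and one separate candidate.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Here the second box overlaps the first with an IoU of about 0.68, so only the higher-scoring one survives, while the distant third box is kept.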

Description

Technical field

[0001] The invention belongs to the field of computer vision, and in particular relates to a scene text detection method based on an end-to-end full convolutional neural network.

Background

[0002] Scene text detection is defined as the problem of locating multi-directional, multi-scale, multi-lingual text regions in natural scene images. In recent years, it has been widely used in computer vision tasks such as scene understanding and image retrieval. There are two key points in this task: the first is how to model multi-directional and severely distorted text objects well enough to generate effective feature expressions; the second is how to use an end-to-end network to output detection results directly. For the first point, the present invention holds that the key to feature expression of scene text is to accurately model its spatial geometric characteristics, and uses affine transformation to encode its spatial structure to produce a more ac...

Application Information

IPC(8): G06N3/04, G06F17/30

CPC: G06F16/355, G06N3/045

Inventor: 李玺, 王芳芳, 赵黎明

Owner: ZHEJIANG UNIV
