A CNN-based handwritten English document recognition method

A handwritten English recognition technique, applied in the field of computer vision, that improves recognition accuracy, saves training time, and makes the network converge quickly

Pending Publication Date: 2019-05-24
TIANJIN UNIV
Cites: 3; Cited by: 7

AI Technical Summary

Problems solved by technology

[0007] 1) The selection and collection of data sets;
[0008] 2) Character segmentation: quickly and accurately segmenting the characters in a document is the main aspect of the image segmentation problem;

Examples

Embodiment 1

[0040] A CNN-based handwritten English document recognition method (see Figure 1) includes the following steps:

[0041] 101: Obtain a data set composed of handwritten English letters and punctuation marks, and construct a training sample set and a test sample set based on the data set;

[0042] 102: Construct an 8-layer convolutional neural network consisting of 5 convolutional layers and 3 fully connected layers; the output of the last fully connected layer is sent to a softmax layer with 59 outputs;
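
The role of the 59-way softmax layer in step 102 can be illustrated with a minimal sketch. It assumes the last fully connected layer produces one raw score per character class; the scores below are hypothetical, and this is not the patent's implementation.

```python
import math

NUM_CLASSES = 59  # number of softmax outputs stated in step 102


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical output of the last fully connected layer: class 7 scores highest.
logits = [0.0] * NUM_CLASSES
logits[7] = 5.0

probs = softmax(logits)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)  # index of the most likely class
```

The predicted character is simply the class with the largest probability.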

[0043] 103: Use overlapping pooling. A pooling layer consists of a grid of pooling units spaced s pixels apart; each unit summarizes a z*z neighborhood centered on the pooling unit, with s<z. Convolution, downsampling, and pooling operations are performed on each pixel of the input image to obtain the size of the feature map of each layer;
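
A minimal sketch of step 103 follows. It shows overlapping max pooling (window z larger than stride s) and the usual output-size formula floor((n + 2p - k) / stride) + 1. The kernel sizes, strides, and padding in `LAYERS` are assumed AlexNet-style values chosen for illustration; the source does not list them, and using max pooling rather than another pooling type is likewise an assumption.

```python
def out_size(n, k, stride, pad=0):
    """Side length after a k*k window slides over an n*n input."""
    return (n + 2 * pad - k) // stride + 1


def max_pool(grid, z, s):
    """Max pooling with a z*z window and stride s; windows overlap when s < z."""
    m = out_size(len(grid), z, s)
    return [[max(grid[i * s + di][j * s + dj]
                 for di in range(z) for dj in range(z))
             for j in range(m)]
            for i in range(m)]


# Assumed hyperparameters (not given in the source): (name, kernel, stride, pad)
LAYERS = [
    ("conv1", 11, 4, 0), ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1), ("pool5", 3, 2, 0),
]


def feature_map_sizes(n, layers=LAYERS):
    """Track the feature-map side length through every layer."""
    sizes = []
    for name, k, stride, pad in layers:
        n = out_size(n, k, stride, pad)
        sizes.append((name, n))
    return sizes


sizes = feature_map_sizes(320)  # 320*320 input, as used in the document

# Overlapping pooling on a 5*5 grid holding the values 0..24, with z=3 and s=2.
grid = [[r * 5 + c for c in range(5)] for r in range(5)]
pooled = max_pool(grid, z=3, s=2)
```

With these assumed settings a 320*320 input shrinks to 8*8 after the last pooling layer; different kernel or stride choices would give different sizes.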

[0044] 104: Input the training sample set, extract character features, and perform ...

Embodiment 2

[0052] The scheme in Embodiment 1 is further described below with reference to specific examples, calculation formulas, and Figures 1-5:

[0053] 201: Obtain a sample set of English letters and punctuation marks;

[0054] First, orthorectify the handwritten English document image. Then use the projection method [1] to segment the characters of the handwritten English document under test: project the image horizontally to obtain the horizontal projection histogram, and split the document into text lines according to that histogram; then project each cut-out line image vertically to obtain the vertical projection histogram, and segment the single character images according to the principle of dichotomy [2].
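
The projection-based segmentation described above can be sketched as follows. This is a simplified illustration that assumes an already binarized image (1 = ink, 0 = background) and splits wherever the projection drops to zero; it does not implement the dichotomy step cited as [2], and the function names are hypothetical.

```python
def runs(profile):
    """Return (start, end) pairs of consecutive non-zero entries; end is exclusive."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(image):
    """Return character boxes as (top, bottom, left, right), in reading order."""
    boxes = []
    row_profile = [sum(row) for row in image]            # horizontal projection
    for top, bottom in runs(row_profile):                 # one span per text line
        line = image[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]    # vertical projection of the line
        for left, right in runs(col_profile):             # one span per character
            boxes.append((top, bottom, left, right))
    return boxes


# Tiny hypothetical page: two characters on the first line, one on the second.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
boxes = segment(page)
```

Touching or overlapping characters leave no empty column between them, which is presumably where a further splitting rule such as the cited dichotomy principle comes in.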

[0055] Among them, the handwritten English letters and related punctuation texts of 30 people were collecte...

Embodiment 3

[0086] The scheme in Embodiments 1 and 2 is further described below with reference to Figures 4-5 and Tables 1 and 2:

[0087] The data set includes the 52 upper- and lower-case English letters, 6 commonly used punctuation marks (" , . ? ! :), and the space character, 59 characters in total, which covers basically all characters that may appear in handwritten English documents. Each person writes each character 5 times; 4 copies are selected for the training sample set and the remaining 1 is used for test sample set I, finally giving 6790 training samples and 1180 test samples.
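
The class count and the 4-to-1 split per character can be checked with a short sketch. The two-writer data set is hypothetical, and taking the first four copies for training is an assumption, since the source does not say which four are selected.

```python
import string

PUNCTUATION = '",.?!:'  # the 6 punctuation marks listed in the text
CLASSES = list(string.ascii_letters) + list(PUNCTUATION) + [" "]  # 52 + 6 + 1


def split_samples(samples, train_per_group=4):
    """samples: (writer, char, copy_index) tuples.

    Copies 0..3 of each (writer, char) pair go to training, the rest to testing.
    """
    train, test = [], []
    for writer, char, copy in samples:
        target = train if copy < train_per_group else test
        target.append((writer, char, copy))
    return train, test


# Hypothetical mini data set: 2 writers, 5 copies of every character each.
samples = [(w, c, k) for w in range(2) for c in CLASSES for k in range(5)]
train, test = split_samples(samples)
```

Each (writer, character) pair contributes four training samples and one test sample, so the training set is four times the size of the test set in this sketch.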

[0088] The data set does not need complicated preprocessing: steps such as binarization, noise removal, and tilt correction are omitted. The image size and type are unified to 320*320*3 uint8, which helps to improve the stability of training samples. The constructed neural network uses the VGG model to ...
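
Unifying a character image to 320*320*3 uint8 can be sketched in plain Python. Nearest-neighbor scaling and copying the gray value into all three channels are assumptions made for illustration; the source does not state the interpolation method or how the color channels are filled.

```python
def resize_nearest(gray, size=320):
    """Nearest-neighbor scaling of a 2D grayscale image to size*size."""
    h, w = len(gray), len(gray[0])
    return [[gray[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def to_rgb_uint8(gray):
    """Copy each gray value into 3 channels and clamp it to the uint8 range 0..255."""
    return [[[min(255, max(0, int(v)))] * 3 for v in row] for row in gray]


# Hypothetical 2*2 character crop; 300 is deliberately out of range to show clamping.
char_img = [[0, 255], [128, 300]]
sample = to_rgb_uint8(resize_nearest(char_img))
```

In practice an image library would do this in one or two calls; the nested lists here only make the target shape explicit.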

Abstract

The invention discloses a CNN-based handwritten English document recognition method, which comprises the following steps: obtaining a data set consisting of handwritten English letters and punctuation marks, and constructing a training sample set and a test sample set on the basis of the data set; constructing an eight-layer convolutional neural network including five convolutional layers and three fully connected layers, and sending the output of the last fully connected layer to a softmax layer with 59 outputs; carrying out convolution, downsampling and pooling operations on each pixel of the input image by adopting overlapping pooling to obtain the size of the feature map of each layer; inputting the training sample set, extracting character features, and carrying out classification training; uniformly scaling each extracted single character image to 320*320 pixels; adding a color channel, and converting the character image into 320*320*3 uint8 data to obtain a test sample set; and automatically recognizing English symbols by using the trained neural network.

Description

Technical field

[0001] The invention relates to the fields of computer vision and pattern recognition, in particular to an algorithm for handwritten character recognition, which can recognize handwritten numbers and handwritten English characters, and can be used for recognition of other handwritten characters after expansion.

Background technique

[0002] Handwritten character recognition has always been a popular research topic. Character recognition needs to solve many problems, such as data collection, processing and selection; selection of the input sample representation; selection of the pattern recognition classifier; and guided training of the recognizer based on the sample set.

[0003] At present, experts and scholars from various countries have proposed many ways to obtain the main features of handwritten characters, which are mainly divided into two types: structural analysis and global holistic analysis. The main features of characters are concentrated in points, ...

Application Information

IPC(8): G06K9/20, G06K9/34, G06N3/04, G06N3/08
Inventor: 何凯, 马红悦, 冯旭, 高圣楠
Owner TIANJIN UNIV