Image character sequence recognition system based on recurrent neural network

Recurrent neural network and text sequence technology, applied to the field of image text recognition, addresses three problems: single-character pictures are difficult to segment, accurate recognition results are difficult to obtain, and the dependencies between words are not fully used.

Status: Inactive; Publication Date: 2016-06-08
成都数联铭品科技有限公司

AI Technical Summary

Problems solved by technology

However, because of variation in scan quality, in the quality of the paper document itself (printing quality, font clarity, font standardization, and so on), and in content layout (the arrangement of ordinary text differs from that of forms and bills), the practical results of OCR are not always satisfactory.
Different paper documents also demand different levels of recognition accuracy. Bill recognition, for example, demands very high accuracy, because a single misrecognized digit may have fatal consequences; traditional OCR cannot meet such high-precision requirements.
[0003] Conventional OCR methods comprise image segmentation, feature extraction, single-character recognition, and other processing steps. Image segmentation involves a large amount of image preprocessing, such as tilt correction, background denoising, and single-character extraction. These steps are not only cumbersome and time-consuming, but may also cause the image to lose much usable information. Moreover, when the image to be recognized contains a string of several characters, the traditional OCR method must divide the original string into several small images, each containing a single character, and recognize them separately. This approach has two main problems:
1. Single-character pictures are difficult to segment, especially when the string mixes Chinese characters that have left and right radicals with letters, digits, and symbols. Segmentation is harder still when the characters are slanted, distorted, or touching, or when the image contains background noise. Once segmentation goes wrong, accurate recognition results are difficult to obtain.
2. Recognizing the string as separate single-character sub-pictures does not make full use of the dependencies between words in natural language. An additional language model can be used to optimize and supplement the recognition results, but because the language model and the recognizer are built independently of each other, the improvement obtained this way is local and limited.
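The segmentation difficulty in problem 1 can be illustrated with a small sketch. The source does not say which segmentation technique conventional OCR uses; splitting at blank columns (a vertical projection profile) is assumed here as one common choice, and the function name and toy bitmaps are hypothetical.

```python
def split_by_column_gaps(bitmap):
    """Split a binary text-line image into spans separated by blank columns.

    bitmap: list of rows, each a list of 0/1 ints (1 = ink).
    Returns a list of (start_col, end_col_exclusive) spans.
    """
    width = len(bitmap[0])
    # Vertical projection: amount of ink in every column.
    ink = [sum(row[c] for row in bitmap) for c in range(width)]
    spans, start = [], None
    for c, amount in enumerate(ink):
        if amount > 0 and start is None:
            start = c                      # a span of ink begins
        elif amount == 0 and start is not None:
            spans.append((start, c))       # a blank column closes the span
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


# Two well-separated glyphs: the split is correct.
clean = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]
# Two touching glyphs: no blank column, so they merge into one span.
touching = [
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
]
# One glyph with a left part and a right part: it is wrongly cut in two.
left_right = [
    [1, 0, 1, 1],
    [1, 0, 1, 1],
]

print(split_by_column_gaps(clean))
print(split_by_column_gaps(touching))
print(split_by_column_gaps(left_right))
```

In this toy setting, touching characters are under-segmented and a character with separate left and right parts is over-segmented, so any single-character recognizer run on the resulting spans receives the wrong input.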



Examples


Embodiment Construction

[0038] The present invention is described in further detail below with reference to test examples and specific embodiments. This should not be understood as limiting the scope of the subject matter of the present invention to the following embodiments; all techniques realized on the basis of the content of the present invention fall within its scope.

[0039] An image character sequence recognition system based on a recurrent neural network is provided. It comprises a convolutional neural network (CNN) and a recurrent neural network (RNN) classifier. The CNN extracts features from the entire picture containing multiple characters, and the same features are then fed to the RNN repeatedly, so that multiple characters are predicted in succession. The image text sequence recognition realized by the system of the present invention systematically overcomes the disadvantage of image segmentation bef...
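The idea of extracting features from the entire picture, with no per-character segmentation, can be sketched as below. This is a minimal illustration, not the patent's network: a single convolution layer, ReLU, and global max pooling are assumed, and the hand-written kernels, function names, and toy image are hypothetical (a real CNN would learn its kernels from training data).

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation of a grayscale image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def extract_features(image, kernels):
    """One feature per kernel: ReLU, then global max pooling over the image."""
    features = []
    for kernel in kernels:
        response = conv2d_valid(image, kernel)
        features.append(max(max(0.0, float(v)) for row in response for v in row))
    return features


# Hypothetical fixed kernels standing in for learned ones.
KERNELS = [
    [[1, -1], [1, -1]],   # responds to ink with blank space to its right
    [[1, 1], [-1, -1]],   # responds to ink with blank space below it
]

# Toy picture of the whole character string; it is never cut into characters.
image = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 1],
]

features = extract_features(image, KERNELS)
print(features)
```

The resulting feature vector describes the whole picture at once, which is what allows the same features to be handed to the RNN at every prediction step.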


Abstract

The invention relates to the field of image character recognition, and in particular to an image character sequence recognition system based on a recurrent neural network. The system comprises an image character input module, a convolutional neural network, and a recurrent neural network classifier. The convolutional neural network extracts features of the character sequence to be recognized, supplied by the image character input module, and passes them to the recurrent neural network classifier. The recurrent neural network classifier recognizes the character sequence continuously, based on the sample feature data and its own output at the previous moment. The system removes the need to segment the picture before OCR recognition, simplifies the early-stage processing of image character recognition, and requires no additional language model to optimize the recognition result. It improves the recognition accuracy for character and word sequences while markedly improving processing efficiency, and has broad application prospects in the field of image character recognition.
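A recurrent classifier that combines the feature data with its own output from the previous moment can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the tanh recurrence, the greedy argmax choice, the toy alphabet with an end marker, the class and function names, and the random untrained weights. The source does not give the patent's actual network structure or training procedure.

```python
import math
import random

ALPHABET = ["<end>", "0", "1", "2"]  # hypothetical label set; "<end>" stops decoding


def make_weights(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


class ToyRecurrentClassifier:
    """Untrained toy model; the weights are random, so the output is meaningless."""

    def __init__(self, feature_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        n = len(ALPHABET)
        self.hidden_dim = hidden_dim
        self.w_feature = make_weights(hidden_dim, feature_dim, rng)
        self.w_hidden = make_weights(hidden_dim, hidden_dim, rng)
        self.w_prev = make_weights(hidden_dim, n, rng)
        self.w_out = make_weights(n, hidden_dim, rng)

    def step(self, feature, hidden, prev_output):
        # New state depends on the image feature, the old state,
        # and the label emitted at the previous moment.
        a = matvec(self.w_feature, feature)
        b = matvec(self.w_hidden, hidden)
        c = matvec(self.w_prev, prev_output)
        new_hidden = [math.tanh(x + y + z) for x, y, z in zip(a, b, c)]
        scores = matvec(self.w_out, new_hidden)
        # Argmax of the raw scores equals argmax of their softmax.
        best = max(range(len(scores)), key=scores.__getitem__)
        return new_hidden, best

    def recognize(self, feature, max_len=8):
        hidden = [0.0] * self.hidden_dim
        prev = [0.0] * len(ALPHABET)      # no previous output at the first step
        chars = []
        for _ in range(max_len):
            # The same whole-image feature is reused at every step.
            hidden, best = self.step(feature, hidden, prev)
            if ALPHABET[best] == "<end>":
                break
            chars.append(ALPHABET[best])
            prev = one_hot(best, len(ALPHABET))
        return "".join(chars)


model = ToyRecurrentClassifier(feature_dim=2, hidden_dim=4, seed=0)
print(model.recognize([2.0, 0.0]))
```

Feeding the previous label back into each step is what lets the classifier condition every character on the ones already emitted, which is the role the abstract says a separately built language model would otherwise have to play.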

Description

Technical field

[0001] The invention relates to the field of image character recognition, and in particular to an image character sequence recognition system based on a recurrent neural network.

Background technique

[0002] With the development of society, there is great demand for digitizing ancient books, documents, bills, business cards, and other paper media. Digitization here is not limited to "photographing" with scanners or cameras; more importantly, the paper files are converted into readable and editable documents for storage. This requires image text recognition on the scanned pictures. Traditional image text recognition is optical character recognition (OCR), which performs recognition on electronic images obtained by scanning the paper documents. However, considering the quality of the scanning effect, the quality of the paper document itself (such as printing quality, font...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06N3/02
CPC: G06N3/02, G06V30/10, G06F18/24
Inventors: 刘世林, 何宏靖, 陈炳章, 吴雨浓, 姚佳
Owner: 成都数联铭品科技有限公司