Optical character sequence recognition method

A character sequence recognition technology applied in the field of image text recognition. It addresses the difficulty of segmenting characters, the loss of usable information in pictures, and the limited benefit of optimizing and supplementing results with a separate language model, and it achieves the effect of avoiding linear growth.

Inactive
Publication Date: 2016-06-08
成都数联铭品科技有限公司

AI Technical Summary

Problems solved by technology

However, OCR results are not always satisfactory, because they depend on the scan quality, the quality of the paper document itself (printing quality, font clarity, font standardization, etc.), and the content layout (how the text is arranged, which differs between ordinary text, form text, and bills).
Different paper documents also demand different recognition accuracy. Bill recognition, for example, demands very high accuracy, because a single misrecognized digit may have fatal consequences; traditional OCR cannot meet such high-accuracy requirements.
[0003] Conventional OCR methods involve image segmentation, feature extraction, single-character recognition, and other steps. Image segmentation itself includes many image preprocessing steps, such as tilt correction, background denoising, and single-character extraction. These steps are cumbersome and time-consuming, and they may also discard a lot of usable information in the image. When the image to be recognized contains a string of several characters, the traditional OCR method must divide the original string into several small images, each containing a single character, and recognize them separately. This approach has two main problems:
1. Single-character images are hard to segment, and segmentation becomes harder still when characters with left and right radicals are mixed with letters, numbers, and symbols, or when there is background noise, character distortion, or bonding (touching characters). Once segmentation goes wrong, accurate recognition results are hard to obtain.
2. Recognizing each single-character sub-image separately does not make full use of the dependencies between characters and words in natural language. An additional language model can be used to optimize and supplement the recognition results, but because the language model and the recognizer are built independently of each other, the benefit of this kind of optimization is limited.



Detailed Description of the Embodiments

[0042] The present invention is described in further detail below with reference to test examples and specific embodiments. This should not be understood as limiting the scope of the above subject matter to the following embodiments; all techniques realized on the basis of the content of the present invention fall within its scope.

[0043] The invention provides an optical character sequence recognition method. It applies convolutional neural network (CNN) and recurrent neural network (RNN) technology: a CNN extracts features from the whole picture containing multiple characters, and the same features are then fed into an RNN and reused recursively, so that multiple characters are predicted in succession. The optical character sequence recognition realized by the method of the present invention systematically overcomes the disadvantage of image segmen...
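The data flow described in [0043] can be sketched as follows: features are extracted once from the whole picture, and every recurrent step receives those same features together with the previous step's output. This is a minimal, untrained toy in pure Python, not the patented implementation. The alphabet, the `<end>` symbol, the layer sizes, the global max-pooling, the tanh cell, the greedy decoding, and all function names are assumptions, and the random weights mean the predicted string carries no meaning.

```python
import math
import random
from typing import Dict, List

ALPHABET = ["<end>"] + list("0123456789")  # hypothetical label set; index 0 stops decoding

def init_params(n_kernels: int = 4, hidden_size: int = 6, seed: int = 0) -> Dict:
    """Random (untrained) weights, only to make the data flow runnable."""
    rng = random.Random(seed)
    def mat(rows: int, cols: int) -> List[List[float]]:
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]
    return {
        "kernels": [mat(2, 2) for _ in range(n_kernels)],
        "W_in": mat(hidden_size, n_kernels + len(ALPHABET)),
        "W_h": mat(hidden_size, hidden_size),
        "W_out": mat(len(ALPHABET), hidden_size),
    }

def conv_features(image: List[List[float]], kernels) -> List[float]:
    """Toy CNN stage: valid 2-D convolution, ReLU, global max-pool per kernel."""
    h, w = len(image), len(image[0])
    feats = []
    for k in kernels:
        kh, kw = len(k), len(k[0])
        best = 0.0  # starting at 0 applies the ReLU floor
        for i in range(h - kh + 1):
            for j in range(w - kw + 1):
                s = sum(image[i + a][j + b] * k[a][b]
                        for a in range(kh) for b in range(kw))
                best = max(best, s)
        feats.append(best)
    return feats

def matvec(m: List[List[float]], v: List[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rnn_step(feats, prev, hidden, p):
    """One recurrent step: input is the fixed features plus the previous output."""
    a = matvec(p["W_in"], feats + prev)
    b = matvec(p["W_h"], hidden)
    new_hidden = [math.tanh(x + y) for x, y in zip(a, b)]
    logits = matvec(p["W_out"], new_hidden)
    return logits.index(max(logits)), new_hidden  # greedy choice

def recognize(image, p, max_len: int = 8) -> str:
    feats = conv_features(image, p["kernels"])  # extracted once, reused every step
    hidden = [0.0] * len(p["W_h"])
    prev = [0.0] * len(ALPHABET)                # no previous output at step 1
    out = []
    for _ in range(max_len):
        idx, hidden = rnn_step(feats, prev, hidden, p)
        if idx == 0:                            # "<end>" predicted
            break
        out.append(ALPHABET[idx])
        prev = [1.0 if i == idx else 0.0 for i in range(len(ALPHABET))]
    return "".join(out)

# Hypothetical 4x6 binary picture standing in for a scanned character string.
toy_image = [
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1, 1],
]
print(recognize(toy_image, init_params()))
```

Note that `recognize` never splits the picture into per-character crops: the number of characters comes out of the decoding loop itself, which is what lets this style of model skip the segmentation step.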


Abstract

The invention belongs to the field of image character recognition and relates to an optical character sequence recognition method. The method uses convolutional neural network (CNN) and recurrent neural network (RNN) technology: a CNN extracts features from a whole picture containing multiple characters, and the same features are passed to an RNN and reused recursively, so that multiple characters are predicted in succession. The method removes the need to segment the picture before optical character recognition (OCR), simplifies the early-stage processing of picture character recognition, and significantly improves the efficiency of character recognition. Because the RNN recursively uses its output from the previous round, model training also learns a language model of the dependencies between characters and words. This avoids the step, required in other OCR methods, of building a separate language model for post-processing after individual characters are recognized, which further improves both the recognition accuracy for character and word sequences and the processing efficiency.
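One way to write down why feeding back the previous output yields an implicit language model is the following formulation. The notation is an assumption introduced here, not taken from the patent: x is the input picture, f the CNN feature vector, y_t the character predicted at step t, e(·) a one-hot encoding, and h_t the recurrent state. The tanh cell is a generic recurrent unit used for illustration only.

```latex
\begin{aligned}
f &= \mathrm{CNN}(x) \\
h_t &= \tanh\bigl(W_f f + W_y\, e(y_{t-1}) + W_h h_{t-1}\bigr) \\
p(y_t \mid y_{<t}, x) &= \mathrm{softmax}(W_o h_t) \\
p(y_1, \dots, y_T \mid x) &= \prod_{t=1}^{T} p(y_t \mid y_{<t}, f)
\end{aligned}
```

Under these assumptions, each factor in the product conditions on the earlier characters y_{<t} as well as on the picture, so dependencies between characters are learned by the same weights that do the recognition, and no separately built language model is needed afterwards.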

Description

Technical field

[0001] The invention relates to the field of image character recognition, in particular to an optical character sequence recognition method.

Background

[0002] With the development of society, there is large demand for digitizing ancient books, documents, bills, business cards, and other paper media. Digitization here is not limited to "photographing" with scanners or cameras; more importantly, the paper files are converted into readable and editable documents for storage. This requires image text recognition on the scanned pictures, and traditional image text recognition is optical character recognition (OCR). OCR performs recognition on the electronic images obtained by scanning the paper documents. However, considering the quality of the scanning effect, the quality of the paper document itself (such as printing quality, font clarity, font standardization, etc.), the...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06N3/08
CPC: G06N3/088, G06F18/2111, G06F18/214
Inventor: 刘世林, 何宏靖, 陈炳章, 吴雨浓, 姚佳
Owner: 成都数联铭品科技有限公司