OCR image character recognition and character correction method and system

A character recognition and image recognition technology, applied in the field of Chinese character recognition, that addresses the problems of deep neural network misrecognition, unsatisfactory test results and vulnerability to attacks, achieving improved recognition accuracy and strong correction ability.

Active Publication Date: 2020-08-14
梁华智能科技(上海)有限公司

AI Technical Summary

Problems solved by technology

According to relevant market surveys in 2018, many traditional OCR manufacturers did not perform well when tested on various bills photographed with mobile phones. Good results have been achieved in the field, but because the Chinese character set is so large, the training data required is, by conservative estimate, thousands of times that needed for Western character sets, so Chinese character OCR on open AI platforms performs far from ideally on poor-quality images. In addition, end-to-end deep neural networks are inherently prone to misrecognition and are vulnerable to attacks.



Examples


Embodiment 1

[0058] As shown in Figure 1, a method for OCR image character recognition and character correction includes:

[0059] Performing character recognition on the image to be recognized using a training network, to obtain character recognition information;

[0060] Checking the character recognition information against preset correction rules to obtain a character correction result;

[0061] Wherein performing character recognition on the image to be recognized using the training network comprises:

[0062] Constructing and fitting the Pr function, taking the four stroke features (horizontal, vertical, left-falling and right-falling) as variables, so as to construct a training network that calculates the degree of deformation of Chinese characters;

[0063] Adding a second-level similar character distinguishing network that accurately distinguishes between similar characters for the best recognition result first determined by the training network.
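The two-level recognition flow in [0062] and [0063] can be illustrated with a minimal Python sketch. This is not the patent's implementation: the similar-character groups, the function names and the score dictionaries are all hypothetical, and the second-level network is represented by a plain callable that is assumed to have already seen the image.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence

# Hypothetical groups of visually similar characters (illustration only).
SIMILAR_GROUPS: List[FrozenSet[str]] = [
    frozenset("己已巳"),
    frozenset("未末"),
    frozenset("日曰"),
]


def find_group(ch: str) -> FrozenSet[str]:
    """Return the similar-character group containing ch, or an empty set."""
    for group in SIMILAR_GROUPS:
        if ch in group:
            return group
    return frozenset()


def recognize(
    first_stage_scores: Dict[str, float],
    second_stage: Callable[[Sequence[str]], Dict[str, float]],
) -> str:
    """Pick the first-level best result, then refine it among similar characters."""
    best = max(first_stage_scores, key=first_stage_scores.get)
    group = find_group(best)
    if not group:
        # No known look-alikes: the first-level answer stands.
        return best
    candidates = sorted(group)
    refined = second_stage(candidates)  # stand-in for the second-level network
    return max(candidates, key=lambda c: refined.get(c, 0.0))


# Usage with made-up scores: the first level slightly prefers "未",
# the (stubbed) second level is much more confident in "末".
first = {"未": 0.51, "末": 0.49, "人": 0.0}
second = lambda cands: {"末": 0.9, "未": 0.1}
print(recognize(first, second))  # 末
```

Because the second level only has to separate a handful of look-alike characters, the first level can stay simpler, which matches the motivation stated in the abstract.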

[0064] In the method of the present invention, character recognit...

Embodiment 2

[0104] As shown in Figure 2, the present invention also provides an OCR recognition system, including a character recognition module and a character correction module; wherein

[0105] The character recognition module is used to perform character recognition on the image to be recognized using a training network, to obtain character recognition information; wherein performing character recognition on the image to be recognized using the training network includes:

[0106] Constructing and fitting the Pr function, taking the four stroke features (horizontal, vertical, left-falling and right-falling) as variables, so as to construct a training network that calculates the degree of deformation of Chinese characters;

[0107] And adding a second-level similar character distinguishing network that accurately distinguishes between similar characters for the best recognition result first determined by the training network;

[0108] The character correction module is used to check the character recognition information according to preset correc...
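The source text does not list the preset correction rules, so the following Python sketch only illustrates what a rule-based check in a character correction module might look like for an invoice. The field names (`quantity`, `unit_price`, `amount`), the look-alike mapping and the "amount equals quantity times unit price" rule are all assumptions made for this example.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, Optional

# Hypothetical mapping of letters that OCR often confuses with digits.
DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)


def parse_number(text: str) -> Optional[Decimal]:
    """Normalize look-alike characters and parse a numeric field, or return None."""
    cleaned = text.translate(DIGIT_FIXES).replace(",", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def correct_invoice(fields: Dict[str, str]) -> Dict[str, str]:
    """Apply the assumed rule amount == quantity * unit_price to recognized fields."""
    out = dict(fields)
    parsed = {}
    for name in ("quantity", "unit_price", "amount"):
        parsed[name] = parse_number(fields.get(name, ""))
        if parsed[name] is not None:
            out[name] = str(parsed[name])  # keep the normalized form
    qty, price = parsed["quantity"], parsed["unit_price"]
    if qty is not None and price is not None:
        expected = qty * price
        if parsed["amount"] != expected:
            # Missing or inconsistent amount: recompute it from the other fields.
            out["amount"] = str(expected)
    return out


# Usage: "l0" and "2.5O" contain misread digits and the amount is missing.
print(correct_invoice({"quantity": "l0", "unit_price": "2.5O", "amount": ""}))
# {'quantity': '10', 'unit_price': '2.50', 'amount': '25.00'}
```

A real rule set would also have to decide which field to trust when all three are present but disagree; this sketch simply trusts quantity and unit price.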


Abstract

The invention discloses an OCR image character recognition and character correction method. In the character recognition module, a multi-stage neural network is used to construct and fit a Chinese character deformation degree function Pr. The network takes the image's CNN data and four additional stroke features (horizontal, vertical, left-falling and right-falling) as variables, and takes GAN recognition degrees at different levels as the training values of the deformation degree, so that it reflects the deformation degree Pr of a target Chinese character. In the character correction module, a second-stage similar character distinguishing network is added to distinguish, with high precision, between similar characters for the best recognition result first determined by the training network; because of this second-stage network, the complexity of the first-stage network can be reduced and the overall generalization ability of the network can be improved. The method and system are mainly used to recognize machine-printed invoices and various forms and documents. They offer high recognition precision, high recognition speed and strong adaptability, and have strong correction capability for partial information loss and recognition errors; compared with traditional OCR technology, recognition accuracy is greatly improved.
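To make the idea of fitting Pr concrete, here is a minimal Python sketch under strong simplifying assumptions: a single sigmoid unit trained by gradient descent stands in for the multi-stage neural network, the CNN data is reduced to two made-up numbers, and the training targets are hand-picked values rather than GAN recognition degrees. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (input vector, target deformation degree)


def build_input(cnn_features: Sequence[float], strokes: Sequence[float]) -> List[float]:
    """Concatenate CNN data with the four stroke features
    (horizontal, vertical, left-falling, right-falling)."""
    if len(strokes) != 4:
        raise ValueError("expected exactly four stroke features")
    return list(cnn_features) + list(strokes)


def predict_pr(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Stand-in for Pr: a sigmoid over a weighted sum, giving a value in (0, 1)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_pr(samples: Sequence[Sample], epochs: int = 2000, lr: float = 0.5):
    """Fit the stand-in model by per-sample gradient descent on squared error."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            p = predict_pr(weights, bias, x)
            grad = (p - target) * p * (1.0 - p)  # d(0.5 * (p - target)^2) / dz
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


# Made-up training data: one lightly deformed and one heavily deformed character.
clean = build_input([0.2, 0.1], [1.0, 1.0, 0.0, 0.0])
deformed = build_input([0.9, 0.8], [0.4, 0.5, 0.3, 0.2])
weights, bias = fit_pr([(clean, 0.1), (deformed, 0.9)])
print(predict_pr(weights, bias, clean) < predict_pr(weights, bias, deformed))  # True
```

The point of the sketch is only the shape of the problem: the stroke features enter the model as extra inputs alongside the CNN data, and the output is a single deformation score fitted against target values.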

Description

Technical field

[0001] The invention relates to the technical field of Chinese character recognition, in particular to a method and system for OCR image character recognition and character correction.

Background

[0002] OCR (Optical Character Recognition) technology converts the text of bills, newspapers, books, manuscripts and other printed materials into image information through optical input methods such as scanning, and then uses text recognition technology to convert that image information into usable computer input.

[0003] With the continuous development of image sensors, especially the exponential increase in the number of mobile phones and professional (such as security) cameras, computer image data is increasing rapidly; but the image quality is relatively lower than that of traditional scanners or various professional cameras; traditional Chinese character OCR technology faces the problem t...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/20, G06K9/62, G06N3/04
CPC: G06V10/22, G06V10/751, G06V30/10, G06N3/045, G06F18/214
Inventor: 宋国梁, 颜长华
Owner: 梁华智能科技(上海)有限公司