A method for recognizing similar characters based on OCR fonts

A text and character recognition technology applied in the computer field. It addresses reduced recognition accuracy, low recognition efficiency, and inconsistent recognition results, and it improves recognition accuracy and efficiency while preserving the diversity of character features.

Active Publication Date: 2022-05-03
中电万维信息技术有限责任公司

AI Technical Summary

Problems solved by technology

OCR technology has a relatively good recognition rate for general characters, but technical difficulties remain for Chinese characters, which have rich structures and fonts. Characters with similar glyphs, such as (午, gan, gan) and (run, bubble, cannon), are recognized with low efficiency and low accuracy.
In addition, the existing technology cannot distinguish characters that share a glyph but differ in font. Such characters are easily misrecognized, repeated recognition runs give different results, and manual intervention is sometimes required to correct errors, which greatly reduces recognition accuracy.



Examples


Embodiment Construction

[0025] A method for recognizing text with similar glyphs based on OCR, including the following steps:

[0026] A. Raw OCR image pre-processing

[0027] Correct skewed text, remove noise from the picture, adjust image contrast and gamma, and convert the result into a grayscale image;
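
The contrast, gamma and grayscale parts of step A can be sketched as follows. This is a minimal illustration using NumPy, not the patent's implementation: the function names, the BT.601 luma weights, the min-max contrast stretch and the default gamma of 0.8 are all assumptions, and skew correction and noise removal are left out.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 uint8 RGB image to float grayscale in [0, 1]."""
    weights = np.array([0.299, 0.587, 0.114])  # assumed BT.601 luma weights
    return (rgb.astype(np.float64) / 255.0) @ weights

def stretch_contrast(gray):
    """Linearly rescale intensities so they span the full [0, 1] range."""
    lo, hi = gray.min(), gray.max()
    if hi == lo:
        return np.zeros_like(gray)  # flat image: nothing to stretch
    return (gray - lo) / (hi - lo)

def gamma_correct(gray, gamma=0.8):
    """Apply power-law (gamma) correction to a [0, 1] image."""
    return np.power(gray, gamma)

def preprocess(rgb, gamma=0.8):
    """Grayscale -> contrast stretch -> gamma correction -> uint8 image."""
    gray = gamma_correct(stretch_contrast(to_grayscale(rgb)), gamma)
    return np.round(gray * 255).astype(np.uint8)
```

Calling `preprocess(image)` on an RGB array returns a single-channel `uint8` image of the same height and width, which could then be passed to the detection step.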

[0028] B. Image text detection

[0029] Character pixel feature information is extracted from the preprocessed grayscale image: a CNN extracts the character pixel features and converts them into a feature vector in one-hot encoded form, which the character recognition module uses as the basis for recognition;

[0030] C. Recognition calculation

[0031] Use different fonts from the standard font library as the training sample n, with each font denoted n1, n2, ...; calculate the Euclidean distances Dn1, Dn2, ... for each font in the training sample. The character recognition module adopt...
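
The distance calculation in step C can be illustrated with a short sketch. The source text is truncated before it states how the distances are used, so the decision rule here (pick the font with the smallest distance and accept it only if it falls under a threshold) is an assumption based on the threshold screening mentioned in the abstract. The names `font_distances`, `match_font`, `font_references` and `threshold` are hypothetical.

```python
import math

def font_distances(features, font_references):
    """Euclidean distance from one character's feature vector to each font's
    reference vector; returns {font_name: distance}."""
    return {name: math.dist(features, ref)
            for name, ref in font_references.items()}

def match_font(features, font_references, threshold):
    """Return (font_name, distance) for the closest font, or (None, distance)
    if even the closest font is farther away than the threshold.
    This selection rule is an assumption, not taken from the patent text."""
    distances = font_distances(features, font_references)
    best = min(distances, key=distances.get)
    if distances[best] <= threshold:
        return best, distances[best]
    return None, distances[best]
```

With reference vectors for fonts n1 and n2, `match_font(vector, {"n1": ref1, "n2": ref2}, threshold=0.5)` returns the nearer font, or `None` as the first element when neither reference is within the threshold.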


Abstract

The present invention relates to the field of computer technology, in particular to the field of pattern recognition and deep learning, and more specifically to a method for recognizing similar characters based on OCR fonts. Departing from the traditional glyph recognition method, it recognizes both the character text and its font. Through multi-sample comparison and added threshold screening, it greatly improves the accuracy of text recognition and also effectively recognizes character fonts. It is especially suitable for recognizing characters with similar glyphs and similar fonts, and achieves accurate recognition of both glyph and font. Each character is cut to a size of 96*96 pixels by horizontal and vertical segmentation, which facilitates the extraction of pixel feature information, avoids mutual interference between adjacent characters, and effectively improves recognition efficiency. The inventors cut each character in various pictures, such as books, newspapers, clothing and screenshots, to 96*96 pixels to extract character pixel feature information, and the extraction rate is close to 100%.
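
The horizontal and vertical segmentation into 96*96 cells can be sketched with projection profiles. This is one common way to do such cuts and is an assumption, since the abstract does not say how the cut positions are found. The helper names, the ink threshold of 128 and the nearest-neighbour resize are likewise hypothetical.

```python
import numpy as np

def runs(mask):
    """Return (start, stop) index pairs for each consecutive run of True."""
    padded = np.concatenate(([False], mask, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(changes[::2], changes[1::2]))

def resize_nearest(img, size=96):
    """Nearest-neighbour resize of a 2-D array to size x size."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[np.ix_(rows, cols)]

def segment_characters(gray, ink_threshold=128, size=96):
    """Cut a grayscale page into per-character cells of size x size pixels."""
    ink = gray < ink_threshold                      # dark pixels count as ink
    cells = []
    for top, bottom in runs(ink.any(axis=1)):       # horizontal cuts: text lines
        line = ink[top:bottom]
        for left, right in runs(line.any(axis=0)):  # vertical cuts: characters
            cells.append(resize_nearest(gray[top:bottom, left:right], size))
    return cells
```

A plain projection cut like this would split a character whose components are separated by a blank column, so a practical system would need an additional merging rule.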

Description

Technical field

[0001] The present invention relates to the field of computer technology, in particular to the field of pattern recognition and deep learning, and more particularly to a method for recognizing text with similar glyphs based on OCR.

Background

[0002] Optical Character Recognition (OCR) combines optical technology and computer technology to convert images of printed paper documents into text files. OCR can be used for automatic scanning and long-term storage of bank bills, large volumes of documents, archival files, tax bills and other bills.

[0003] OCR is usually measured by recognition rate, recognition speed, layout understanding and layout reconstruction. This technology has a relatively good recognition rate for general characters, but there are still certain technical problems in the field of Chinese characters with rich structure and glyphs, especially for characters with similar glyphs, such as: (noon, dry, dry), (run, bubble, cannon) and...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06V30/14, G06V30/148, G06V30/19, G06V10/74, G06K9/62
CPC: G06V10/22, G06V10/267, G06F18/22
Inventor: 席敬焦勇伏虎
Owner: 中电万维信息技术有限责任公司