OCR-based character segmentation method

A character segmentation technique in the field of optical character recognition. It addresses text deformation, spacing changes, size changes, and the resulting drop in recognition rate, with the aim of improving segmentation accuracy, ensuring the degree of matching, and stabilizing the recognition rate.

Active Publication Date: 2021-11-19
SUZHOU DINNAR TECH FOR AUTOMATION CO LTD

AI Technical Summary

Problems solved by technology

When text is actually printed, differences in the printing environment (printing in motion, inconsistencies between printing devices) cause text deformation, spacing changes, and size changes. A model trained on the standard characters of a traditional OCR character library then tends to merge two characters into one or cut one character into two, which lowers the recognition rate.




Embodiment Construction

[0030] To describe the technical solution of the invention in more detail, specific examples are given below to demonstrate its technical effect. These examples illustrate the present invention and do not limit its scope.

[0031] The OCR-based character segmentation method provided by the present invention, as shown in Figure 1, includes the following steps:

[0032] Step 1, data collection: Obtain a template font library based on OCR technology. The template font library includes standard characters and feature data of the standard characters; the feature data includes at least the grayscale, size, aspect ratio, center of gravity, area, and spacing of the standard characters. Specifically, the method for obtaining the template font library based on OCR technology may include: collecting pictures of the standard characters and using the OCR technology to segment them and obtain the template fonts, for exampl...
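The feature data listed in Step 1 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `CharFeatures` class, the `extract_features` function, the ink threshold of 128, and the choice of measuring the center of gravity relative to the bounding box are all assumptions made for illustration. Spacing is a property of adjacent characters (the gap between their bounding boxes) and is left out of this single-character example.

```python
from dataclasses import dataclass


@dataclass
class CharFeatures:
    """Hypothetical container for the per-character feature data of Step 1."""
    mean_gray: float      # average gray level of the ink pixels
    width: int            # bounding-box width in pixels
    height: int           # bounding-box height in pixels
    aspect_ratio: float   # width / height
    centroid: tuple       # (row, col) center of gravity, relative to the bounding box
    area: int             # number of ink pixels


def extract_features(gray, threshold=128):
    """Compute features from a grayscale image given as a list of rows (0..255).

    Pixels darker than `threshold` are treated as ink (an assumption).
    """
    ink = [(r, c, v)
           for r, row in enumerate(gray)
           for c, v in enumerate(row)
           if v < threshold]
    if not ink:
        raise ValueError("no ink pixels found")

    rows = [r for r, _, _ in ink]
    cols = [c for _, c, _ in ink]
    top, left = min(rows), min(cols)
    height = max(rows) - top + 1
    width = max(cols) - left + 1
    area = len(ink)

    return CharFeatures(
        mean_gray=sum(v for _, _, v in ink) / area,
        width=width,
        height=height,
        aspect_ratio=width / height,
        centroid=(sum(rows) / area - top, sum(cols) / area - left),
        area=area,
    )


# A 4x4 image containing a 3-row by 2-column block of black ink.
sample = [
    [255, 255, 255, 255],
    [255,   0,   0, 255],
    [255,   0,   0, 255],
    [255,   0,   0, 255],
]
features = extract_features(sample)
```

A template font library in this sketch would simply be a mapping from each standard character to its `CharFeatures`.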



Abstract

The invention relates to an OCR-based character segmentation method comprising the following steps: step 1, acquiring a template character library based on OCR technology, the template character library comprising standard characters and feature data of the standard characters; step 2, recognizing some of the characters from the same batch as the characters to be recognized using a character recognition model in the OCR technology to obtain a character segmentation result, manually marking the errors in the segmentation result, and updating the character recognition model; step 3, performing line scanning on the characters to be recognized, performing initial recognition based on the updated character recognition model, and, when the score of the recognition result for a character is below a first threshold, performing forced segmentation on that character; step 4, performing normalization; and step 5, matching the character against the standard characters according to the normalized feature data, calculating the standard character with the highest score, and determining the segmentation position of the current character based on that standard character. The method can improve the accuracy of character segmentation.
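Steps 3 to 5 of the abstract can be sketched roughly as follows. The abstract does not specify the scoring function, the features used for matching, or how candidate cut positions are generated, so everything here is an assumption: the region is represented by a vertical projection profile (ink pixels per column), only width and area are matched, the score is `1 / (1 + mean relative error)`, and the names `normalize`, `score`, `forced_cut`, and `segment` are hypothetical.

```python
def normalize(width, area, height, ref_height):
    """Step 4 (assumed form): rescale features to a common reference height."""
    s = ref_height / height
    return width * s, area * s * s


def score(width, area, template):
    """Assumed similarity in (0, 1]; equals 1.0 when both features match exactly."""
    err = (abs(width - template["width"]) / template["width"]
           + abs(area - template["area"]) / template["area"]) / 2
    return 1.0 / (1.0 + err)


def forced_cut(profile, height, templates, ref_height):
    """Step 5 (assumed form): try every cut column and every template.

    `profile` holds the ink count per column of a region that may contain
    more than one character. Returns (cut_column, template_name, score)
    for the best-scoring pair; the cut column is the segmentation position.
    """
    best = None
    for cut in range(1, len(profile) + 1):
        w, a = normalize(cut, sum(profile[:cut]), height, ref_height)
        for name, template in templates.items():
            s = score(w, a, template)
            if best is None or s > best[2]:
                best = (cut, name, s)
    return best


def segment(profile, height, initial_score, first_threshold, templates, ref_height):
    """Step 3 (assumed form): force a cut only when the initial score is too low."""
    if initial_score >= first_threshold:
        return None  # the initial segmentation is kept
    return forced_cut(profile, height, templates, ref_height)


# Hypothetical templates measured at a reference height of 10 pixels.
templates = {
    "1": {"width": 3, "area": 20},
    "8": {"width": 6, "area": 40},
}

# Projection profile of a region where "1" and "8" were merged into one blob.
merged = [7, 7, 6, 0, 7, 6, 7, 7, 6, 7]

result = segment(merged, height=10, initial_score=0.3,
                 first_threshold=0.5, templates=templates, ref_height=10)
```

In this toy input the first three columns match the "1" template exactly, so the cut lands at column 3. A real system would match more of the Step 1 features and would repeat the cut on the remainder of the region.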

Description

Technical field

[0001] The invention relates to the field of optical character recognition, in particular to an OCR-based character segmentation method.

Background

[0002] In fields related to optical character recognition, such as printed text and laser marking, OCR (optical character recognition) plays an important role. At present, almost every product carries information such as a production batch number, and OCR technology is usually required to ensure product traceability. However, when text is actually printed, differences in the printing environment (printing in motion, inconsistencies between printing devices) cause text deformation, spacing changes, and size changes. After a model has been trained on the standard characters of a traditional OCR character library, it is easy for the model to merge two characters into one or cut one ch...


Application Information

IPC(8): G06K9/32, G06K9/34, G06K9/62
CPC: G06F18/22, G06F18/214
Inventor: 秦应化, 李安, 吴昆
Owner: SUZHOU DINNAR TECH FOR AUTOMATION CO LTD