Multilingual end-to-end OCR algorithm and system

A multilingual OCR algorithm technology in the field of OCR algorithms. It addresses the loss of context information and the excessive parameter count of existing approaches, and achieves good training results with a simple training process and fewer parameters.

Pending Publication Date: 2020-12-18
广州探迹科技有限公司
Cites: 0 | Cited by: 2

AI Technical Summary

Problems solved by technology

[0003] 1. The input text image is usually a feature sequence whose elements are linked by spatial (planar) information. The existing overall recognition network must reshape this feature map before its recurrent layer, so the context information the map has built up, especially spatial position information, is inevitably lost. Encoding with a 2D recurrent layer avoids this loss, but it brings too many parameters and a complex structure.
[0004] 2. In the existing overall recognition network, LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) layers generally impose requirements on the width of the input slices. Real text often mixes Chinese characters, English letters, and digits, and adjacent characters may stick together, so character widths vary. A width-adaptive slicing structure has difficulty staying robust in these scenarios.
[0005] 3. In OCR pipelines that split the task into localization and recognition/classification, selecting the optimal anchor box sizes is usually difficult and requires a lengthy iteration and labeling process.
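
To make problems 1 and 2 concrete, here is a minimal NumPy sketch of the CRNN-style "map-to-sequence" reshape that an overall recognition network performs before its recurrent layer. The function name crnn_map_to_sequence and all sizes (C=64, H=4, W=25, a 100-pixel input) are illustrative assumptions and do not come from the patent. The sketch shows two things: the H row positions are folded into each per-step feature vector, and every time step sits at a fixed horizontal stride.

```python
import numpy as np

def crnn_map_to_sequence(feature_map: np.ndarray) -> np.ndarray:
    """Collapse a (C, H, W) CNN feature map into a length-W sequence.

    Each column becomes one time step for a 1D recurrent layer (LSTM/GRU).
    The H row positions are flattened into the per-step feature vector, so
    the recurrent layer only ever walks along the width axis.
    """
    c, h, w = feature_map.shape
    # (C, H, W) -> (W, C, H) -> (W, C*H)
    return feature_map.transpose(2, 0, 1).reshape(w, c * h)

rng = np.random.default_rng(0)
fmap = rng.standard_normal((64, 4, 25))  # hypothetical C=64, H=4, W=25
seq = crnn_map_to_sequence(fmap)
print(seq.shape)  # (25, 256)

# With a hypothetical 100-pixel-wide input, consecutive time steps are a fixed
# 4 pixels apart (the horizontal stride), whether the character there is a
# wide Chinese character or a narrow digit.
image_width = 100
print(image_width // seq.shape[0])  # 4
```

Once the map is reshaped, row information survives only as positions inside a flat vector. No sequence order encodes it any more, which is the loss described in [0003].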

Examples

Embodiment 1

[0059] Referring to Figure 1, a multilingual end-to-end OCR algorithm based on a self-distillation transformer and label refinement includes:

[0060] Obtaining a feature map of the image to be recognized;

[0061] Training the feature map through a relationship attention module, which is based on the self-distillation transformer module, to obtain a character matrix;

[0062] Performing parallel attention decoding on the character matrix to obtain a prediction result;

[0063] According to the prediction result, and based on a preset vocabulary and sentence table, obtaining an OCR model that matches the language of that table.
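
Steps [0062] and [0063] can be illustrated with a minimal NumPy sketch of a generic parallel attention decoder. The patent excerpt does not specify its decoder, so the design here is an assumption. T position queries (random stand-ins for learned parameters) each attend over the character matrix, and every output slot is classified independently in a single pass, so no step waits on the previous character. The vocabulary list, the end-of-sequence index 0, and all function names are hypothetical.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def parallel_attention_decode(char_matrix, pos_queries, w_cls):
    """Predict all T output slots at once.

    char_matrix: (L, d) output of the relationship attention module
    pos_queries: (T, d) one query per output slot (learned in a real model)
    w_cls:       (d, V) linear classifier over a V-entry vocabulary
    returns:     (T,) predicted vocabulary indices
    """
    d = char_matrix.shape[1]
    attn = softmax(pos_queries @ char_matrix.T / np.sqrt(d))  # (T, L)
    glimpses = attn @ char_matrix                              # (T, d)
    logits = glimpses @ w_cls                                  # (T, V)
    return logits.argmax(axis=-1)  # slots are independent: no recurrence

def indices_to_text(indices, vocab, eos=0):
    """Map indices to characters, stopping at the first end-of-sequence index."""
    out = []
    for i in indices:
        if i == eos:
            break
        out.append(vocab[i])
    return "".join(out)

# Hypothetical mixed-language vocabulary; index 0 is the end-of-sequence token.
vocab = ["<eos>", "O", "C", "R", "中", "文", "1", "2"]
rng = np.random.default_rng(0)
L, T, d = 20, 8, 16
pred = parallel_attention_decode(
    rng.standard_normal((L, d)),
    rng.standard_normal((T, d)),
    rng.standard_normal((d, len(vocab))),
)
print(pred.shape)                               # (8,)
print(indices_to_text([1, 2, 3, 0, 4], vocab))  # OCR
```

Each slot depends only on the shared character matrix, so all T characters come out of a few matrix products. This is one plausible reading of what the abstract means by outputting results in parallel and cutting off the dependence between nodes.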

[0064] The relationship attention module of the present disclosure is a structure adapted from the BERT framework in the NLP field. As shown in Figure 2, the relationship attention module includes:

[0065] N bidirectional transformer la...
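
Because the relationship attention module follows BERT, its core is a stack of bidirectional (unmasked) transformer encoder layers. The NumPy sketch below is a generic single-head, post-norm encoder layer stacked N times over M nodes. It is not the patent's implementation. The class and function names, sizes, random initialization, and ReLU feed-forward are assumptions for illustration; BERT itself uses multi-head attention and GELU. The self-distillation part is not modeled here.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

class BidirectionalTransformerLayer:
    """Single-head BERT-style encoder layer: no causal mask, post-norm."""

    def __init__(self, d: int, d_ff: int, rng: np.random.Generator):
        s = 1.0 / np.sqrt(d)
        self.wq, self.wk, self.wv, self.wo = (rng.standard_normal((d, d)) * s for _ in range(4))
        self.w1 = rng.standard_normal((d, d_ff)) * s
        self.w2 = rng.standard_normal((d_ff, d)) / np.sqrt(d_ff)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (M, d), one row per transformer node
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        attn = softmax(q @ k.T / np.sqrt(x.shape[1]))  # (M, M): every node sees every node
        x = layer_norm(x + (attn @ v) @ self.wo)       # residual + norm
        ff = np.maximum(0.0, x @ self.w1) @ self.w2    # position-wise feed-forward (ReLU)
        return layer_norm(x + ff)

def relation_attention_stack(x, n_layers, rng):
    """Apply N bidirectional layers; the node count M is preserved throughout."""
    d = x.shape[1]
    for layer in [BidirectionalTransformerLayer(d, 4 * d, rng) for _ in range(n_layers)]:
        x = layer(x)
    return x

rng = np.random.default_rng(0)
features = rng.standard_normal((10, 32))  # hypothetical M=10 nodes, d=32
char_matrix = relation_attention_stack(features, n_layers=2, rng=rng)
print(char_matrix.shape)  # (10, 32)
```

Attention here runs over all node positions at once rather than along a single reshaped axis. That is the property [0003] says the recurrent approach lacks.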

Embodiment 2

[0094] A multilingual end-to-end OCR system based on a self-distillation transformer and label refinement, including:

[0095] A feature extraction layer, configured to obtain the feature map of the image to be recognized;

[0096] A relationship attention module, based on the self-distillation transformer module, configured to train the feature map and obtain the character matrix;

[0097] A parallel attention decoding layer, configured to perform parallel attention decoding on the character matrix to obtain a prediction result.

[0098] As an optional implementation of the above embodiment, the relationship attention module includes:

[0099] N bidirectional transformer layers, each including M transformer nodes, where N ≥ 2 and M ≥ 2;

[0100] Every transformer node of the (N-1)th transformer layer is connected to all transformer nodes of the Nth transformer layer.
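
The full connection between consecutive layers described in [0100] is what unmasked self-attention provides: each node of a layer is computed from all nodes of the layer below. A quick way to check this is to perturb one input node and see which output nodes change. The NumPy sketch below uses hypothetical function names and sizes. For contrast it also runs a causal (left-to-right) variant, which is not part of the patent.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_layer(x, w, causal=False):
    """Single-head self-attention over M nodes; w has shape (3, d, d) for Q, K, V."""
    q, k, v = x @ w[0], x @ w[1], x @ w[2]
    scores = q @ k.T / np.sqrt(x.shape[1])
    if causal:
        m = x.shape[0]
        # Node i may only look at nodes 0..i.
        scores = np.where(np.tril(np.ones((m, m), dtype=bool)), scores, -np.inf)
    return softmax(scores) @ v

def affected_nodes(x, w, j, causal=False, eps=1e-3):
    """Indices of layer-n nodes whose output changes when node j of layer n-1 is perturbed."""
    base = attention_layer(x, w, causal)
    x2 = x.copy()
    x2[j] += eps
    moved = np.abs(attention_layer(x2, w, causal) - base).max(axis=1) > 1e-12
    return np.flatnonzero(moved).tolist()

rng = np.random.default_rng(0)
M, d = 5, 8  # hypothetical sizes
x = rng.standard_normal((M, d))
w = rng.standard_normal((3, d, d)) / np.sqrt(d)
print(affected_nodes(x, w, 2))               # [0, 1, 2, 3, 4]
print(affected_nodes(x, w, 2, causal=True))  # [2, 3, 4]
```

In the bidirectional case, a change at any node of the (N-1)th layer reaches every node of the Nth layer. The causal variant shows that, without this dense connection, earlier nodes would be cut off from later context.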

[0101] As an optional solution of the f...

Abstract

The invention provides a multilingual end-to-end OCR algorithm and a multilingual end-to-end OCR system that overcome and sidestep the slicing (fragmentation) defects of the prior art. They perform well on touching characters and on mixed Chinese, English, and numeric text. The self-distillation transformer module preserves positional relationships while reducing parameters and model complexity. Results are output in parallel, which removes the dependencies between nodes, gives higher robustness in multilingual and multi-font scenarios, and optimizes both structure and performance. The algorithm comprises the following steps: obtaining a feature map of the image to be recognized; training the feature map through a relation attention module based on the self-distillation transformer module to obtain a character matrix; performing parallel attention decoding on the character matrix to obtain a prediction result; and, according to the prediction result and based on a vocabulary and sentence table, obtaining an OCR model that matches the language of that table.
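
The abstract credits the self-distillation transformer module but does not spell out its training objective. One common form of self-distillation is assumed here; it is not taken from the patent. Under that scheme, a classifier head is attached to several transformer layers and each head is trained on the labels. In addition, the shallower heads are pulled toward the deepest head's temperature-softened predictions. The following is a minimal NumPy sketch of that loss; the function names, temperature, and weight alpha are hypothetical.

```python
import numpy as np

def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def cross_entropy(logits, labels):
    """Mean negative log-likelihood of integer labels; logits: (T, V)."""
    return -log_softmax(logits)[np.arange(len(labels)), labels].mean()

def kl_to_teacher(student_logits, teacher_logits, temperature):
    """Mean KL(teacher || student) over positions, on temperature-softened outputs."""
    t_logp = log_softmax(teacher_logits / temperature)
    s_logp = log_softmax(student_logits / temperature)
    return (np.exp(t_logp) * (t_logp - s_logp)).sum(axis=-1).mean()

def self_distillation_loss(head_logits, labels, temperature=2.0, alpha=0.5):
    """head_logits: list of (T, V) arrays ordered shallow -> deep; the last is the teacher.

    In an autodiff framework the teacher term would be detached (no gradient).
    """
    teacher = head_logits[-1]
    loss = sum(cross_entropy(h, labels) for h in head_logits)
    for student in head_logits[:-1]:
        # temperature**2 keeps the soft-target gradient scale comparable across temperatures
        loss += alpha * temperature ** 2 * kl_to_teacher(student, teacher, temperature)
    return loss

rng = np.random.default_rng(0)
labels = np.array([1, 2, 3, 0])  # hypothetical 4-character target
heads = [rng.standard_normal((4, 8)) for _ in range(3)]  # 3 layer heads, 8-entry vocab
print(self_distillation_loss(heads, labels))
```

When every head already agrees with the deepest one, the KL terms vanish and the loss reduces to the sum of the per-head cross-entropies.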

Description

Technical field

[0001] The invention relates to OCR algorithms, in particular to a multilingual end-to-end OCR algorithm and system.

Background

[0002] In modern enterprise production and daily business activities, OCR (Optical Character Recognition) technology has been widely adopted at large scale. This adoption is driven by the growing need to improve information-entry efficiency and by increasingly diverse information carriers. In existing industrial applications, OCR is usually split into two parts, text detection and text recognition, for reasons of compatibility and reliability. Text recognition networks are further divided into single-character classification structures and overall (whole-line) recognition structures. The existing overall recognition network usually adopts the CRNN (Convolutional Recurrent Neural Network) structure, which has the following disadvantages: [0003] 1. Usually...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06K9/32, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/049, G06N3/08, G06V20/62, G06V30/10, G06N3/045, G06F18/2415, G06F18/241
Inventor 陈开冉黎展孙建旸
Owner 广州探迹科技有限公司