Method for transcribing different types of printed documents based on OCR technology

A printed-document technology applied in the field of image recognition. It addresses problems such as parsing failures and the inability to parse charts, with the effect of improving the accuracy, convenience, and efficiency of transcription.

Status: Inactive. Publication Date: 2019-09-20

SICHUAN XW BANK CO LTD

Problems solved by technology

[0007] (3) When the content is in picture format, charts cannot be parsed.

[0008] (4) Parsing fails when the toolkit is incompatible with the format of the document being transcribed.

Embodiment Construction

[0025] As shown in Figure 1, the method of the present invention for transcribing different types of printed documents based on OCR technology comprises:

[0026] A. Use a common image conversion tool to convert the different types of printed documents into images of the same format, such as png or jpg. The images converted from each page of a multi-page printed document are named according to a unified format, such as "original document name" + "current page number". This makes it simple and intuitive to identify which document each page image belongs to, and avoids confusion when managing multi-page printed documents.

[0027] B. Perform line projection on the image, then segment and preprocess the text lines in the image using OCR (optical character recognition) technology. Common OCR technologies currently include text classification based on supervised learning, CNN (convolutional neural network), and CRNN+CT...
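The line projection in step B can be illustrated with a minimal sketch. It assumes the page image has already been binarized into rows of 0 (background) and 1 (ink); the function names `row_projection` and `segment_text_lines` and the `min_ink` threshold are hypothetical and do not come from the patent. The idea is to count ink pixels per row and treat each run of non-empty rows as one text line.

```python
from typing import List, Tuple


def row_projection(binary: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in every row of a binarized page image."""
    return [sum(row) for row in binary]


def segment_text_lines(binary: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    A text line is a maximal run of consecutive rows whose ink count
    is at least min_ink (an assumed threshold, not from the patent).
    """
    lines: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(row_projection(binary)):
        if ink >= min_ink and start is None:
            start = y                      # first inked row of a new line
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # blank row closes the current line
            start = None
    if start is not None:                  # line that runs to the bottom edge
        lines.append((start, len(binary)))
    return lines


# Toy 7x6 "page": two text lines separated by blank rows.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(segment_text_lines(page))  # [(1, 3), (5, 6)]
```

Each returned row range can then be cropped from the page image and passed to the character recognition step. On real scans a threshold above 1 may help to ignore speckle noise.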

Abstract

The invention relates to a method for transcribing different types of printed documents based on OCR technology. The method comprises the following steps: A, converting different types of printed documents into pictures of the same format with a picture conversion tool; B, performing line projection on each picture, and segmenting and preprocessing the text lines in the picture using OCR technology; C, performing character recognition on all text lines segmented in step B using OCR technology; and D, combining the recognized characters into a complete document according to the segmentation order of the text lines. With this method, text can be transcribed from various types of printed documents in a unified way, without multiple transcription toolkits. Transcription efficiency and convenience are greatly improved, and transcription accuracy is also noticeably improved.
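How the steps fit together can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `page_image_name` and `transcribe_document` are hypothetical names, the underscore separator in the file name is an assumption (the source only specifies document name plus page number), and the recognizer is passed in as a callable so that any OCR engine could stand behind it. A dictionary lookup stands in for real character recognition here.

```python
from typing import Callable, List, Sequence, TypeVar

LineImage = TypeVar("LineImage")


def page_image_name(document_name: str, page_number: int, extension: str = "png") -> str:
    """Step A naming: original document name + current page number.

    The underscore separator is an assumption made for this sketch.
    """
    return f"{document_name}_{page_number}.{extension}"


def transcribe_document(
    pages: Sequence[Sequence[LineImage]],
    recognize: Callable[[LineImage], str],
) -> str:
    """Steps C and D: recognize each segmented line, then join the results.

    `pages` holds, for every page, its text-line images in segmentation
    order (the output of step B). Order is preserved end to end.
    """
    page_texts: List[str] = []
    for line_images in pages:
        recognized = [recognize(line) for line in line_images]  # step C
        page_texts.append("\n".join(recognized))
    return "\n".join(page_texts)                                 # step D


# Stand-in for an OCR engine: maps placeholder "line images" to text.
lookup = {
    "line-1": "Account statement",
    "line-2": "Balance: 100",
    "line-3": "End of document",
}
pages = [["line-1", "line-2"], ["line-3"]]

print(page_image_name("contract", 3))
text = transcribe_document(pages, lookup.__getitem__)
print(text)
```

Passing the recognizer as a parameter keeps the ordering logic of step D independent of whichever recognition model is used in step C.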

Description

Technical field

[0001] The invention relates to a method for image recognition, in particular to a method for transcribing different types of printed documents based on OCR technology.

Background

[0002] In practical applications, it is often necessary to transcribe the text in pictures, PDF files, Word, WPS, XML files, and so on into character strings and save them. Manual entry takes a lot of time and effort, and the error rate rises as fatigue sets in. Automatic transcription systems were developed for this reason.

[0003] Currently, text from different data sources is transcribed with source-specific toolkits, for example by parsing with third-party tools. The toolkits include tabula, pdfminer, pdf2htmlEX, python-docx, and xlrd. A common feature of these toolkits is a one-to-one correspondence between data source file types and toolkits: a toolkit can only handle one type of document. If there are othe...

Application Information

IPC(8): G06K9/00, G06K9/34

CPC: G06V30/40, G06V30/153, G06V30/10

Inventors: 吴信朝, 李开宇, 翟恩荣

Owner: SICHUAN XW BANK CO LTD
