Method for transcribing different types of printed documents based on OCR technology

A printed-document technology applied in the field of image recognition. It addresses problems such as parsing failures and the inability to parse charts, and it improves the accuracy, convenience, and efficiency of transcription.

Inactive Publication Date: 2019-09-20
SICHUAN XW BANK CO LTD

AI Technical Summary

Problems solved by technology

[0007] (3) When a chart is embedded as a picture, it cannot be parsed.
[0008] (4) Parsing fails when the toolkit is incompatible with the format of the document being transcribed.

Examples


Embodiment Construction

[0025] As shown in Figure 1, the method of the present invention for transcribing different types of printed documents based on OCR technology includes:

[0026] A. Use a common image conversion tool to convert the different types of printed documents into images of the same format, such as png or jpg. The images converted from each page of a multi-page printed document are named according to a unified format, such as "original document name" + "current page number". This makes it simple and intuitive to identify which document each page image belongs to, and it avoids confusion when managing multi-page printed documents.
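The unified naming rule in step A can be illustrated with a minimal sketch. The function names, the "_p" separator, and the four-digit zero padding are assumptions made for this example; the source only specifies "original document name" + "current page number".

```python
from pathlib import Path


def page_image_name(document_path: str, page_number: int, ext: str = "png") -> str:
    """Build an image file name from the source document name and a 1-based page number.

    The "_p" separator and zero padding are illustrative choices, not taken from the patent.
    """
    if page_number < 1:
        raise ValueError("page_number must be 1 or greater")
    stem = Path(document_path).stem  # document name without directory or extension
    return f"{stem}_p{page_number:04d}.{ext}"


def document_of(image_name: str) -> str:
    """Recover the original document name from a page image name built above."""
    stem = Path(image_name).stem
    # Split on the last "_p" so document names that contain "_p" still round-trip.
    name, _, _ = stem.rpartition("_p")
    return name
```

Zero padding keeps the page images in page order under a plain alphabetical sort, which helps when the recognized pages are later reassembled.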

[0027] B. Perform line projection on the image, and segment and preprocess the text lines in the image using OCR (Optical Character Recognition) technology. Common OCR technologies at present mainly include: text classification based on supervised learning, CNN (convolutional neural network) and CRNN+CT...
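Line projection in step B can be sketched as a horizontal projection profile: count the ink pixels in each row, then treat each run of non-empty rows as one text line. The sketch below assumes the page has already been binarized into rows of 0 (background) and 1 (ink); the function names and the min_ink threshold are assumptions for illustration, not part of the patent.

```python
from typing import List, Sequence, Tuple


def row_projection(binary: Sequence[Sequence[int]]) -> List[int]:
    """Return the number of ink pixels (value 1) in each row of a binarized image."""
    return [sum(row) for row in binary]


def segment_text_lines(binary: Sequence[Sequence[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Split an image into text lines as (start_row, end_row) pairs, end exclusive.

    A row belongs to a text line when it has at least min_ink ink pixels.
    """
    profile = row_projection(binary)
    lines: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y  # first inked row of a new text line
        elif ink < min_ink and start is not None:
            lines.append((start, y))  # blank row closes the current line
            start = None
    if start is not None:
        lines.append((start, len(profile)))  # line runs to the bottom edge
    return lines
```

The row ranges come out in top-to-bottom order, which is the segmentation order that a later step can rely on when combining recognized lines.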


Abstract

The invention relates to a method for transcribing different types of printed documents based on OCR technology. The method comprises the following steps: A, converting the different types of printed documents into pictures of the same format with a picture conversion tool; B, performing line projection on the picture, and segmenting and preprocessing the text lines in the picture using OCR technology; C, performing character recognition with OCR technology on all the text lines segmented in step B; and D, combining the recognized characters into a complete document according to the segmentation order of the text lines. With this method, text can be transcribed from various types of printed documents in a unified way without multiple transcription toolkits, which greatly improves transcription efficiency and convenience and also noticeably improves transcription accuracy.
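Steps C and D can be sketched as applying a recognizer to each segmented line and joining the results in segmentation order. This is a minimal sketch under stated assumptions: recognize_line is a hypothetical callable standing in for whatever OCR model is used, and the newline and blank-line separators are illustrative choices rather than anything the patent specifies.

```python
from typing import Callable, List, Sequence, TypeVar

Line = TypeVar("Line")


def assemble_document(
    pages: Sequence[Sequence[Line]],
    recognize_line: Callable[[Line], str],
) -> str:
    """Recognize every segmented line and combine the text in segmentation order.

    pages holds, for each page in page order, its line images in top-to-bottom order.
    recognize_line is a placeholder for the OCR model (step C).
    """
    page_texts: List[str] = []
    for lines in pages:
        texts = [recognize_line(line) for line in lines]  # step C: per-line recognition
        page_texts.append("\n".join(texts))               # step D: keep line order
    return "\n\n".join(page_texts)                        # step D: keep page order
```

Because the recognizer is passed in as a parameter, the same assembly code applies whatever the original document type was, which matches the unified approach the abstract describes.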

Description

Technical field

[0001] The invention relates to an image recognition method, in particular to a method for transcribing different types of printed documents based on OCR technology.

Background

[0002] In practical applications, it is often necessary to transcribe the text in pictures, pdf files, word\wps\xml files, etc. into character strings and save them. Manual entry takes a lot of time and effort, and the error rate rises as fatigue sets in. Automatic transcription systems were developed for this reason.

[0003] Currently, texts from different data sources are transcribed with specific toolkits, for example by parsing them with third-party tools. These toolkits include tabula, pdfminer, pdf2htmlEX, python-docx, and xlrd. A common feature of these toolkits is a one-to-one correspondence between data source file type and toolkit: a toolkit can only handle one type of document. If there are othe...


Application Information

IPC(8): G06K9/00, G06K9/34
CPC: G06V30/40, G06V30/153, G06V30/10
Inventors: 吴信朝, 李开宇, 翟恩荣
Owner: SICHUAN XW BANK CO LTD