Method for extracting and identifying graphic and text information of scanned document

An information extraction technology for documents, applied in the field of image and text recognition. It addresses the lack of effective methods for extracting and identifying key information in documents, and achieves fast, efficient detection and identification while reducing redundant information and interference.

Pending Publication Date: 2020-07-10
STATE GRID CORP OF CHINA +1
Cites 6 documents; cited by 21.

AI Technical Summary

Problems solved by technology

[0005] Great progress has been made in recognizing simple printed documents and in detecting and recognizing text in complex scenes, but there is still no effective method for extracting and identifying the key information in documents.



Embodiment Construction

[0038] In order to better explain the present invention and facilitate understanding, the present invention will be described in detail below through specific embodiments in conjunction with the accompanying drawings.

[0039] The present invention provides a method for extracting and identifying graphic and text information of scanned documents, the method comprising the following steps:

[0040] Step 1), denoise and smooth the scanned document image, then perform layout segmentation on the preprocessed image to extract elements including, but not limited to, signatures and seals;
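Step 1) does not say which denoising or smoothing filter is used. As one possible choice, assumed here rather than taken from the patent, a median filter suppresses isolated specks introduced by scanning while keeping stroke edges sharp. A minimal pure-Python sketch on a grayscale image stored as a list of rows (the name `median_filter` is hypothetical):

```python
from statistics import median


def median_filter(image: list[list[int]], k: int = 3) -> list[list[int]]:
    """Smooth a grayscale image with a k x k median filter.

    Assumption: the patent does not name its filter; a median filter is used
    here only to illustrate the denoising/smoothing step.
    """
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    h, w = len(image), len(image[0])
    r = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Collect the k x k neighbourhood, replicating pixels at the border
            window = [
                image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            # With an odd window size the median is one of the pixel values
            row.append(int(median(window)))
        out.append(row)
    return out
```

For example, `median_filter([[10, 10, 10], [10, 255, 10], [10, 10, 10]])` replaces the isolated bright pixel with 10. A real pipeline would more likely call an optimized library routine than loop in Python.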

[0041] Step 2), preprocess the signature extracted in step 1), remove its background, input it into the Writer-Dependent network to extract its feature values, and then input these feature values into an SVM classifier trained on genuine signatures to determine whether the signature is genuine or forged;
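The step says only that the SVM is trained with genuine signatures. The sketch below assumes a common writer-dependent setup: one linear SVM per writer, with that writer's genuine signatures as positive examples and other writers' genuine signatures as negatives, trained by Pegasos-style stochastic sub-gradient descent. The input vectors stand in for the Writer-Dependent network's features, which are not modeled here; `train_writer_svm` and `is_genuine` are hypothetical names.

```python
import random


def train_writer_svm(features, labels, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM with Pegasos-style stochastic sub-gradient descent.

    features: feature vectors (assumed to come from the Writer-Dependent network)
    labels:   +1 for the writer's genuine signatures, -1 for negative examples
              (assumption: other writers' genuine signatures)
    A constant 1.0 is appended to each vector so the last weight acts as a bias.
    """
    X = [list(f) + [1.0] for f in features]
    w = [0.0] * len(X[0])
    rng = random.Random(seed)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing Pegasos step size
            margin = labels[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Gradient step on the L2 regulariser shrinks every weight
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active for this sample: move w towards y_i * x_i
                w = [wj + eta * labels[i] * xj for wj, xj in zip(w, X[i])]
    return w


def is_genuine(w, feature):
    """Decision rule: a positive score means the signature is accepted as genuine."""
    x = list(feature) + [1.0]
    return sum(wj * xj for wj, xj in zip(w, x)) > 0.0
```

With toy two-dimensional features, `train_writer_svm([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [-1.0, -1.0], [-0.8, -1.2]], [1, 1, 1, -1, -1])` yields weights that accept `[1.1, 0.9]` and reject `[-1.0, -0.9]`. Training one classifier per writer is what makes the approach writer-dependent: a new writer needs new enrolment samples and a new model.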

[0042] Step 3), after preprocessing, transform the seal extracted in step 1) into polar coordinates, so tha...
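The polar transformation in Step 3) can be illustrated by sampling the seal along circles around its centre and laying each circle out as one image row. This sketch assumes the seal centre `(cx, cy)` and the radii of the text band are already known (for example, from a circle fit during preprocessing, which the patent does not detail), uses nearest-neighbour sampling, and by default starts at the left of the seal and reads clockwise over the top, as text runs on many round seals. `unwrap_seal` and its parameters are hypothetical.

```python
import math


def unwrap_seal(image, cx, cy, r_inner, r_outer, n_angles=360, start_deg=180.0):
    """Unroll the annular text band of a round seal into a rectangular strip.

    Row i of the output samples radius r_outer - i (outer edge on top), and
    column j samples angle start_deg + j * 360 / n_angles. Because image y
    grows downward, increasing the angle moves clockwise on screen.
    Pixels that fall outside the image are filled with white (255).
    """
    h, w = len(image), len(image[0])
    strip = []
    for r in range(r_outer, r_inner - 1, -1):
        row = []
        for j in range(n_angles):
            theta = math.radians(start_deg + j * 360.0 / n_angles)
            # Nearest-neighbour lookup of the point on the circle of radius r
            x = round(cx + r * math.cos(theta))
            y = round(cy + r * math.sin(theta))
            row.append(image[y][x] if 0 <= x < w and 0 <= y < h else 255)
        strip.append(row)
    return strip
```

Putting the outer radius in the first row keeps characters along the top arc upright in the strip, so a horizontal text detector and recognizer can then be run on it directly.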


Abstract

The invention relates to a method for extracting and identifying graphic and text information of a scanned document, comprising the following steps: 1) preprocessing a scanned document image, performing layout segmentation on the preprocessed image, and extracting elements including, but not limited to, signatures and seals; 2) preprocessing the signature extracted in step 1), removing its background with the Otsu (OTSU) algorithm, inputting the signature into a Writer-Dependent network to extract its feature values, and inputting these feature values into an SVM classifier trained on genuine signatures to determine the authenticity of the signature; 3) after preprocessing, applying a polar coordinate transformation to the seal extracted in step 1), so that the annular characters in the seal are unrolled into horizontally arranged characters, inputting the unrolled characters into a CTPN + CRNN network for text detection and recognition in sequence, and outputting the character content of the seal; 4) judging the validity of the document. The method can replace manual work in analyzing and judging documents.
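Otsu's method, which the abstract names for signature background removal, chooses the gray-level threshold that maximizes the between-class variance of the two pixel classes it creates. A minimal sketch, assuming 8-bit grayscale pixels (integers 0 to 255) and dark ink on a lighter background; `otsu_threshold` and `remove_background` are hypothetical names:

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the Otsu threshold t for 8-bit gray levels (ink class: level <= t)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with level <= t
    sum0 = 0  # sum of the levels <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this is not a real split
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def remove_background(image: list[list[int]]) -> list[list[int]]:
    """Keep ink pixels (level <= Otsu threshold) and set background pixels to white."""
    t = otsu_threshold([p for row in image for p in row])
    return [[p if p <= t else 255 for p in row] for row in image]
```

If an image contains fewer than two distinct gray levels, no split exists and the sketch returns threshold 0, so only pure-black pixels would be kept as ink.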

Description

Technical field

[0001] The invention relates to the technical field of image-text recognition, and more specifically to a method for extracting and identifying image-text information of scanned documents.

Background art

[0002] OCR (optical character recognition) uses optical equipment to capture images and recognize text, extending the ability of the human eye to machines. It converts the graphic information in a scanned document into editable text, which can replace manual input and improve business efficiency. In practice, document processors often rely on key content such as a document's date, signature and seal to identify its category and validity. At present, document information is extracted and identified entirely by hand, which is inefficient and prone to misjudgments and omissions. Therefore, it is of great significance in practical work to use OCR and image processing related tec...


Application Information

Patent type and authority: Application (China)
IPC (8): G06K9/34, G06K9/32, G06K9/40, G06K9/44
CPC: G06V30/153, G06V10/30, G06V10/34, G06V10/25, G06V10/267
Inventors: 姚渭箐, 李新德, 戴俊峰, 张成, 黄杰, 郭峰, 张泉, 肖进胜, 熊闻心, 杨天
Owner STATE GRID CORP OF CHINA