Simple and effective incomplete table identification and cross-page splicing method

The method belongs to the field of incomplete table recognition and cross-page splicing. It addresses the incomplete splicing of cross-page tables and the low recognition accuracy of incomplete tables, reduces inaccurate identification and segmentation, and improves extraction accuracy. The method is simple and effective.

Pending Publication Date: 2020-09-04
XIAN TECHNOLOGICAL UNIV

AI Technical Summary

Problems solved by technology

[0004] The present invention provides a simple and effective method for identifying incomplete forms and splicing across pages to s


Examples

Embodiment Construction

[0028] The region of interest is determined by Harris corner detection, and the text region and the table region are then segmented separately according to their characteristics: the table region is divided into cells, and the text region is divided into lines. The segmented image data is passed to a convolutional recurrent neural network (CRNN) for OCR text recognition, followed by subsequent operations such as database storage.
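The Harris step in [0028] can be illustrated with a minimal pure-Python sketch. It computes the standard Harris response R = det(M) - k * trace(M)^2 and then takes the bounding box of the strongest responses as a pre-framed region of interest. The central-difference gradients, the unweighted 3x3 window, the value k = 0.04, the relative threshold, and the function names harris_response and corner_bounding_box are all assumptions made for illustration; the patent text does not specify them.

```python
def harris_response(gray, k=0.04):
    """Return the Harris response R = det(M) - k * trace(M)^2 per pixel.

    gray is a list of rows of intensities. Gradients use central
    differences and M is summed over an unweighted 3x3 window; both are
    simplifications assumed for this sketch.
    """
    h, w = len(gray), len(gray[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (gray[y][x + 1] - gray[y][x - 1]) / 2.0
            iy[y][x] = (gray[y + 1][x] - gray[y - 1][x]) / 2.0

    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = syy = sxy = 0.0
            # Accumulate the structure tensor over the 3x3 neighbourhood.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            resp[y][x] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    return resp


def corner_bounding_box(resp, rel_thresh=0.5):
    """Hypothetical ROI pre-framing: box (x0, y0, x1, y1) around strong corners.

    Returns None when no pixel has a positive response.
    """
    peak = max(max(row) for row in resp)
    if peak <= 0:
        return None
    pts = [(x, y) for y, row in enumerate(resp)
           for x, v in enumerate(row) if v >= rel_thresh * peak]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))
```

On a synthetic image containing a single bright rectangle, the response is positive at the four corners and negative along the straight edges, so the bounding box of the thresholded responses frames the rectangle. A production pipeline would more likely call an image library's Harris routine than loop in Python.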

[0029] Referring to Figure 1 and Figure 2, the present invention provides a simple and effective method for identifying incomplete tables and splicing them across pages, comprising the following steps:

[0030] Step 1, image preprocessing stage:

[0031] Step 101, multi-resolution image compression and grayscale conversion, including the following steps:

[0032] (1) Input the color image

[0033] (2) Convert it to a grayscale image
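Step 101 can be sketched as follows. The source does not say which grayscale weights or which compression scheme are used, so this sketch assumes the common ITU-R BT.601 luma weights and a simple block-averaging downsample as a stand-in for multi-resolution compression. The function names to_grayscale and downsample are hypothetical.

```python
def to_grayscale(rgb):
    """Convert rows of (R, G, B) tuples to rows of 0-255 luminance values.

    Assumes BT.601 luma weights; the patent text does not name the weights.
    """
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def downsample(gray, factor):
    """Shrink an image by averaging non-overlapping factor x factor blocks.

    An assumed stand-in for the multi-resolution compression step.
    Trailing rows or columns that do not fill a block are dropped.
    """
    h = len(gray) // factor
    w = len(gray[0]) // factor
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            block = [gray[y * factor + dy][x * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out
```

Calling downsample repeatedly with factor 2 would give a small image pyramid, which is one plausible reading of "multi-resolution".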

[0034] Step 102, Otsu binarization: the Otsu algorithm is used to binarize the image...
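As an illustration of Step 102, the sketch below implements Otsu's method in plain Python: it scans all 256 candidate thresholds and keeps the one that maximizes the between-class variance of the histogram. Treating dark pixels as foreground (ink on light paper) in binarize is an assumption, as are the function names.

```python
def otsu_threshold(gray):
    """Return the threshold t in 0..255 maximizing between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))

    w_low = 0        # pixel count in the low class
    sum_low = 0      # intensity sum in the low class
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue
        w_high = total - w_low
        if w_high == 0:
            break
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray, threshold):
    """Map dark pixels (<= threshold) to 1 and light pixels to 0.

    Assumes dark ink on a light background.
    """
    return [[1 if v <= threshold else 0 for v in row] for row in gray]
```

For an image with two clearly separated intensity groups, the returned threshold is the first value that separates them, and binarize then yields a 0/1 mask suitable for the later line-detection steps.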

Abstract

The invention discloses a simple and effective method for incomplete table identification and cross-page splicing. An incomplete table is corrected through straight-line extraction and detection, its accurate position is obtained with the proposed incomplete-table positioning and identification algorithm, and, when the tables on the upper and lower pages are incomplete, cross-page splicing, cell segmentation and related operations are performed. First, morphological projection is used for straight-line detection and table correction; Harris corner detection is used to pre-frame a region of interest, and the pre-framed region is checked for straight lines to obtain an accurate table region. Second, the completeness of the head table and the tail table in the determined table region is detected, and cross-page splicing is carried out according to a splicing rule. Finally, the resulting table is segmented into cells, and a CRNN is used for OCR recognition, after which the table is reproduced digitally and stored in a database. The method can effectively identify special incomplete tables and splice them across pages, and it is simple to use.
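The projection-based line detection and the completeness check described in the abstract can be sketched on a 0/1 binary image as below. The abstract does not state the actual splicing rule, so the rule here is purely hypothetical: the tail table of one page and the head table of the next are joined only when the tail has no closing bottom line, the head has no opening top line, and their vertical ruling lines sit at matching columns. All function names, the min_fill ratio, the margin, and the tolerance are assumptions for illustration.

```python
def horizontal_lines(binary, min_fill=0.8):
    """Rows whose share of foreground pixels is at least min_fill."""
    width = len(binary[0])
    return [y for y, row in enumerate(binary) if sum(row) >= min_fill * width]


def vertical_lines(binary, min_fill=0.8):
    """Columns whose share of foreground pixels is at least min_fill."""
    height = len(binary)
    return [x for x in range(len(binary[0]))
            if sum(row[x] for row in binary) >= min_fill * height]


def border_state(binary, margin=1):
    """Return (has_top, has_bottom): is there a ruling line near each edge?"""
    rows = horizontal_lines(binary)
    last = len(binary) - 1
    has_top = any(y <= margin for y in rows)
    has_bottom = any(y >= last - margin for y in rows)
    return has_top, has_bottom


def should_splice(tail_table, head_table, tolerance=1):
    """Hypothetical rule: splice when both facing edges are open and
    the vertical ruling lines line up within `tolerance` pixels."""
    _, tail_closed = border_state(tail_table)
    head_closed, _ = border_state(head_table)
    if tail_closed or head_closed:
        return False
    a, b = vertical_lines(tail_table), vertical_lines(head_table)
    return (bool(a) and len(a) == len(b)
            and all(abs(p - q) <= tolerance for p, q in zip(a, b)))


def splice(tail_table, head_table):
    """Stack the two table crops vertically (crops must share a width)."""
    return tail_table + head_table
```

In this sketch a table crop that ends at a page break shows full-height vertical lines but no horizontal line at the cut edge, which is what border_state reports. Real scans would need skew correction first and a tolerance for thick or broken lines.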

Description

Technical field

[0001] The invention relates to the field of table recognition and deep learning, in particular to a simple and effective method for identifying incomplete tables and splicing them across pages.

Background

[0002] With the continuous development of image processing and optical character recognition (OCR) technology, automatic processing of table information has become key for many organizations building information systems.

[0003] Table documents are an important carrier of business data, and studying automatic extraction from table document images is of great significance for automatic data collection. At present, most paper tables are converted to electronic tables by manual entry, which is laborious, tedious, and error-prone. At the same time, special incomplete tables are recognized poorly, and cross-page tables cannot be spliced completely. Cont...


Application Information

IPC(8): G06K9/00, G06K9/34, G06K9/46, G06K9/32, G06N3/04
CPC: G06V30/412, G06V10/25, G06V10/267, G06V10/44, G06V30/10, G06N3/045
Inventor: 吕志刚, 李亮亮, 王鹏, 高武奇, 岳鑫, 李晓艳, 郭翔宇, 李超
Owner: XIAN TECHNOLOGICAL UNIV