Method and device for identifying table in PDF document

A table and document technology, applied in the field of extracting tables from PDF documents, that addresses the problem of low table-recognition accuracy.

Pending Publication Date: 2022-02-18
深圳价值在线信息科技股份有限公司

AI Technical Summary

Problems solved by technology

For example, some PDF prospectuses contain a large number of tables, and the existing method for identifying them has low recognition accuracy.




Detailed Description of Embodiments

[0026] In the following description, specific details such as particular system structures and technologies are set forth for the purpose of illustration rather than limitation, in order to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to one skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

[0027] It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.

[0028] It should also be u...



Abstract

The invention relates to the field of data processing and provides a method for identifying a table in a PDF document. The method comprises: acquiring an original target page of the PDF document; and, when it is recognized that no conventional table exists in the original target page, or when the recognition result for a conventional table in the original target page does not meet a preset condition, recognizing, by means of a special table recognition module, whether a special table exists in the original target page. A conventional table is a table whose ruling lines cross one another; a special table is a table without crossing lines, and the special table recognition module is a module for recognizing such tables. The method can improve the accuracy with which tables in PDF documents are identified.
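The fallback logic described in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration only, because the patent text here does not say how either recognizer works or what the preset condition is. The hypothetical `recognize_conventional` treats a page as holding a conventional table when horizontal and vertical ruling segments cross. The hypothetical `recognize_special` groups words into rows by y-coordinate and accepts the page as a borderless table when every row has the same number of words. `meets_preset_condition` is an invented stand-in that simply requires at least two rows and two columns.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Axis-aligned line segment (x0, y0, x1, y1) drawn on the page (assumed format).
Segment = Tuple[float, float, float, float]
# A word placed at (x, y) on the page (assumed format).
Word = Tuple[float, float, str]


@dataclass
class Page:
    segments: List[Segment]  # ruling lines (hypothetical input)
    words: List[Word]        # positioned words (hypothetical input)


@dataclass
class TableResult:
    kind: str    # "conventional" (crossing lines) or "special" (no crossing lines)
    n_rows: int
    n_cols: int


def count_crossings(segments: List[Segment]) -> int:
    """Count points where a horizontal segment meets a vertical one."""
    horiz = [s for s in segments if s[1] == s[3]]
    vert = [s for s in segments if s[0] == s[2]]
    crossings = 0
    for hx0, hy, hx1, _hy1 in horiz:
        for vx, vy0, _vx1, vy1 in vert:
            if min(hx0, hx1) <= vx <= max(hx0, hx1) and min(vy0, vy1) <= hy <= max(vy0, vy1):
                crossings += 1
    return crossings


def recognize_conventional(page: Page) -> Optional[TableResult]:
    """Hypothetical ruled-table recognizer: requires crossing lines."""
    if count_crossings(page.segments) == 0:
        return None
    # Cells lie between consecutive distinct ruling lines.
    n_rows = len({s[1] for s in page.segments if s[1] == s[3]}) - 1
    n_cols = len({s[0] for s in page.segments if s[0] == s[2]}) - 1
    return TableResult("conventional", n_rows, n_cols)


def recognize_special(page: Page, y_tol: float = 2.0) -> Optional[TableResult]:
    """Hypothetical borderless recognizer: rows of equal word count."""
    rows: List[List[Word]] = []
    for word in sorted(page.words, key=lambda w: (w[1], w[0])):
        # Words whose y is within y_tol of the row's first word share that row.
        if rows and abs(rows[-1][0][1] - word[1]) <= y_tol:
            rows[-1].append(word)
        else:
            rows.append([word])
    widths = {len(r) for r in rows}
    # Need at least two rows, all of the same width, with at least two columns.
    if len(rows) < 2 or len(widths) != 1 or widths == {1}:
        return None
    return TableResult("special", len(rows), widths.pop())


def meets_preset_condition(result: TableResult, min_rows: int = 2, min_cols: int = 2) -> bool:
    """Assumed stand-in for the unspecified 'preset condition'."""
    return result.n_rows >= min_rows and result.n_cols >= min_cols


def identify_table(page: Page) -> Optional[TableResult]:
    result = recognize_conventional(page)
    if result is None or not meets_preset_condition(result):
        # No conventional table, or an unacceptable result:
        # fall back to the special (no-crossing-line) recognizer.
        return recognize_special(page)
    return result


ruled = Page(
    segments=[(0, 0, 20, 0), (0, 10, 20, 10), (0, 20, 20, 20),
              (0, 0, 0, 20), (10, 0, 10, 20), (20, 0, 20, 20)],
    words=[],
)
borderless = Page(
    segments=[],
    words=[(0, 0, "Item"), (50, 0, "Amount"), (0, 12, "Revenue"), (50, 12, "100")],
)
print(identify_table(ruled))       # TableResult(kind='conventional', n_rows=2, n_cols=2)
print(identify_table(borderless))  # TableResult(kind='special', n_rows=2, n_cols=2)
```

The second call shows the fallback path. The page has no crossing lines, so no conventional table is found and the special recognizer is consulted instead. A real system would first obtain the ruling segments and word positions from a PDF parsing library. That extraction step, and any column-alignment checks beyond equal word counts, are omitted from this sketch.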

Description

Technical Field

[0001] The present application relates to the field of data processing, and in particular to a method and device for extracting tables from PDF documents.

Background

[0002] As is well known, the Portable Document Format (PDF) is widely used in industries such as finance, IT, electronics and education. Every industry accumulates a large number of PDF documents, which record a great deal of information in the form of text, pictures and tables. In some cases, key information, which may be text or tables, needs to be extracted from these documents. For example, some PDF prospectuses contain a large number of tables; the prior art provides a method for identifying such tables, but its recognition accuracy is low.

[0003] Therefore, improving the recognition accuracy of tables in PDF documents is a problem that urgently needs to be solved.

Summary of the Invention

[0004] The present application provi...


Application Information

IPC(8): G06V30/148; G06V30/40
Inventors: 马英峰, 宋雨生, 王童萱, 冯冉, 周敏
Owner: 深圳价值在线信息科技股份有限公司