A method and device for identifying tables in digital format documents

A technology for identifying tables in digital format files. It addresses the problem that complex tables are not recognized or are recognized incorrectly, and achieves the effects of saving data processing costs and improving work efficiency.

Active Publication Date: 2016-03-30
NEW FOUNDER HLDG DEV LLC +1
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0006] The present invention provides a method and device for identifying tables in digital format files, which solves the problem in the prior art that complex tables cannot be identified or are identified incorrectly.

Examples

Embodiment Construction

[0043] An embodiment of the present invention provides a method for identifying tables in a digital format file, including: extracting straight lines in the layout, and dividing the extracted straight lines into a horizontal straight line class and a vertical straight line class; detecting whether the horizontal straight lines in the horizontal straight line class intersect with the vertical straight lines in the vertical straight line class, and if so, determining the intersecting straight lines in the two classes as intersecting straight line groups; detecting whether the number of intersecting straight line groups is greater than a first threshold, and if so, determining that the first area where the intersecting straight line groups are located is a table area; otherwise, performing a vertical projection operation on the text in the first area, and determining whether the first area is a table area according to the vertical projection result.
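The line-based part of this method can be illustrated with a minimal Python sketch. It is not the patent's implementation: the segment representation `(x1, y1, x2, y2)`, the tolerance `tol`, the function names, and the choice to count each intersecting horizontal/vertical pair as one "intersecting straight line group" are all assumptions made for illustration.

```python
from typing import List, Tuple

Segment = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout units


def classify(segments: List[Segment], tol: float = 1.0):
    """Split segments into a horizontal class and a vertical class.

    Horizontal lines are stored as (x_lo, x_hi, y), vertical lines as
    (y_lo, y_hi, x). Slanted segments are ignored in this sketch.
    """
    horizontal, vertical = [], []
    for x1, y1, x2, y2 in segments:
        if abs(y1 - y2) <= tol:
            horizontal.append((min(x1, x2), max(x1, x2), (y1 + y2) / 2))
        elif abs(x1 - x2) <= tol:
            vertical.append((min(y1, y2), max(y1, y2), (x1 + x2) / 2))
    return horizontal, vertical


def intersecting_groups(horizontal, vertical, tol: float = 1.0):
    """Return every (horizontal, vertical) pair that intersects.

    Assumption: one intersecting pair is treated as one group.
    """
    groups = []
    for h in horizontal:
        x_lo, x_hi, y = h
        for v in vertical:
            y_lo, y_hi, x = v
            if x_lo - tol <= x <= x_hi + tol and y_lo - tol <= y <= y_hi + tol:
                groups.append((h, v))
    return groups


def is_table_by_lines(segments: List[Segment], first_threshold: int,
                      tol: float = 1.0) -> bool:
    """True when the number of intersecting groups exceeds the first threshold."""
    horizontal, vertical = classify(segments, tol)
    return len(intersecting_groups(horizontal, vertical, tol)) > first_threshold
```

When `is_table_by_lines` returns `False`, the method described above falls back to the vertical projection of the text in the area instead of rejecting the area outright.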

[0044] As shown in Figure 1, the embodiment of the present invention provides a method for i...

Abstract

The application discloses a method and device for identifying tables in digital format files, applied in the field of digital file processing. The method includes: extracting straight lines in the layout, and dividing the extracted straight lines into a horizontal straight line class and a vertical straight line class; detecting whether the horizontal straight lines in the horizontal straight line class intersect with the vertical straight lines in the vertical straight line class, and if so, determining the intersecting straight lines in the two classes as intersecting straight line groups; detecting whether the number of intersecting straight line groups is greater than a first threshold, and if so, determining that the first area where the intersecting straight line groups are located is a table area; otherwise, performing a vertical projection operation on the text in the first area, and determining whether the first area is a table area according to the vertical projection result. With the method and device of the invention, tables can be located quickly and accurately.
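The vertical projection fallback can be sketched as follows. The text above does not say how the projection result is evaluated, so the decision rule here (a table is assumed when the text splits into at least `min_columns` columns separated by empty gaps of at least `min_gap` units) is a hypothetical choice, as are the function names and the representation of text as integer `(x_left, x_right)` extents.

```python
from typing import List, Tuple

TextBox = Tuple[int, int]  # (x_left, x_right) extent of a text run, assumed integer units


def vertical_projection(boxes: List[TextBox], width: int) -> List[int]:
    """Project text onto the x-axis: profile[x] counts text runs covering column x."""
    profile = [0] * width
    for x_left, x_right in boxes:
        for x in range(max(0, x_left), min(width, x_right)):
            profile[x] += 1
    return profile


def count_gaps(profile: List[int], min_gap: int) -> int:
    """Count empty runs of at least min_gap columns that lie between text."""
    gaps = 0
    run = 0
    seen_text = False
    for value in profile:
        if value == 0:
            run += 1
        else:
            if seen_text and run >= min_gap:
                gaps += 1
            seen_text = True
            run = 0
    return gaps


def looks_like_table(boxes: List[TextBox], width: int,
                     min_gap: int = 3, min_columns: int = 2) -> bool:
    """Hypothetical rule: enough gap-separated text columns suggests a table."""
    profile = vertical_projection(boxes, width)
    if not any(profile):
        return False  # no text in the area at all
    return count_gaps(profile, min_gap) + 1 >= min_columns
```

Leading and trailing margins are ignored on purpose, so that only whitespace between text columns counts as a column separator.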

Description

Technical field

[0001] The invention relates to the field of digital file processing, in particular to a method and device for identifying tables in digital format files.

Background art

[0002] In industries such as newspapers and publishing houses, after typesetting software has been used for typesetting, articles and related metadata need to be extracted from the produced layouts for further use, that is, for article information reconstruction and indexing. To restore the layout content more faithfully, in addition to the content of the article itself (such as title, citation, subtitle, author, and body), the position and font size of each text block are also extracted during indexing.

[0003] At present, when indexing digital newspapers and periodicals (that is, organizing the content information in the newspapers and periodicals, such as labeling the layout information: publishing date,...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/22
CPC: G06F40/177, G06V30/414, G06F16/21
Inventors: 董宁, 黄文娟
Owner: NEW FOUNDER HLDG DEV LLC