Method and system for identifying forms in a layout file

A technology for identifying tables in formatted (layout) documents, applied in the fields of instrumentation, computing, and electrical digital data processing. It addresses the cumbersomeness of manual processing, the inability to process tables automatically, and the loss of table data, and achieves efficient, automated indexing.
CN101770446A · Active · Publication Date: 2010-07-07 · NEW FOUNDER HLDG DEV LLC +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
NEW FOUNDER HLDG DEV LLC
Publication Date
2010-07-07

Figures

  • Figure 1
  • Figure 2
  • Figure 3

Abstract

The invention relates to a method and system for identifying forms in a layout file, and belongs to the technical field of pattern recognition within computer information processing. Conventional pattern recognition technology cannot effectively identify the forms in a layout or extract them automatically. In the method and system, first, the individual characters of the layout are merged and organized into content blocks using automatic merging; second, form identification and content merging are performed according to the spatial positions, character information, and typesetting information of the content blocks. By analyzing the position and typesetting information of the content of the page layout, the method and system identify forms quickly and efficiently and organize the form content accurately.
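The two steps in the abstract (merging characters into content blocks, then identifying a form from the blocks' spatial positions) can be illustrated with a minimal sketch. It is not the patented method: the `Char`/`Block` model, the tolerances `gap_tol`, `line_tol`, and `col_tol`, and the helpers `group_lines`, `merge_chars`, `detect_table`, and `word` are all hypothetical. The form test is deliberately simple and assumes every row has the same number of left-aligned cells.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Char:
    """One positioned character from a layout file (hypothetical model)."""
    text: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


@dataclass
class Block:
    """A content block: a horizontal run of merged characters."""
    chars: list[Char]

    @property
    def text(self) -> str:
        return "".join(c.text for c in self.chars)

    @property
    def x(self) -> float:
        return self.chars[0].x  # merge_chars keeps chars sorted left to right

    @property
    def y(self) -> float:
        return self.chars[0].y


def group_lines(items: list[T], key_y: Callable[[T], float], tol: float) -> list[list[T]]:
    """Cluster items whose y coordinate lies within `tol` of a line's first item."""
    lines: list[list[T]] = []
    for item in sorted(items, key=key_y):
        if lines and key_y(item) - key_y(lines[-1][0]) <= tol:
            lines[-1].append(item)
        else:
            lines.append([item])
    return lines


def merge_chars(chars: list[Char], gap_tol: float = 2.0, line_tol: float = 1.0) -> list[Block]:
    """Step 1: merge characters on the same line whose horizontal gap is small."""
    blocks: list[Block] = []
    for line in group_lines(chars, lambda c: c.y, line_tol):
        line.sort(key=lambda c: c.x)
        current = [line[0]]
        for prev, ch in zip(line, line[1:]):
            if ch.x - (prev.x + prev.width) <= gap_tol:
                current.append(ch)
            else:
                blocks.append(Block(current))
                current = [ch]
        blocks.append(Block(current))
    return blocks


def detect_table(blocks: list[Block], line_tol: float = 1.0,
                 col_tol: float = 1.0) -> Optional[list[list[str]]]:
    """Step 2: return a row-major cell grid if the blocks form a simple table."""
    rows = [sorted(r, key=lambda b: b.x) for r in group_lines(blocks, lambda b: b.y, line_tol)]
    if len(rows) < 2:
        return None
    columns = [b.x for b in rows[0]]  # column positions taken from the first row
    if len(columns) < 2:
        return None
    grid = []
    for row in rows:
        if len(row) != len(columns):
            return None
        if any(abs(b.x - cx) > col_tol for b, cx in zip(row, columns)):
            return None
        grid.append([b.text for b in row])
    return grid


def word(text: str, x: float, y: float, w: float = 5.0, h: float = 10.0) -> list[Char]:
    """Usage helper: lay out a word as fixed-width characters starting at (x, y)."""
    return [Char(t, x + i * w, y, w, h) for i, t in enumerate(text)]


chars = word("Name", 0, 0) + word("Qty", 50, 0) + word("Pen", 0, 12) + word("3", 50, 12)
print(detect_table(merge_chars(chars)))  # [['Name', 'Qty'], ['Pen', '3']]
```

This sketch uses only spatial position. The abstract also lists character and typesetting information as inputs, which a fuller implementation would need to handle ruling lines, merged cells, and irregular column alignment.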

Description

Technical Field

[0001] The invention belongs to the technical field of pattern recognition in computer information processing, and in particular relates to a method and system for recognizing forms in layout files.

Background Art

[0002] In industries such as newspapers and publishing, once layouts have been produced with typesetting software, the articles and related metadata must be extracted from them for further use; this is the reconstruction and indexing of article information. To restore the layout content more faithfully, indexing extracts not only the content of the article itself (such as the title, lead-in title, subtitle, author, and body text) but also the position of each text block, its font size, and other information.
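To make the indexing output concrete, the following hedged sketch stores each text block's content next to its layout attributes, so that the page could later be restored. The `IndexedBlock` record, its field names, and the JSON serialization in `index_article` are illustrative assumptions, not a format defined by the patent.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class IndexedBlock:
    """One text block of an indexed article (hypothetical record layout)."""
    role: str          # e.g. "title", "subtitle", "author", "body"
    text: str
    x: float           # position of the block on the page
    y: float
    width: float
    height: float
    font_size: float   # typesetting information kept for faithful restoration


def index_article(blocks: list[IndexedBlock]) -> str:
    """Serialize an article's blocks, keeping layout data next to the content."""
    return json.dumps([asdict(b) for b in blocks], ensure_ascii=False)


article = [
    IndexedBlock("title", "Example headline", 20, 30, 500, 40, 28.0),
    IndexedBlock("body", "Example body text.", 20, 90, 500, 300, 10.5),
]
print(index_article(article))
```

Keeping geometry and font size in the same record as the text is what lets a consumer rebuild the block's appearance, rather than only its wording.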

[0003] The Chinese patent application with the application number 200710179938.4 "An indexing method for complex layouts based on PDF" discloses ...

Claims
