Method for extracting data information from documents published in newspapers

A technology for extracting data information, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses problems such as lost information, incomplete article content, and misclassified articles, so as to improve the speed of reverse analysis, simplify manual operations, and ensure completeness and accuracy.

Inactive Publication Date: 2007-02-14
PEKING UNIV +1
0 Cites, 21 Cited by

AI Technical Summary

Problems solved by technology

[0005] Layout description files, such as PS (the PostScript language defined by Adobe) and S2 (the layout result description language defined by Founder), are mainly used to describe the output information for layout printing. As a result, data information that is meaningless for printing but very meaningful for newspaper materials, such as article paragraphs, sequence, position, and titles, has been changed or lost, and the association information between articles and pictures, and between pictures and their text descriptions, has also been lo...



Examples


Example Embodiment

[0047] Commonly used typesetting files include Feiteng typesetting files, InDesign (product name) typesetting files, and QuarkXPress (product name) typesetting files. The typesetting data structure includes a layout information structure and a manuscript information structure.

[0048] The layout information structure includes page information and manuscript area information. The page information includes the newspaper name, the page and column name, the edition number (such as A01, the first edition), group members, and other information. The manuscript area information includes the location of the manuscript area and the location of the title area.

[0049] The manuscript information structure includes the content information of the manuscripts, such as pictures, articles, and tables, located in the manuscript area. This content information comprises the content of the title area and the content of the manuscript area. The content information of the title area includes title te...
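
The two structures described in [0047] to [0049] can be pictured roughly as the Python data classes below. This is only an illustrative sketch, not the patent's actual data format: every class name, field name, and the rectangle representation is a hypothetical choice made here, and the fields of the title area content are left generic because the source text is truncated at that point.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical rectangle in page coordinates: (left, top, right, bottom).
Rect = Tuple[float, float, float, float]


@dataclass
class PageInfo:
    """Page information from the layout information structure ([0048])."""
    newspaper_name: str
    page_and_column_name: str
    edition_number: str                      # e.g. "A01"
    group_members: List[str] = field(default_factory=list)


@dataclass
class ManuscriptArea:
    """Manuscript area information: where the body and the title sit."""
    area_rect: Rect                          # manuscript area location
    title_rect: Optional[Rect] = None        # title area location, if any


@dataclass
class LayoutInfo:
    """Layout information structure: page info plus all manuscript areas."""
    page: PageInfo
    areas: List[ManuscriptArea] = field(default_factory=list)


@dataclass
class Manuscript:
    """Manuscript information structure ([0049]); field names are assumed."""
    kind: str                                # "article", "picture" or "table"
    area: ManuscriptArea
    title_content: str = ""                  # content of the title area
    body_content: str = ""                   # content of the manuscript area


# Usage: one page holding a single article.
page = PageInfo("Example Daily", "News", "A01", ["editor-1"])
area = ManuscriptArea(area_rect=(0, 20, 100, 80), title_rect=(0, 0, 100, 20))
layout = LayoutInfo(page=page, areas=[area])
story = Manuscript("article", area, "Headline", "Body text")
```

Keeping the area locations separate from the manuscript content mirrors the split the text draws between the layout information structure and the manuscript information structure.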


Abstract

This invention discloses a method for extracting data information from published newspaper materials, including: extracting the page information and all manuscript areas based on the layout information structure of the typesetting file; extracting all manuscripts in the manuscript areas based on the manuscript information of the typesetting file; extracting the association relations among manuscripts based on the positional relations of these areas, and merging the associated manuscripts based on those relations; sorting the manuscripts according to their importance and area information; and modifying and labeling the page information and the manuscript content and its information to obtain the data information of the published newspaper materials.
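
The association, merging, and sorting steps named in the abstract can be sketched as follows. The abstract does not state the positional rule or the importance measure, so this sketch makes loud assumptions: two manuscripts count as associated when their rectangles overlap or touch, and a merged group's importance is taken to be the area of its bounding rectangle. All names (`Manuscript`, `rects_touch`, `group_associated`, `sort_groups`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)


@dataclass
class Manuscript:
    kind: str   # "article", "picture" or "table"
    text: str
    rect: Rect


def rects_touch(a: Rect, b: Rect, gap: float = 0.0) -> bool:
    """Assumed association rule: rectangles overlap or lie within `gap`."""
    return not (a[2] + gap < b[0] or b[2] + gap < a[0]
                or a[3] + gap < b[1] or b[3] + gap < a[1])


def group_associated(items: List[Manuscript], gap: float = 0.0) -> List[List[Manuscript]]:
    """Merge associated manuscripts into groups using union-find."""
    parent = list(range(len(items)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if rects_touch(items[i].rect, items[j].rect, gap):
                parent[find(i)] = find(j)

    groups: Dict[int, List[Manuscript]] = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())


def bounding_rect(group: List[Manuscript]) -> Rect:
    return (min(m.rect[0] for m in group), min(m.rect[1] for m in group),
            max(m.rect[2] for m in group), max(m.rect[3] for m in group))


def sort_groups(groups: List[List[Manuscript]]) -> List[List[Manuscript]]:
    """Assumed ordering: larger groups first, then top-to-bottom, left-to-right."""
    def key(group: List[Manuscript]):
        left, top, right, bottom = bounding_rect(group)
        return (-(right - left) * (bottom - top), top, left)
    return sorted(groups, key=key)


items = [
    Manuscript("article", "Lead story", (0, 0, 100, 50)),
    Manuscript("picture", "Photo for lead story", (100, 0, 150, 50)),
    Manuscript("article", "Brief", (0, 200, 50, 220)),
]
ordered = sort_groups(group_associated(items))
```

Here the lead story and the adjacent picture end up in one group, and the brief stays in its own group. A real implementation would replace both the touch test and the area-based key with whatever rules the full description specifies.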

Description

Technical field

[0001] The invention relates to the field of computer information processing, in particular to technology for extracting newspaper materials.

Background technique

[0002] Newspaper information is the core digital asset of a newspaper office. Its data information includes: manuscript content information, such as the articles (text, paragraphs, titles, etc.) on the newspaper page and the text and picture content in tables; manuscript layout information, including the location information of the manuscript (such as coordinate information) and format information such as the font and font size of the title and text; association information between articles and pictures, and between pictures and their text descriptions; and newspaper layout information, including the newspaper edition, layout name, date, etc. This data information is extracted after the newspaper layout process is complete. Usually, the method of extracting the data information in the newspaper mater...


Application Information

IPC(8): G06F17/30, G06F17/00
Inventors: 赵东岩, 刘万福
Owner PEKING UNIV