Method and system for automatically extracting article metadata information based on word flow

A technique for automatically extracting metadata information, applied in electrical digital data processing, special-purpose data processing applications, instruments, and related fields. Its effects are faster extraction and easy modification.

Status: Inactive
Publication Date: 2010-03-17
NEW FOUNDER HLDG DEV LLC +1
0 Cites, 24 Cited by

AI Technical Summary

Problems solved by technology

Manual processing is limited by human comprehension and stamina, so accuracy and speed inevitably decline; processing is inefficient and error-prone.
Processing large volumes of historical newspaper data is even more costly.



Examples


Detailed Description of the Embodiments

[0043] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0044] An automatic extraction system for article metadata information based on text flow comprises the following devices:

[0045] (1) A device for writing configuration files and script files: used to write the configuration files and script files and to place those of each publication into that publication's configuration directory;

[0046] (2) A configuration file loading device: used to load configuration files; the system reads each publication's configuration file and, from the information recorded in it, obtains the path of the script file and the relevant script function information;

[0047] (3) A device for loading script content into the engine: used to read the script function content in the script file into the script engine and parse it;

[0048] (4) extracting dev...
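
The first three devices can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the JSON configuration format, the `script_path` and `function` keys, the `daily_news` directory, the `extract_author` function, and the use of Python's `exec` as a stand-in for the script engine. Calling the loaded function at the end stands in for the extraction step, whose description is truncated above.

```python
import json
import re
import tempfile
from pathlib import Path
from typing import Callable, Optional


def write_demo_publication(config_root: Path) -> Path:
    """Device (1): place one publication's config and script in its own directory."""
    pub_dir = config_root / "daily_news"  # hypothetical publication name
    pub_dir.mkdir(parents=True)
    (pub_dir / "config.json").write_text(
        json.dumps({"script_path": "extract.py", "function": "extract_author"}),
        encoding="utf-8",
    )
    (pub_dir / "extract.py").write_text(
        "def extract_author(text):\n"
        "    m = re.search(r'^By (?P<author>.+)$', text, re.MULTILINE)\n"
        "    return m.group('author') if m else None\n",
        encoding="utf-8",
    )
    return pub_dir


def load_config(pub_dir: Path) -> dict:
    """Device (2): read the publication's configuration file."""
    with open(pub_dir / "config.json", encoding="utf-8") as fh:
        return json.load(fh)


def load_script_function(pub_dir: Path, config: dict) -> Callable[[str], Optional[str]]:
    """Device (3): read the script into the 'engine' and resolve the named function."""
    script_name = config["script_path"]
    source = (pub_dir / script_name).read_text(encoding="utf-8")
    namespace = {"re": re}  # names exposed to the script
    exec(compile(source, script_name, "exec"), namespace)
    return namespace[config["function"]]


def run_demo() -> Optional[str]:
    with tempfile.TemporaryDirectory() as tmp:
        pub_dir = write_demo_publication(Path(tmp))
        config = load_config(pub_dir)
        extract = load_script_function(pub_dir, config)
        return extract("Flood warning issued\nBy Jane Doe\nHeavy rain is expected.")


print(run_demo())
```

Keeping the extraction rules in per-publication scripts means a layout change only requires editing that publication's script, not the system itself. Note that `exec` runs arbitrary code, so in this sketch the script files are assumed to be trusted.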


Abstract

The invention relates to a method and a system for automatically extracting article metadata information based on text flow, and belongs to the technical field of information identification and extraction. In the prior art, metadata is usually keyed in again, or existing text is manually copied and pasted from a layout file, which is inefficient and error-prone. The method and system extract article metadata by matching regular expression templates against the characteristic information of the metadata in the text flow. Because matching and automatic extraction are tailored to each publication's typesetting rules, only a simple manual check of accuracy is needed, which speeds up information extraction.
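
As a rough illustration of matching regular expression templates against a text flow, the Python sketch below extracts the three metadata types the description names (author, source, genre). The templates, the sample article, and the `extract_metadata` function are hypothetical; the patent does not give concrete patterns, and a real publication would need templates written for its own typesetting rules.

```python
import re

# Hypothetical templates for one publication; each named group is one metadata field.
TEMPLATES = [
    re.compile(r"^By (?P<author>[^,\n]+), (?P<source>[^\n]+)$", re.MULTILINE),
    re.compile(r"^\[(?P<genre>[A-Za-z ]+)\]", re.MULTILINE),
]


def extract_metadata(text_flow: str) -> dict:
    """Apply every template and collect the named groups that matched."""
    metadata = {}
    for template in TEMPLATES:
        match = template.search(text_flow)
        if match:
            for field, value in match.groupdict().items():
                # Keep the first value found for each field.
                metadata.setdefault(field, value.strip())
    return metadata


sample = "[Commentary] Water prices\nBy Li Ming, Evening Post\nThe city council voted..."
result = extract_metadata(sample)
print(result)
```

Fields with no matching template are simply absent from the result, which leaves them for the manual verification step the abstract mentions.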

Description

Technical field

[0001] The invention belongs to the technical field of information identification and extraction, and in particular relates to a method and system for automatically extracting article metadata information based on text flow.

Background art

[0002] When newspaper articles are entered into a database, some basic metadata is needed for retrieval and information reuse.

[0003] In the finalized layout file produced once newspaper typesetting is complete, the metadata of an article (author, source, genre, etc.) has been lost, or survives only as plain text whose metadata type cannot be identified. When layout files are indexed and reprocessed, this information must be obtained again.

[0004] This metadata often exists in the text of the article, placed at a specific position within the article or marked with special tags. At present, different newspapers or articles of different layouts have various la...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/22, G06F17/30
Inventor: 董宁, 任大勇, 朱兴
Owner: NEW FOUNDER HLDG DEV LLC