Information extraction method and device

An information extraction technology based on text block information, applied in the field of information extraction. It addresses the prior-art problems that preset text block information cannot be extracted automatically, that indexing accuracy is low, and that indexers carry a heavy workload, and it achieves automatic extraction.

Inactive Publication Date: 2011-06-22
PEKING UNIV FOUNDER GRP CO LTD +1
2 Cites, 12 Cited by

AI Technical Summary

Problems solved by technology

[0004] However, in the course of realizing the present invention, the inventors found at least the following problem in the prior art: the automatic computer indexing used in the prior art cannot obtain information from the layout text block information and the manuscript text block information of the...


Examples

Detailed Description of the Embodiments

[0021] An information extraction method and device provided in the embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0022] Figure 1 shows an information extraction method provided by an embodiment of the present invention. The method is implemented as follows:

[0023] 101: Extract text block information from the layout file, where the text block information includes layout text block information and manuscript text block information. The layout file can be understood as the digitized information obtained when indexing software reverse-decodes a newspaper page. Extracting text block information from the layout file therefore means extracting it from the digitized information of the newspaper layout.

[0024] 102: Determine whether the preset layout text block information in the text block information has been extracted;

[0025] 10...

Abstract

The embodiments of the invention disclose an information extraction method and device, relating to the technical field of information extraction. They address the prior-art problem that preset text block information cannot be extracted from the layout information and manuscript information of a newspaper through automatic indexing. The disclosed method comprises the following steps: extracting text block information from a layout file, wherein the text block information comprises layout text block information and manuscript text block information; judging whether the preset layout text block information in the text block information has been extracted; if it has not been extracted, extracting the preset layout text block information; and if it has been extracted, extracting the preset manuscript text block information. The disclosed method and device reduce the workload of indexing personnel and improve indexing accuracy.
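The branching described in the abstract can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the patented implementation: the `TextBlock` type, its `kind` and `name` fields, the helper `pick`, and the idea that a preset is a set of field names are all hypothetical, since the source does not say how text blocks or presets are represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBlock:
    kind: str  # "layout" or "manuscript" (hypothetical labels)
    name: str  # hypothetical field name, e.g. "date" or "title"
    text: str


def pick(blocks, kind, preset_names):
    """Collect the preset fields of one kind from the text blocks."""
    return {b.name: b.text for b in blocks
            if b.kind == kind and b.name in preset_names}


def extract_preset_info(blocks, preset_layout, preset_manuscript,
                        layout_info=None):
    """Return (layout_info, manuscript_info) following the abstract's flow."""
    layout_info = dict(layout_info or {})
    manuscript_info = {}

    # Judge whether the preset layout information has been extracted.
    if not set(preset_layout).issubset(layout_info):
        # Not yet extracted: extract the preset layout information.
        layout_info.update(pick(blocks, "layout", preset_layout))

    # Once it has been extracted, extract the preset manuscript information.
    # (Assumption: if the layout fields cannot be found, we stop here.)
    if set(preset_layout).issubset(layout_info):
        manuscript_info = pick(blocks, "manuscript", preset_manuscript)

    return layout_info, manuscript_info


blocks = [
    TextBlock("layout", "date", "2011-06-22"),
    TextBlock("layout", "edition", "A1"),
    TextBlock("manuscript", "title", "Headline"),
    TextBlock("manuscript", "body", "Text"),
    TextBlock("manuscript", "caption", "Photo caption"),
]
layout, manuscript = extract_preset_info(
    blocks, {"date", "edition"}, {"title", "body"})
```

In this sketch the manuscript fields are collected only after every preset layout field is present, which is one possible reading of the ordering in the abstract.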

Description

Technical field

[0001] The present invention relates to the technical field of information extraction, in particular to an information extraction method and device.

Background

[0002] With the rapid development of the Internet and information technology, digitization projects in the newspaper publishing industry are being launched in quick succession. In the digitization process of the newspaper publishing industry, the digitized information of newspaper resources has become the core digital asset of the newspaper office. The digitized information of newspaper resources includes: manuscript information, such as the articles on the newspaper layout (text, paragraphs, titles, etc.) and the text and picture content in tables; and layout information, including the newspaper edition, layout name, date, manuscript location information (such as coordinate information), format information such as the font and font size of the title and text, and the association information between articles and pi...
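The two kinds of digitized information listed in the background can be pictured as a small data model. The sketch below is an assumption-laden illustration: every class and field name is hypothetical, the coordinates are assumed to be a bounding box `(x0, y0, x1, y1)`, and the association information that is cut off in the source is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed bounding box: (x0, y0, x1, y1) in page coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class ManuscriptInfo:
    """Content of one article on the page (hypothetical fields)."""
    title: str
    paragraphs: List[str] = field(default_factory=list)


@dataclass
class LayoutInfo:
    """Page-level and formatting information (hypothetical fields)."""
    edition: str
    layout_name: str
    date: str
    manuscript_boxes: List[Box] = field(default_factory=list)
    title_font: str = ""
    title_font_size: float = 0.0


@dataclass
class DigitizedPage:
    """One newspaper page: layout information plus its manuscripts."""
    layout: LayoutInfo
    manuscripts: List[ManuscriptInfo] = field(default_factory=list)


page = DigitizedPage(
    layout=LayoutInfo(edition="A", layout_name="Front page",
                      date="2011-06-22",
                      manuscript_boxes=[(0.0, 0.0, 100.0, 50.0)]),
    manuscripts=[ManuscriptInfo(title="Headline",
                                paragraphs=["First paragraph."])],
)
```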

Application Information

IPC(8): G06F17/30, G06F17/22
Inventor: 林欣欣, 徐剑波, 董宁, 王辉
Owner: PEKING UNIV FOUNDER GRP CO LTD