
Information extraction method and device

An information extraction and document technology, applied in fields such as special data processing applications, instruments, and electrical digital data processing, that addresses the problem that existing information extraction technology cannot meet the needs of practical applications.

Active Publication Date: 2013-05-08
重庆浪潮政务云管理运营有限公司
Cites: 3. Cited by: 5.

AI Technical Summary

Problems solved by technology

[0005] The invention provides an information extraction method and device that solve the problem that existing information extraction technology cannot meet the needs of practical applications.



Examples


Embodiment Construction

[0051] In automatic web page information extraction, existing technologies struggle to satisfy all of the following requirements at once: high recall and high precision, a large volume of extracted information, a light burden on the user, and independence from the application domain.

[0052] In order to solve the above problems, embodiments of the present invention provide an information extraction method and device. Embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings. It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments can be combined arbitrarily with each other.

[0053] First, Embodiment 1 of the present invention will be described with reference to the accompanying drawings.

[0054] An embodiment of the present invention provides an information extraction device, the structure of which is shown in Figure 1, including:

...


Abstract

The invention provides an information extraction method and device and relates to the field of computer applications. It solves the problem that existing information extraction technology cannot meet the requirements of practical applications. The information extraction method comprises the following steps: preprocessing a Hypertext Markup Language (HTML) document to obtain a normalized Extensible Hypertext Markup Language (XHTML) document; parsing the XHTML document to obtain sample instances; learning from the sample instances by induction to obtain a common XML Path Language (XPath) expression; generating an Extensible Stylesheet Language Transformations (XSLT) extraction rule; and extracting information through an output file function according to the XSLT extraction rule and the XPath expression. The method and device are suitable for feature analysis based on web page structure and achieve information extraction with high recall and high precision.
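The induction and rule-generation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`generalize`, `build_xslt`, `extract`), the sample page, and the induction rule (keep a positional index only where every sample agrees) are all assumptions made for illustration. The input is assumed to be already well-formed XHTML without a namespace, and because the Python standard library has no XSLT processor, the generated stylesheet is only built as text while the extraction itself is done with `xml.etree.ElementTree`.

```python
import re
import xml.etree.ElementTree as ET


def split_steps(xpath):
    """Split an absolute XPath such as /html/body/div[2]/a into (tag, index) steps."""
    steps = []
    for part in xpath.strip("/").split("/"):
        m = re.fullmatch(r"([\w:-]+)(?:\[(\d+)\])?", part)
        if m is None:
            raise ValueError(f"unsupported step: {part!r}")
        steps.append((m.group(1), int(m.group(2)) if m.group(2) else None))
    return steps


def generalize(xpaths):
    """Induce one XPath covering all samples (hypothetical rule, not from the patent):
    a positional index is kept only where every sample has the same index."""
    parsed = [split_steps(x) for x in xpaths]
    if len({len(p) for p in parsed}) != 1:
        raise ValueError("samples must have the same depth")
    out = []
    for column in zip(*parsed):
        if len({tag for tag, _ in column}) != 1:
            raise ValueError("samples must share the same tag at every step")
        tag, first_idx = column[0]
        indexes = {idx for _, idx in column}
        if len(indexes) == 1 and first_idx is not None:
            out.append(f"{tag}[{first_idx}]")
        else:
            out.append(tag)
    return "/" + "/".join(out)


def build_xslt(xpath):
    """Return an XSLT 1.0 stylesheet (as text) that prints each matched node on its own line."""
    return (
        '<xsl:stylesheet version="1.0" '
        'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">\n'
        '  <xsl:output method="text"/>\n'
        '  <xsl:template match="/">\n'
        f'    <xsl:for-each select="{xpath}">\n'
        '      <xsl:value-of select="normalize-space(.)"/>\n'
        '      <xsl:text>&#10;</xsl:text>\n'
        '    </xsl:for-each>\n'
        '  </xsl:template>\n'
        '</xsl:stylesheet>\n'
    )


def extract(xhtml, xpath):
    """Apply the induced path with ElementTree (a stand-in for running the XSLT rule)."""
    root = ET.fromstring(xhtml)
    # ElementTree paths are relative to the root element, so drop the first step.
    relative = "./" + "/".join(xpath.strip("/").split("/")[1:])
    return ["".join(node.itertext()).strip() for node in root.findall(relative)]


# Hypothetical sample page and two user-marked sample instances.
PAGE = """<html><body>
<div><p>banner</p></div>
<div><ul>
<li><a href="/a">First</a></li>
<li><a href="/b">Second</a></li>
<li><a href="/c">Third</a></li>
</ul></div>
</body></html>"""

SAMPLES = ["/html/body/div[2]/ul/li[1]/a", "/html/body/div[2]/ul/li[3]/a"]

if __name__ == "__main__":
    common = generalize(SAMPLES)
    print(common)
    print(extract(PAGE, common))
```

Marking only the first and third links is enough for the induced path to cover the second link as well, which is the kind of generalization the induction step is meant to provide. A real XHTML document usually declares the XHTML namespace, in which case both the path and the stylesheet would need namespace-qualified element names.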

Description

Technical field

[0001] The invention relates to the field of computer applications, in particular to an information extraction method and device.

Background

[0002] With the rapid development of the information industry and communication technology, the Internet has become an important knowledge base and information source. However, as the amount of information on the Internet grows, the ways data is organized become more diverse, and the information lacks unified management, the demand for efficient information extraction technology is becoming more and more urgent.

[0003] Methods based on analyzing the structural features of web pages adopt the idea of statistical clustering and achieve a high recall rate, but they are somewhat blind when extracting information and often extract a large amount of useless information. The information extraction method based on the Hidden Markov Model (HMM) requires experts in related...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
Inventor: 高滨, 刘正伟, 高飞
Owner: 重庆浪潮政务云管理运营有限公司