
Method and device for extracting object and web page from a plurality of web pages

A web page and object extraction technology, applied in the fields of information processing and information extraction. It addresses the problems that existing methods are inapplicable in some scenarios, need training data, and cannot extract object attribute values and related web pages at the same time, thereby improving extraction performance and extraction quality.

Active Publication Date: 2012-11-07
RICOH KK
Cites: 3; Cited by: 6

AI Technical Summary

Problems solved by technology

Also, this method still requires training data.
[0006] The existing work is therefore not applicable in some scenarios. It also generally treats web page selection and attribute value extraction as separate tasks, so it cannot extract object attribute values and related web pages at the same time.

Examples

Embodiment Construction

[0031] The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of exemplary embodiments of the present invention as defined by the claims and their equivalents. While including various details to facilitate understanding, they should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted for clarity and conciseness.

[0032] The term "attribute-value pair" used in the description of the present invention refers to an attribute name of an object and its corresponding attribute value. For example, for the country of the United States (described in a web page, such as http://en.wikipedia.org/wiki/United_States), "current president" is the attribute name, ...

Abstract

The invention provides a method and a device for extracting an object and a web page from a plurality of web pages. The method comprises the following steps: identifying candidate attribute-value pairs in the plurality of web pages; for each web page, constructing an intra-page attribute-value graph over the candidate attribute-value pairs in that page; for each web page, constructing an inter-page attribute-value graph relating its candidate attribute-value pairs to the candidate attribute-value pairs in the other pages; constructing a web page graph for the plurality of web pages; calculating a score for each candidate attribute-value pair and for each web page; and selecting the object and the web page.
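The steps in the abstract can be illustrated with a small sketch. The abstract does not say how the graphs are wired or how the scores are computed, so everything specific here is an assumption made for illustration: the function name `score_pairs_and_pages` is hypothetical, identical pairs on different pages are linked to form the inter-page graph, pages that share a pair are linked to form the web page graph, and scores are propagated with a PageRank-style update in place of whatever scoring the patent actually claims.

```python
from collections import defaultdict
from itertools import combinations


def score_pairs_and_pages(pages, damping=0.85, iterations=50):
    """Score candidate attribute-value pairs and pages on one joint graph.

    pages: dict mapping page_id -> list of (attribute, value) candidates.
    Returns (pair_scores, page_scores). All edge rules are assumptions.
    """
    edges = defaultdict(set)

    def link(a, b):
        edges[a].add(b)
        edges[b].add(a)

    occurrences = []
    for page_id, pairs in pages.items():
        page_node = ("page", page_id)
        edges.setdefault(page_node, set())
        pair_nodes = [("pair", page_id, p) for p in set(pairs)]
        occurrences.extend(pair_nodes)
        # Intra-page graph: candidates found on the same page.
        for a, b in combinations(pair_nodes, 2):
            link(a, b)
        # Tie each candidate to its page so the scores reinforce each other.
        for node in pair_nodes:
            link(node, page_node)

    for a, b in combinations(occurrences, 2):
        if a[1] != b[1] and a[2] == b[2]:
            link(a, b)                            # inter-page graph (assumed rule)
            link(("page", a[1]), ("page", b[1]))  # web page graph (assumed rule)

    nodes = list(edges)
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # PageRank-style update: a stand-in for the patent's scoring step.
        score = {
            node: (1 - damping) / n
            + damping * sum(score[m] / len(edges[m]) for m in edges[node])
            for node in nodes
        }

    pair_scores = {(k[1], k[2]): v for k, v in score.items() if k[0] == "pair"}
    page_scores = {k[1]: v for k, v in score.items() if k[0] == "page"}
    return pair_scores, page_scores
```

Selection then reduces to taking the highest-scoring entries, for example `max(page_scores, key=page_scores.get)` for the page and the analogous call on `pair_scores` for the attribute-value pair. Scoring pairs and pages on one graph is what lets the two choices inform each other instead of being made as separate tasks.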

Description

Technical Field

[0001] The present invention generally relates to the fields of information processing and information extraction, and more specifically to extracting information and related web pages from multiple web pages.

Background Art

[0002] Currently, there are a large number of electronic documents, for example, various articles describing products on the Internet. Processing, analysis, and statistics of such documents are becoming a research and development hotspot in the industry.

[0003] Many web pages on the Internet contain object attribute value information, such as product parameter pages. Automatically extracting object attribute value information from such pages can be used to build better indexes of them, thereby facilitating search, or the extracted results can be used for comment mining and trend analysis. There is already some existing work on this task.

[0004] US Patent No. 7720830...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 孙军, 谢宣松, 姜珊珊, 赵利军, 郑继川
Owner: RICOH KK