Triad mining method and device of website

A triple mining method and device, applied in the Internet field, that addresses the high labor cost, low reusability of extraction templates, and low efficiency of existing triple mining.

Active Publication Date: 2014-11-26
BEIJING BAIDU NETCOM SCI & TECH CO LTD
3 Cites, 5 Cited by

AI Technical Summary

Problems solved by technology

In the prior art, an extraction template must be written by hand for each website, so the templates have low reusability. As a result, mining triples from the web pages of a website is inefficient and requires a lot of labor.



Examples


Embodiment Construction

[0019] The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of exemplary embodiments of the invention as defined by the claims and their equivalents. The description includes various specific details to facilitate understanding and these descriptions are to be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions may be omitted for clarity and conciseness.

[0020] As shown in Figure 1, first, in step S101, all web pages of a website are collected, and the anchor text of each hyperlink in those pages and the URL that each hyperlink points to are counted.
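Step S101 can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the patent's implementation: the names `AnchorCollector` and `count_anchor_links` are hypothetical, and the pages are assumed to have been crawled already and passed in as a mapping from page URL to HTML source.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects (anchor text, absolute target URL) pairs from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (anchor_text, url) pairs
        self._href = None    # target of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse internal whitespace in the anchor text.
            text = " ".join("".join(self._text).split())
            if text:
                self.links.append((text, self._href))
            self._href = None


def count_anchor_links(pages):
    """pages: dict of page URL -> HTML source (assumed already crawled).

    Returns a Counter keyed by (anchor_text, target_url).
    """
    counts = Counter()
    for url, html in pages.items():
        collector = AnchorCollector(url)
        collector.feed(html)
        collector.close()
        counts.update(collector.links)
    return counts
```

Calling `count_anchor_links` on the crawled pages yields, for every distinct anchor text and target URL pair, the number of hyperlinks across the site that use it.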

[0021] Next, in step S103, the occurrence frequency of the hyperlink anchor text in the webpage correspo...


Abstract

The invention provides a triad mining method and device for a website. The method comprises: collecting all web pages of the website, and counting the anchor texts of the hyperlinks in all the web pages and the URLs the hyperlinks point to; calculating the occurrence frequency of each hyperlink anchor text in the web page corresponding to its URL, and determining the hyperlink anchor texts whose total occurrence frequency is larger than a preset standard as the principal entities of the corresponding web pages; extracting templates from the web pages, using the triads of the determined principal entities as seed triads; matching the obtained templates against the other web pages of the website to extract new triads; and, with the newly extracted triads as seed triads, repeating the template extraction, web page matching, and new triad extraction until no new triad can be extracted from the web pages of the website.
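The iteration described in the abstract can be sketched as a fixed-point loop. Everything below is an illustrative assumption rather than the patented method: "occurrence frequency" is read as a plain substring count, a "template" is reduced to the separator string between an attribute name and its value on one line of page text, the initial seed triads are simply passed in, and all function names are hypothetical.

```python
def principal_entities(anchor_urls, page_texts, threshold):
    """Map each URL to an anchor text occurring in its page more than
    `threshold` times (substring count is an assumed frequency measure)."""
    result = {}
    for anchor, url in anchor_urls:
        if page_texts.get(url, "").count(anchor) > threshold:
            result[url] = anchor
    return result


def extract_templates(triples, entities, page_texts):
    """Toy template extraction: for each line that starts with a known
    attribute name and ends with its value, keep the text in between."""
    templates = set()
    for url, entity in entities.items():
        for line in page_texts[url].splitlines():
            for ent, attr, value in triples:
                if ent == entity and line.startswith(attr) and line.endswith(value):
                    sep = line[len(attr):len(line) - len(value)]
                    if sep:
                        templates.add(sep)
    return templates


def apply_templates(templates, entities, page_texts):
    """Match every template against every page with a principal entity."""
    found = set()
    for url, entity in entities.items():
        for line in page_texts[url].splitlines():
            for sep in templates:
                attr, hit, value = line.partition(sep)
                if hit and attr and value:
                    found.add((entity, attr, value))
    return found


def mine_triples(seed_triples, entities, page_texts):
    """Repeat template extraction and matching until no new triad appears."""
    known = set(seed_triples)
    frontier = set(seed_triples)
    while frontier:
        templates = extract_templates(frontier, entities, page_texts)
        found = apply_templates(templates, entities, page_texts)
        frontier = found - known   # only triads not seen before
        known |= frontier
    return known
```

The loop stops because `known` only grows and the set of triads extractable from a finite set of pages is finite. A real system would need far more robust templates (for example, patterns over the page's markup) than the single-separator toy used here.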

Description

Technical field

[0001] The present invention relates to the technical field of the Internet, and more specifically, to a method for mining triples of web pages of external websites and a device for mining triples.

Background technique

[0002] In the field of Internet search, it is usually necessary to obtain the triplet (entity-attribute name-attribute value) of the webpage content of the website. However, in the prior art, it is necessary to write an extraction template to manually extract triples from each web page of the website. The disadvantage of this method is that the extraction template written for each website has low reusability, and the template needs to be specially written for each website. Therefore, the triple mining efficiency for the web pages of the website is low and requires a lot of labor costs.

Contents of the invention

[0003] One aspect of the present invention is to provide a method for automatically mining triples of a website, which does not ...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/951
Inventor: 李永强
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD