Method and system for web page deduplication

A web page deduplication technique, applied in the field of web page deduplication methods and systems, that addresses problems such as the low efficiency of web page deduplication and achieves improved deduplication efficiency, a reduced amount of calculation, and a narrower scope.

Active Publication Date: 2008-07-23
SHENZHEN SHI JI GUANG SU INFORMATION TECH

AI Technical Summary

Problems solved by technology

[0009] The technical problem to be solved by the present invention is to provide a web page deduplication method that addresses the low efficiency of web page deduplication in the prior art. This deduplication method has high efficien


Embodiment Construction

[0039] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0040] When a webpage contains a hyperlink (URL) pointing to another webpage, the two webpages are considered to have a link relationship, and the text of the hyperlink is the anchor text. If webpage A uses anchor text S to link to webpage B, the link is a forward link for webpage A and a backlink for webpage B. Each webpage may have multiple forward links and backlinks. Together, forward links and anchor texts reflect the link relationship between a webpage and other webpages, and webpages with the same or similar link relationships generally have the same or similar content. Therefore, the present invention uses the forward links and anchor text in a webpage as the basis for judging duplicate webpages...
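As an illustration of the terms in [0040], the following is a minimal sketch of collecting a page's forward links together with their anchor texts. It uses Python's standard `html.parser` and `urllib.parse` modules; the class name `ForwardLinkParser` and the function `forward_links` are hypothetical names chosen for this example and do not come from the patent.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ForwardLinkParser(HTMLParser):
    """Collects (absolute URL, anchor text) pairs for each <a href=...> in a page.

    Hypothetical helper for illustration; not the patent's implementation.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # list of (url, anchor_text) in document order
        self._href = None    # target of the <a> element currently open, if any
        self._text = []      # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the URL of the page itself.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the anchor text.
            anchor = " ".join("".join(self._text).split())
            self.links.append((self._href, anchor))
            self._href = None


def forward_links(page_url, html):
    """Return the forward links of the page at page_url as (url, anchor_text) pairs."""
    parser = ForwardLinkParser(page_url)
    parser.feed(html)
    return parser.links
```

Each pair returned for webpage A is a forward link of A; the same pair, seen from the target page, is one of that page's backlinks.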


Abstract

The invention relates to a method for detecting duplicate web pages, which comprises: obtaining the forward link information of each web page on the Internet and removing navigation (guidance) links and back-pointing links from it; comparing the forward link information of the web pages and extracting the web pages whose number of identical forward links exceeds a threshold value; forming a duplicate web page set from the extracted web pages; and eliminating duplicate web pages based on that set. The method can also compute scores for web pages that contain the same forward links, according to the properties of those links, and exclude web pages whose score differences are within a certain value. In addition, the method computes quality values of web pages, keeps the web pages whose quality values are larger than a set threshold value, then computes web page signatures and excludes the web pages whose signature similarity exceeds a threshold value. The invention also discloses a duplicate web page detection system. It addresses the low efficiency of the prior art and offers higher efficiency, precision and accuracy.
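The first stage described in the abstract, extracting pages that share more than a threshold number of forward links and forming duplicate sets from them, might be sketched as follows. This is only an illustration under stated assumptions: the input is assumed to be already filtered of navigation and back-pointing links, and the inverted index, the pairwise counting, and the union-find grouping are implementation choices of this example, not details given in the source. The function name `candidate_duplicates` is hypothetical.

```python
from itertools import combinations


def candidate_duplicates(forward_links, threshold):
    """Group pages that share more than `threshold` identical forward links.

    forward_links: dict mapping page URL -> set of forward-link URLs
                   (assumed already filtered of navigation/back-pointing links).
    Returns a list of sets, each holding two or more candidate duplicate pages.
    """
    # Inverted index (an assumption of this sketch): link -> pages containing it.
    pages_by_link = {}
    for page, links in forward_links.items():
        for link in links:
            pages_by_link.setdefault(link, set()).add(page)

    # Count identical forward links for every pair of pages that share any.
    shared = {}
    for pages in pages_by_link.values():
        for a, b in combinations(sorted(pages), 2):
            shared[(a, b)] = shared.get((a, b), 0) + 1

    # Union-find merges pairs above the threshold into sets (also an assumption).
    parent = {page: page for page in forward_links}

    def find(page):
        while parent[page] != page:
            parent[page] = parent[parent[page]]  # path halving
            page = parent[page]
        return page

    for (a, b), count in shared.items():
        if count > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for page in forward_links:
        groups.setdefault(find(page), set()).add(page)
    return [group for group in groups.values() if len(group) > 1]
```

Only pages that land in the same set would then need the later, more expensive checks mentioned in the abstract (scores, quality values, signatures), which is one way to read the claimed reduction in calculation.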

Description

Technical field

[0001] The invention relates to the field of web page deduplication, in particular to a web page deduplication method and system.

Background

[0002] With the rapid development of Internet technology, there are more and more web pages on the Internet. According to statistics, there are more than 10 billion Chinese web pages, and about 70% of them are duplicate web pages. Duplicate web pages are web pages with the same substantive content, for example: web pages with exactly the same displayed content; web pages with the same body content but different titles; web pages with the same body content but different auxiliary content; and so on. Duplicate web pages account for a very large proportion of web pages on the Internet. How to effectively remove duplicate web pages from a huge number of web pages is a difficult problem faced by search engines. At present, in the prior art, duplicate web pages are excluded by selecting feature codes in web pages and comparing the...


Application Information

IPC(8): G06F17/30
Inventor 禹荣凌刘云峰
Owner SHENZHEN SHI JI GUANG SU INFORMATION TECH