Method for ranking web pages on basis of hyperlink source analysis

A web page ranking technique based on hyperlink analysis, applied in the field of information retrieval, that addresses problems such as web page cheating

Status: Inactive. Publication date: 2013-02-06
JILIN UNIV

AI Technical Summary

Problems solved by technology

[0007] To address the problem that web pages whose inbound hyperlinks come from a single source may be cheating, the present invention proposes a web page ranking method based on hyperlink source analysis.



Examples


Example 1

[0082] Example 1: Comparison of the present invention with four existing algorithms at suppressing web page cheating on a synthetic network

[0083] The experimental data is a synthetic scale-free network generated with the BA (Barabási-Albert) model, using the parameters in Table 1. The generated network contains 100 nodes and 1098 edges, and the network diameter is 4.

[0084] Table 1. Parameter settings of the BA model

[0085]
Parameter                                        Value
Initial number of nodes                          5
Probability of an edge between initial nodes     0.3
Average node degree                              10
Total number of nodes in the network             100
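As a rough illustration of how a network with the Table 1 parameters could be generated, the following is a minimal sketch of Barabási-Albert growth in plain Python. The function name `generate_ba_network`, the `+1` smoothing of attachment weights, and the choice of half the average degree as the number of edges per new node are assumptions made for this sketch. The source does not say how the average degree maps to edges per node, and the sketch does not attempt to reproduce the reported count of 1098 edges.

```python
import random

def generate_ba_network(n_total=100, n_initial=5, p_initial=0.3,
                        avg_degree=10, seed=0):
    """Return a set of undirected edges (u, v) with u < v."""
    rng = random.Random(seed)
    edges = set()
    # Seed graph: each pair of initial nodes is linked with probability p_initial.
    for u in range(n_initial):
        for v in range(u + 1, n_initial):
            if rng.random() < p_initial:
                edges.add((u, v))
    degree = [0] * n_total
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Assumption: each new node brings avg_degree / 2 undirected edges.
    m = avg_degree // 2
    for new in range(n_initial, n_total):
        candidates = list(range(new))
        # Preferential attachment; +1 lets isolated seed nodes be chosen too.
        weights = [degree[c] + 1 for c in candidates]
        targets = set()
        while len(targets) < min(m, len(candidates)):
            targets.add(rng.choices(candidates, weights=weights)[0])
        for t in targets:
            edges.add((t, new))
            degree[t] += 1
            degree[new] += 1
    return edges

edges = generate_ba_network()
print(len(edges), "undirected edges")
```

Fixing the seed keeps the generated network reproducible, which matters when several ranking algorithms are compared on the same graph.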

[0086] The experiment uses the following two common cheating methods to test how well each algorithm suppresses cheating:

[0087] (1) Link exchange cheating: several nodes in the network are designated as cheating nodes, and these nodes add links to each other t...
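The link exchange setup can be pictured with a small sketch. Because the description is truncated, the exact linking pattern is unknown; the helper `add_link_exchange` below is hypothetical and assumes the simplest case, in which every cheating node links to every other cheating node.

```python
from itertools import permutations

def add_link_exchange(edges, cheaters):
    """Return a new directed edge set with mutual links among cheaters.

    Assumption: the cheaters form a complete link exchange (all ordered pairs).
    """
    boosted = set(edges)
    boosted.update(permutations(cheaters, 2))
    return boosted

original = {(0, 1)}
boosted = add_link_exchange(original, [1, 2, 3])
print(sorted(boosted))
```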

Example 2

[0100] Example 2: Comparison of the present invention with four existing algorithms at suppressing web page cheating on real network data

[0101] The experimental data is the WEBSPAM-UK2007 data set provided by Yahoo Labs. The data set contains the web pages and links of 114,529 websites. Volunteers have labeled some of the websites as "non-cheating" or "cheating" at the host level. The specific information is shown in Table 3. This experiment uses the host-level network: if a page in one website points to a page in another website, there is a directed edge between the two website hosts. Because the TrustRank, DiffusionRank and AIR algorithms all need seed sets, some of the manually labeled "non-cheating" websites are used as seed sets for these algorithms. The remaining "non-cheating" sites, together with sites under domain names such as gov, ac, mod, nhs and sch, constitute the collection of auth...
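The host-level rule described above (a directed edge between two hosts whenever a page on one links to a page on the other) can be sketched as follows. The function name `host_graph` and the example URLs are illustrative, and dropping links within the same host is an assumption based on the source mentioning only links to another website.

```python
from urllib.parse import urlsplit

def host_graph(page_links):
    """Collapse page-level links (src_url, dst_url) into host-level edges."""
    host_edges = set()
    for src, dst in page_links:
        src_host = urlsplit(src).hostname
        dst_host = urlsplit(dst).hostname
        # Assumption: links inside one host do not create an edge.
        if src_host and dst_host and src_host != dst_host:
            host_edges.add((src_host, dst_host))
    return host_edges

links = [
    ("http://a.example/p1", "http://b.example/q"),
    ("http://a.example/p2", "http://b.example/r"),
    ("http://a.example/p1", "http://a.example/p2"),
]
print(host_graph(links))
```

Using a set means that many page-level links between the same two hosts yield a single host-level edge.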


Abstract

A method for ranking web pages on the basis of hyperlink source analysis comprises the following steps: computing the k-adjacent set of each web page; computing the similarity of the inbound link sources of each pair of web pages; computing a hyperlink weight matrix of the World Wide Web; computing the authority of each web page; and ranking the web pages by authority. The method is a link-based ranking method. Compared with existing methods of the same kind, its main advantages are that the computation is efficient and feasible; it has few parameters, which are easy to set, and it needs no seed page set; and it performs well both at finding high-quality pages and at suppressing web page ranking cheating.
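The abstract names five steps but this excerpt does not define them, so the following is only a sketch of how such a pipeline could fit together, not the patented method. Every definition in it is an assumption made for illustration: the k-adjacent set is taken to be all pages within k undirected hops, similarity is Jaccard overlap, a link's weight is one minus its source's average similarity to the other sources of the same target, and authority is a weighted PageRank-style iteration. All function names and the `damping` and `iterations` parameters are hypothetical.

```python
from collections import defaultdict

def k_adjacent_sets(nodes, edges, k):
    """Assumed definition: pages within k undirected hops, including the page."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    result = {}
    for n in nodes:
        seen, frontier = {n}, {n}
        for _ in range(k):
            frontier = {w for x in frontier for w in nbrs[x]} - seen
            seen |= frontier
        result[n] = seen
    return result

def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 0.0

def link_weights(nodes, edges, k=1):
    """Assumed weighting: links from mutually similar sources count less."""
    adj = k_adjacent_sets(nodes, edges, k)
    in_links = defaultdict(list)
    for u, v in edges:
        in_links[v].append(u)
    weights = {}
    for v, sources in in_links.items():
        for u in sources:
            others = [w for w in sources if w != u]
            overlap = (sum(jaccard(adj[u], adj[w]) for w in others) / len(others)
                       if others else 0.0)
            weights[(u, v)] = 1.0 - overlap
    return weights

def authority(nodes, weights, damping=0.85, iterations=50):
    """Weighted PageRank-style iteration; scores are not renormalised."""
    n = len(nodes)
    out_total = defaultdict(float)
    for (u, _), w in weights.items():
        out_total[u] += w
    score = {x: 1.0 / n for x in nodes}
    for _ in range(iterations):
        nxt = {x: (1.0 - damping) / n for x in nodes}
        for (u, v), w in weights.items():
            if out_total[u] > 0:
                nxt[v] += damping * score[u] * w / out_total[u]
        score = nxt
    return score

def rank_pages(nodes, edges, k=1):
    score = authority(nodes, link_weights(nodes, edges, k))
    return sorted(nodes, key=score.get, reverse=True)

nodes = ["a", "b", "h", "x", "y", "c"]
edges = [("a", "h"), ("b", "h"), ("x", "c"), ("y", "c"), ("x", "y"), ("y", "x")]
print(link_weights(nodes, edges))
print(rank_pages(nodes, edges))
```

In this toy graph, pages x and y link to each other and both point at c, so under the assumed weighting their links into c receive weight 0, while the links into h from the unrelated pages a and b keep a positive weight.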

Description

Technical field

[0001] The invention belongs to the field of information retrieval, and in particular relates to a method for computing the ranking of web pages based on hyperlink analysis.

Background

[0002] With the rapid development of the Internet, the amount of information on it is growing explosively. Most users rely on search engines to find helpful information among these massive resources. Given a user's query, a search engine finds related information on the Internet and returns it to the user. Statistics on a large body of user behavior show that users are only interested in the first few pages of the results a search engine returns. Search engines therefore all use a page ranking algorithm to sort the results before returning them to the user. Its purpose is to rank the most valuable web pages at the top, s...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventors: 杨博 (Yang Bo), 李剑楠 (Li Jiannan)
Owner: JILIN UNIV