Network link topology reconstruction method based on content

A content-based technique for network link topology reconstruction, applied in network data retrieval, network data indexing, special data processing applications, and similar fields. It addresses problems such as the difficulty of finding spam web pages and the neglect of web page text information. It overcomes the tendency of link-based detection to overlook spam links and improves detection efficiency.

Inactive Publication Date: 2016-09-07
TIANJIN UNIV

AI Technical Summary

Problems solved by technology

[0007] However, the TrustRank algorithm only considers the information of links between web pages, and ignores the text information in web pages.
As cheating methods multiply, it is becoming increasingly difficult to find spam webpages based solely on link relationships, and link information is not effective against all cheating webpages.


Examples

Embodiment Construction

[0026] The content-based network link topology reconstruction method of the present invention will be described in detail below with reference to the embodiments and the accompanying drawings.

[0027] The content-based network link topology reconstruction method of the present invention adds webpage content analysis on the basis of the TrustRank algorithm, and reconstructs the network link topology from the perspective of content, which can improve the efficiency of detecting and identifying spam webpages.
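The link-only baseline that the method builds on can be sketched as follows. This is a minimal illustration of TrustRank-style trust propagation (a PageRank biased toward trusted seed pages), not the patent's implementation. The graph, the page names, the `damping` value and the function name `trustrank` are all assumptions made for the example.

```python
def trustrank(out_links, trusted_seeds, damping=0.85, iterations=50):
    """Propagate trust from seed pages along out-links (biased PageRank sketch)."""
    pages = sorted(set(out_links) | {q for targets in out_links.values() for q in targets})
    # Static trust distribution: uniform over the hand-picked trusted seeds.
    seed = {p: (1.0 / len(trusted_seeds) if p in trusted_seeds else 0.0) for p in pages}
    trust = dict(seed)
    for _ in range(iterations):
        # Every page restarts from its share of the seed distribution.
        nxt = {p: (1.0 - damping) * seed[p] for p in pages}
        for p in pages:
            targets = out_links.get(p, [])
            if not targets:
                continue  # dangling page: its trust is simply not propagated here
            share = damping * trust[p] / len(targets)
            for q in targets:
                nxt[q] += share
        trust = nxt
    return trust


# Hypothetical four-page graph: one trusted seed, two ordinary pages, one spam page.
graph = {
    "seed": ["a", "b"],
    "a": ["b"],
    "b": ["a"],
    "spam": ["a"],
}
scores = trustrank(graph, trusted_seeds={"seed"})
```

Because the scores depend only on the link structure, any page a trusted page links to receives trust regardless of its text, which is the gap the content analysis is meant to close.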

[0028] As shown in Figure 1, the content-based network link topology reconstruction method of the present invention comprises the following steps:

[0029] 1) Eliminate redundant and irrelevant feature attributes from the content features and the link features, and combine the remaining attributes into a new feature vector;

[0030] 2) Calculate the similarity between two connected webpages and determine the correlation between them; the closer the similarity, ...
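Step 2 can be illustrated with a small sketch. The visible text does not say which similarity measure is used, so cosine similarity is an assumption here, as are the three-dimensional feature vectors, the page names and the helper `cosine_similarity`.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector carries no content signal
    return dot / (norm_u * norm_v)


# Hypothetical feature vectors left over after the feature selection of step 1.
features = {
    "news": [0.9, 0.1, 0.0],
    "blog": [0.8, 0.2, 0.1],
    "spam": [0.0, 0.1, 0.9],
}
# Only pages joined by a link are compared.
links = [("news", "blog"), ("news", "spam")]
link_similarity = {
    (src, dst): cosine_similarity(features[src], features[dst]) for src, dst in links
}
```

In this toy data the link from `news` to `blog` scores much higher than the link from `news` to `spam`, which is the kind of contrast the later weighting step relies on.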

Abstract

A content-based network link topology reconstruction method includes: eliminating redundant and irrelevant feature attributes from the content features and the link features, and combining the remaining attributes into a new feature vector; calculating the similarity between two connected web pages and determining the correlation between them; eliminating spam links according to the correlation of the web pages so as to obtain a weight calculation formula, under which two pages connected by a high-weight link are highly similar; and regenerating the network link topology according to the link weights. Building on the TrustRank algorithm and web page content analysis, the method recognizes spam links through the similarity distance between web pages and the number of spam links, and reconstructs the network link topology from the content perspective. This effectively overcomes the tendency of link-based web page detection methods to ignore the possibility of spam links and improves the efficiency of detecting and recognizing spam web pages.
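The last two steps of the abstract, removing spam links and regenerating the topology from link weights, might look like the sketch below. The abstract does not give the weight calculation formula, so the rule used here (drop links whose similarity is at or below a threshold, then normalize the surviving similarities per source page) is purely an assumption, as are the `threshold` value, the similarity numbers and the function name `rebuild_topology`.

```python
def rebuild_topology(link_similarity, threshold=0.5):
    """Drop low-similarity links and turn the rest into per-page link weights.

    Assumes threshold >= 0, so every kept similarity is strictly positive.
    """
    kept = {}
    for (src, dst), sim in link_similarity.items():
        if sim > threshold:  # links at or below the threshold are treated as spam links
            kept.setdefault(src, {})[dst] = sim
    weighted = {}
    for src, targets in kept.items():
        total = sum(targets.values())
        # Normalize so the weights of the out-links of each page sum to 1.
        weighted[src] = {dst: sim / total for dst, sim in targets.items()}
    return weighted


# Hypothetical similarities for four links.
similarities = {
    ("news", "blog"): 0.9,
    ("news", "wiki"): 0.6,
    ("news", "spam"): 0.1,
    ("blog", "news"): 0.9,
}
topology = rebuild_topology(similarities)
```

The resulting weighted graph could then be handed to a trust propagation step in place of the original unweighted link structure, so that trust flows preferentially along links between similar pages.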

Description

Technical field

[0001] The invention relates to a network link topology reconstruction method. In particular, it relates to a content-based network link topology reconstruction method for optimizing a webpage ranking algorithm and thereby effectively identifying and detecting spam webpages.

Background technique

[0002] The essence of detecting and identifying spam web pages is to rank the web pages in the network. Currently widely used webpage ranking algorithms include the HITS, PageRank, BadRank and TrustRank algorithms.

[0003] There are two basic assumptions in the HITS algorithm. The first is that a good authoritative page will be pointed to by many good centrality (hub) pages, and the second is that a good centrality page will point to many good authoritative pages. According to the execution process of the HITS algorithm, after the user enters keywords in the search engine interface, the algorithm calculates two values for each returned matching page, one is the pivot val...
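The two HITS assumptions describe a mutual reinforcement between authoritative pages and centrality (hub) pages, which can be sketched as an iteration. This is a minimal version of the standard HITS update, not text from the patent. The query-matching step is omitted, and the graph, page names and function name `hits` are assumptions.

```python
import math


def hits(out_links, iterations=50):
    """Return (authority, hub) scores after repeated mutual reinforcement."""
    pages = sorted(set(out_links) | {q for targets in out_links.values() for q in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of the pages linking to it.
        auth = {p: 0.0 for p in pages}
        for p, targets in out_links.items():
            for q in targets:
                auth[q] += hub[p]
        # A page's hub score is the sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[q] for q in out_links.get(p, [])) for p in pages}
        # Normalize both score vectors to unit length so they stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return auth, hub


# Hypothetical graph: three hub-like pages pointing at two authority-like pages.
graph = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2"],
    "h3": ["a1"],
}
authority, hub_score = hits(graph)
```

Here `a1` ends up with the highest authority because all three hubs point to it, and `h1` and `h2` outrank `h3` as hubs because they point to both authorities.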

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/951
Inventor: 喻梅, 高洁, 于健, 王建荣, 徐天一, 周静
Owner: TIANJIN UNIV