Multi-view web spam detection method

A detection method based on multi-view technology, applied in the field of Internet information retrieval. It addresses reduced classification accuracy, sensitivity to imbalanced training data, the inability to detect multiple different kinds of spam pages at the same time, and high detection time complexity, mitigating their impact and improving detection efficiency.

Status: Inactive
Publication Date: 2014-01-22
SHANDONG NORMAL UNIV
Cites: 3; Cited by: 0

AI Technical Summary

Problems solved by technology

However, when the data dimensionality is high, existing detection methods have the following problems: they are sensitive to imbalance in the training data, they cannot detect several different kinds of spam pages at the same time, and their detection time complexity is high.
Studies have shown that when the class sizes differ greatly, especially in two-class problems where one class has far more data than the other, the learned classifier's accuracy on the smaller class (the minority class) is greatly reduced.
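The imbalance effect can be illustrated with a small calculation. The page counts below are hypothetical and are not taken from the patent; the sketch only shows that a classifier which always predicts the majority class can score high overall accuracy while misclassifying every minority-class page.

```python
def majority_only_accuracy(n_normal, n_spam):
    """Accuracy of a trivial classifier that labels every page 'normal'.

    n_normal and n_spam are hypothetical class sizes (illustration only).
    Returns (overall accuracy, accuracy on the spam class).
    """
    total = n_normal + n_spam
    overall = n_normal / total  # every normal page is labeled correctly
    spam_accuracy = 0.0         # every spam page is labeled incorrectly
    return overall, spam_accuracy


overall, spam_accuracy = majority_only_accuracy(n_normal=990, n_spam=10)
print(f"overall accuracy: {overall:.2%}, spam-class accuracy: {spam_accuracy:.2%}")
```

With 990 normal pages and 10 spam pages, overall accuracy looks strong even though no spam page is detected, which is why overall accuracy alone can hide poor minority-class performance.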

Examples

Embodiment Construction

[0045] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0046] The purpose of the present invention is to provide a general detection method for various spam pages.

[0047] To achieve the above object, the technical solution of the present invention proposes a multi-view representation of page features, which differs from traditional page feature representation methods. The method uses two views to represent a page: for the same web page, both a content-based feature vector (called the content view) and a hyperlink-based feature vector (called the link view) are used, so that one page corresponds to two views. The training data are pages that have been clearly labeled as normal or spam. The conte...
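The two-view representation described in this paragraph can be sketched as a small data structure. This is a minimal illustration, not the patent's implementation: the names `PageViews` and `group_training_views`, the label strings, and the choice of one row per page are all assumptions, and the actual features in each view are not specified in the available text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PageViews:
    """One web page represented by two views (hypothetical structure)."""
    content_view: List[float]    # content-based feature vector
    link_view: List[float]       # hyperlink-based feature vector
    label: Optional[str] = None  # "normal" or "spam" for training pages


def group_training_views(pages: List[PageViews]) -> Dict[str, Dict[str, List[List[float]]]]:
    """Collect the content and link views of labeled pages, per class.

    Assumption: each page contributes one row to each per-class list.
    Unlabeled pages (for example, the page to be detected) are skipped.
    """
    groups = {
        "normal": {"content": [], "link": []},
        "spam": {"content": [], "link": []},
    }
    for page in pages:
        if page.label in groups:
            groups[page.label]["content"].append(page.content_view)
            groups[page.label]["link"].append(page.link_view)
    return groups


# Hypothetical feature values, for illustration only.
training = [
    PageViews([0.1, 0.3], [2.0, 5.0], "normal"),
    PageViews([0.9, 0.8], [40.0, 1.0], "spam"),
]
views = group_training_views(training)
print(len(views["normal"]["content"]), len(views["spam"]["link"]))
```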

Abstract

The invention provides a multi-view web spam detection method. The method comprises the following steps: first, obtaining the two views of all normal pages and spam pages in the training data; then obtaining the two views of the page to be detected; establishing a matrix for each of the obtained views; solving for a normal norm and a spam norm; and comparing the two norms. If the normal norm is less than the spam norm, the page to be detected is identified as a normal page; if the normal norm is greater than the spam norm, it is identified as a spam page; if the two norms are equal, it is randomly identified as either a normal page or a spam page. The method's advantages include insensitivity to imbalance in the training data, the ability to detect various kinds of spam pages simultaneously, and a simple detection process.
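The final comparison step in the abstract can be sketched as follows. The abstract does not say how the normal norm and the spam norm are computed from the view matrices, so this sketch takes both values as given inputs; the function name `classify_page` and the label strings are assumptions made for illustration.

```python
import random


def classify_page(normal_norm: float, spam_norm: float, rng=random) -> str:
    """Apply the decision rule stated in the abstract.

    normal_norm and spam_norm are assumed to be already computed;
    their computation is not specified in the available text.
    """
    if normal_norm < spam_norm:
        return "normal"
    if normal_norm > spam_norm:
        return "spam"
    # Equal norms: pick either label at random, as the abstract states.
    return rng.choice(["normal", "spam"])


# Hypothetical norm values, for illustration only.
print(classify_page(0.2, 0.7))
print(classify_page(0.9, 0.1))
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking case reproducible in experiments.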

Description

Technical field

[0001] The invention relates to a multi-view web spam page detection method and belongs to the field of Internet information retrieval.

Background technique

[0002] To obtain commercial benefit, some website owners use improper means to deceive search engine ranking algorithms so that unimportant websites or pages are ranked higher, which degrades search results. The related techniques include search engine optimization (SEO) and search engine marketing (SEM), and are collectively referred to as search engine spam, that is, Web spam. At present, Web spam has become a major challenge for Web search and seriously affects the quality of information retrieval. At the same time, Web spam is developing rapidly, and new spamming techniques are constantly emerging. Web spam takes three main forms: content-based, link-based, and page hiding. Currently, methods for detecting sp...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
Inventor: 张化祥
Owner: SHANDONG NORMAL UNIV