Web page dark link detection method, device and computer-readable storage medium

A web page dark link detection technology in the network field. It addresses the coarse classification, strong noise influence and frequent false positives of existing methods, and improves classification granularity and detection accuracy.

Active Publication Date: 2021-05-04
SANGFOR TECH INC
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

The existing machine-learning approach sorts sample data into only two classes (positive and negative). This is too coarse: noise has a large influence and false positives are common.
For example, one category of dark links found on the Internet is planted under medical topics, and most of their anchor texts are medical terms such as "psoriasis". Among the positive samples, legitimate medical websites naturally contain such words, while among the negative samples, many of the dark links already detected were also planted under the medical category. In this situation a binary classifier easily misjudges a page.
In addition, this approach ignores the hidden nature of dark links, so it easily classifies visible, openly placed links (such as "friendship links", i.e. reciprocal link exchanges between sites) as dark links, again producing false positives.

Examples

Embodiment approach

[0121] In this embodiment, the K-NN (k-nearest neighbors) algorithm is used to calculate the text similarity between the webpage to be detected and each webpage in the training set; these similarities then determine the neighbor vectors of the text feature vector of the webpage to be detected. In one implementation, step S31 above may include:

[0122] Step S311: calculate the cosine of the angle between the text feature vector of the webpage to be detected and the text feature vector of each webpage in the training set;

[0123] Step S312: take the result as the text similarity between the webpage to be detected and that webpage in the training set.
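Steps S311 and S312, together with the neighbor selection of step S31, can be sketched in Python as follows. This is a minimal illustration, assuming dense, equal-length term-weight vectors and assuming that the "preset similarity condition" means keeping the k most similar training pages; the function names and the value of k are hypothetical and not taken from the patent.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Step S311: cosine of the angle between two m-dimensional text feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension m")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query: Sequence[float],
    training_set: List[Tuple[Sequence[float], str]],
    k: int,
) -> List[Tuple[float, str]]:
    """Step S312 onward: use the cosine as the text similarity and keep the k best.

    `training_set` holds (vector, webpage_type) pairs. The result is a list of
    (similarity, webpage_type) pairs sorted from most to least similar.
    Assumption: "preset similarity condition" = top-k by similarity.
    """
    scored = [(cosine_similarity(query, vec), label) for vec, label in training_set]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

For instance, `nearest_neighbors([1.0, 0.0, 2.0], [([1.0, 0.0, 2.0], "news"), ([0.0, 3.0, 0.0], "medical"), ([0.9, 0.1, 1.8], "news")], k=2)` keeps the two "news" pages, since the "medical" vector is orthogonal to the query. Because the cosine ignores vector length, a long page and a short page with the same term proportions count as equally similar.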

[0124] Suppose the text feature vector of a webpage in the training set is D0 = (W01, W02, ..., W0m) and the text feature vector of the webpage to be detected is Dk = (Wk1, Wk2, ..., Wkm). The text similarity between the webpage to be detected and that webpage in the training set is then calculated as:

[0125]

[...
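The formula at [0125] is truncated in this copy. Because step S311 defines the similarity as the cosine of the angle between the two vectors, the standard cosine expression in the notation of [0124] would read as follows; this is a reconstruction from that definition, not the patent's verbatim equation:

$$\operatorname{Sim}(D_k, D_0) = \cos\theta = \frac{\sum_{j=1}^{m} W_{kj}\,W_{0j}}{\sqrt{\sum_{j=1}^{m} W_{kj}^{2}}\,\sqrt{\sum_{j=1}^{m} W_{0j}^{2}}}$$

A value near 1 means the two pages use terms in nearly the same proportions; with non-negative term weights the value lies in [0, 1].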

Abstract

The invention discloses a web page dark link detection method, comprising: obtaining a training set for a preset webpage classification model; obtaining the text feature vector of the webpage to be detected and the text feature vectors of the webpages in the training set; selecting, as neighbor vectors of the text feature vector of the webpage to be detected, several training-set text feature vectors that satisfy a preset similarity condition with it, and determining, according to the webpage classification model, the webpage types corresponding to the neighbor vectors as candidate webpage types of the webpage to be detected; calculating the probability that the webpage to be detected belongs to each candidate webpage type; determining whether the candidate webpage type with the maximum probability belongs to the positive samples or the negative samples; and, if it belongs to the negative samples, determining that the webpage to be detected contains dark links. The invention also discloses a web page dark link detection device and a computer-readable storage medium. The invention can improve the accuracy of web page dark link detection.
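The decision procedure in the abstract can be sketched end to end in Python. This is a hypothetical illustration under stated assumptions: the "preset similarity condition" is taken to be the k most similar training pages, and the per-type probability is assumed to be a similarity-weighted vote among those neighbors, since this excerpt does not give the patent's own probability formula. Each fine-grained webpage type is marked as a positive-sample type or a negative-sample (dark-link) type; all names are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def detect_dark_link(
    page_vector: Vector,
    training_set: List[Tuple[Vector, str]],
    is_negative_type: Dict[str, bool],
    k: int = 3,
) -> Tuple[bool, str, Dict[str, float]]:
    """Hypothetical sketch; returns (contains_dark_link, best_type, probabilities)."""
    # 1. Neighbor vectors: the k most similar training vectors (assumed condition).
    scored = sorted(
        ((cosine_similarity(page_vector, vec), label) for vec, label in training_set),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    if not scored:
        raise ValueError("training set is empty")

    # 2. Candidate webpage types are the types of those neighbors.
    # 3. Assumed probability: similarity-weighted vote normalized to sum to 1
    #    (assumes non-negative term weights, so similarities are >= 0).
    weights: Dict[str, float] = defaultdict(float)
    for sim, label in scored:
        weights[label] += sim
    total = sum(weights.values())
    if total == 0:
        probs = {label: 1 / len(weights) for label in weights}  # all neighbors equally (un)similar
    else:
        probs = {label: w / total for label, w in weights.items()}

    # 4. The candidate type with the maximum probability decides the outcome.
    best_type = max(probs, key=probs.get)

    # 5. A negative-sample type means the page is judged to contain dark links.
    return is_negative_type[best_type], best_type, probs
```

Keeping "medical" (positive) and "medical_dark_link" (negative) as separate types is what lets a page full of medical vocabulary be matched against the right fine-grained class instead of a single positive/negative boundary, which is the granularity argument made in the problem statement.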

Description

Technical field
[0001] The invention relates to the field of network technology, and in particular to a method, a device and a computer-readable storage medium for detecting dark links in webpages.

Background
[0002] Dark links, also known as "black links" or "hidden links", are external links that are invisible to visitors but can be recognized and counted by search engines. They are implanted to improve search-engine rankings for profit. Such implantation not only disrupts the normal operation of the website but also spreads large amounts of illegal information to the public, causing great harm. At present there are two main methods for web page dark link detection:

[0003] 1) Rule-based dark link detection: hiding techniques are identified and combined with a feature blacklist to determine whether a web page has been implanted with dark links. This method is weak at identifying some of the hiding methods used by dark links, and at the same...
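To make the rule-based approach in [0003] concrete, here is a deliberately small Python sketch built on the standard-library `html.parser`. It assumes only two inline-style hiding patterns and a one-word keyword blacklist, both hypothetical; real rule sets are far larger. Its narrowness also illustrates the weakness described above: a link hidden through an external stylesheet, off-screen positioning or a zero font size would pass unflagged.

```python
from html.parser import HTMLParser

# Hypothetical rules: two inline hiding patterns and an illustrative blacklist.
HIDDEN_STYLE_MARKERS = ("display:none", "visibility:hidden")
ANCHOR_BLACKLIST = {"psoriasis"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HiddenLinkFinder(HTMLParser):
    """Flags <a> elements inside an inline-hidden element whose anchor text
    contains a blacklisted keyword."""

    def __init__(self) -> None:
        super().__init__()
        self.stack = []          # one bool per open element: is it hidden?
        self.in_link = False
        self.link_hidden = False
        self.link_href = None
        self.link_text = []
        self.suspicious = []     # (href, anchor_text) pairs

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        style = (attr.get("style") or "").replace(" ", "").lower()
        hidden = any(marker in style for marker in HIDDEN_STYLE_MARKERS)
        if tag == "a":
            self.in_link = True
            # A link is hidden if it, or any open ancestor, is hidden.
            self.link_hidden = hidden or any(self.stack)
            self.link_href = attr.get("href")
            self.link_text = []
        if tag not in VOID_TAGS:
            self.stack.append(hidden)

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            text = "".join(self.link_text).strip()
            if self.link_hidden and any(w in text.lower() for w in ANCHOR_BLACKLIST):
                self.suspicious.append((self.link_href, text))
            self.in_link = False
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.in_link:
            self.link_text.append(data)
```

Feeding it `<div style="display: none"><a href="http://spam.example">psoriasis cure</a></div>` records that link as suspicious, while a visible link with the same anchor text is not flagged. The stack assumes well-formed nesting; malformed HTML with unclosed tags would desynchronize it.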

Application Information

Patent type & authority: Patents (China)
IPC (8): G06F16/958; G06F16/35
CPC: G06F16/35; G06F16/958
Inventor: Liu Yi (刘毅)
Owner: SANGFOR TECH INC