Statistical machine learning-based internet hidden link detection method

A hidden link (dark chain) detection technique based on statistical machine learning, applied in the fields of network technology and search. It addresses the problems of weak identification of hiding methods, missed detections, and the inability to respond automatically to hiding methods, and achieves effective detection.
CN104239485A · Active · Publication Date: 2014-12-24 · CHINA INTERNET NETWORK INFORMATION CENTER

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
CHINA INTERNET NETWORK INFORMATION CENTER
Publication Date
2014-12-24


Abstract

The invention relates to a hidden link detection method based on statistical machine learning. The method comprises the following steps: (1) collecting real webpage source code data as a training set for a classification model, and dividing the data into a category containing hidden links and a category containing no hidden links; (2) extracting the anchor texts, i.e., the text content of the link fields, from the HTML source code files of all collected webpages in each of the two categories, then segmenting the anchor texts into individual words; (3) vectorizing the segmented texts of the two categories; (4) performing dimension reduction on the vector corresponding to each text; (5) training a classifier on the two categories of data obtained in step (4) to obtain a classification model; (6) applying the obtained classification model to an unknown webpage to be detected to obtain a hidden link detection result. The method uses only the source code of a webpage to detect effectively and automatically whether the webpage contains hidden links, and can thereby provide theoretical and practical support for search engines in cracking down on web cheating.
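The six steps in the abstract can be sketched end to end as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: the abstract does not name the word segmenter, the dimension reduction technique, or the classifier, so the sketch substitutes a regular-expression word split, a simple document-frequency feature selection, and a multinomial naive Bayes classifier. All function names and the toy training pages are hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Collects the text found inside <a>...</a> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside an <a> element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.texts.append(data.strip())


def anchor_tokens(html):
    """Step 2: extract anchor text, then split it into lowercase words.
    Assumption: a regex split stands in for a real word segmenter."""
    parser = AnchorTextParser()
    parser.feed(html)
    parser.close()
    return re.findall(r"\w+", " ".join(parser.texts).lower())


def select_vocabulary(docs, labels, k):
    """Step 4 (stand-in): keep the k terms whose document frequency
    differs most between the two classes."""
    df = {0: Counter(), 1: Counter()}
    for tokens, label in zip(docs, labels):
        df[label].update(set(tokens))
    terms = set(df[0]) | set(df[1])
    return sorted(terms, key=lambda t: (-abs(df[1][t] - df[0][t]), t))[:k]


def vectorize(tokens, vocab):
    """Step 3: bag-of-words count vector over the selected vocabulary."""
    counts = Counter(tokens)
    return [counts[t] for t in vocab]


def train_naive_bayes(vectors, labels):
    """Step 5: multinomial naive Bayes with add-one smoothing."""
    model = {}
    n_features = len(vectors[0])
    for c in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == c]
        totals = [sum(col) for col in zip(*rows)]
        denom = sum(totals) + n_features
        model[c] = (
            math.log(len(rows) / len(vectors)),            # log prior
            [math.log((t + 1) / denom) for t in totals],   # log likelihoods
        )
    return model


def predict(model, vector):
    """Step 6: return the class with the highest log-posterior score."""
    def score(c):
        prior, log_likelihoods = model[c]
        return prior + sum(x * w for x, w in zip(vector, log_likelihoods))
    return max(sorted(model), key=score)


# Step 1: toy training pages invented for illustration
# (1 = contains hidden links, 0 = does not).
TRAIN = [
    ('<div style="display:none"><a href="#">casino bonus</a>'
     '<a href="#">cheap pills</a></div>', 1),
    ('<p><a href="#">casino poker</a> <a href="#">pills online</a></p>', 1),
    ('<p><a href="#">about us</a> <a href="#">contact</a></p>', 0),
    ('<ul><li><a href="#">news</a></li><li><a href="#">contact us</a></li></ul>', 0),
]

docs = [anchor_tokens(html) for html, _ in TRAIN]
labels = [label for _, label in TRAIN]
vocab = select_vocabulary(docs, labels, k=6)
model = train_naive_bayes([vectorize(d, vocab) for d in docs], labels)


def detect(html):
    """Classify an unseen page from its anchor text alone."""
    return predict(model, vectorize(anchor_tokens(html), vocab))
```

Calling `detect(page_source)` on the HTML of an unseen page returns 1 or 0 for the two categories. A realistic system would need a far larger labelled corpus and, for Chinese pages, a proper word segmenter in place of the regular expression.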

Description

Technical field

[0001] The invention belongs to the fields of network technology and search technology, and in particular relates to a method for detecting hidden links on the Internet based on statistical machine learning.

Background technique

[0002] As an important entrance to the Internet, search engines have become an indispensable everyday tool for netizens, and the ranking of search results largely determines how those results are presented. Search engines use dedicated algorithms (such as Google's PageRank) to measure the relative importance of web pages, and use that measure to determine the ranking of search results. Since search engines use "crawlers" to fetch webpage content by following the links between webpages, the external links of a webpage are an important factor in most algorithms for measuring webpage importance; that is, the more links from external websites point to the target webpage, the higher the weight value of the target webpage, the easier it i...
