Network crawler blacklist generation method and device and network crawler blacklist identification method and device

A web crawler blacklist technology, applied in the fields of big data and machine learning, that addresses the low accuracy and reliability of web crawler blacklists: it avoids the low accuracy of manually generated blacklists and improves the accuracy and reliability of the generated blacklist.

Pending Publication Date: 2022-04-29
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

AI Technical Summary

Problems solved by technology

[0004] However, manually generating the web crawler blacklist leaves it susceptible to human subjective factors, resulting in the technical problem of low accuracy and reliability of the web crawler blacklist.



Embodiment Construction

[0044] Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and they should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

[0045] To improve the network environment, web crawler traffic usually needs to be identified: a web crawler blacklist is typically generated, and web crawler traffic is then filtered out of all traffic based on that blacklist.
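The filtering step described in [0045] can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the disclosure does not say which identifier the blacklist holds, so a client IP address is assumed, and the names `AccessRecord` and `filter_crawler_traffic` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRecord:
    # Hypothetical record shape: the source does not specify how a client is
    # identified, so an IP address is assumed here.
    client_ip: str
    url: str


def filter_crawler_traffic(records, blacklist):
    """Split access records into (normal, crawler) lists.

    A record counts as crawler traffic when its client IP appears in the
    blacklist (a set of IP strings, an assumed representation).
    """
    normal, crawler = [], []
    for record in records:
        (crawler if record.client_ip in blacklist else normal).append(record)
    return normal, crawler


if __name__ == "__main__":
    traffic = [
        AccessRecord("10.0.0.1", "/index"),
        AccessRecord("10.0.0.2", "/search?q=a"),
        AccessRecord("10.0.0.1", "/item/42"),
    ]
    normal, crawler = filter_crawler_traffic(traffic, {"10.0.0.2"})
    print(len(normal), len(crawler))  # 2 1
```

Using a set for the blacklist keeps each membership check constant-time on average, which matters when the filter runs over all traffic.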

[0046] Here, web crawler traffic refers to the traffic generated by a web crawler's access to the network. Web cra...


Abstract

The invention provides a web crawler blacklist generation method and device, and a web crawler blacklist identification method and device, relating to big data and machine learning in the technical field of artificial intelligence. The scheme comprises: obtaining network access traffic of a first time period; classifying the network access traffic of the first time period according to network access traffic of a second time period, where the second time period precedes the first time period and the classification result indicates whether the network access traffic of the first time period is web crawler traffic; and generating the web crawler blacklist according to the classification result of the network access traffic of the first time period and the classification result of the network access traffic of the second time period. This avoids the low accuracy caused by manually generating the web crawler blacklist and improves the accuracy and reliability of the generated blacklist.
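A minimal sketch of the two-period flow summarized in the abstract follows. The disclosure relies on machine learning, but the abstract specifies neither the classifier nor how the two classification results are combined, so this sketch swaps in a simple hypothetical rule: the second (earlier) period supplies a median per-client request-count baseline, a first-period client is labeled a crawler when its count exceeds `k` times that baseline, and a client is blacklisted only when both periods label it a crawler. The function names, the client-IP keying, and the parameter `k` are all assumptions.

```python
from collections import Counter
from statistics import median


def per_client_counts(client_ips):
    """Aggregate raw access records (one client IP per request) into counts."""
    return Counter(client_ips)


def classify_first_period(first_counts, second_counts, k=5.0):
    """Label each first-period client: True means crawler traffic.

    Hypothetical rule: the second (earlier) period provides the baseline,
    i.e. the median per-client request count; a client is flagged when its
    first-period count exceeds k times that baseline.
    """
    if not second_counts:
        raise ValueError("second-period traffic is required as a baseline")
    threshold = k * median(list(second_counts.values()))
    return {ip: count > threshold for ip, count in first_counts.items()}


def generate_blacklist(first_labels, second_labels):
    """Hypothetical combination rule: blacklist a client only if it is
    labeled a crawler in both periods, which damps one-off traffic spikes."""
    return {
        ip
        for ip, is_crawler in first_labels.items()
        if is_crawler and second_labels.get(ip, False)
    }


if __name__ == "__main__":
    second = per_client_counts(["a"] * 10 + ["b"] * 12 + ["c"] * 11 + ["x"] * 300)
    first = per_client_counts(["a"] * 9 + ["b"] * 14 + ["x"] * 280 + ["y"] * 90)
    first_labels = classify_first_period(first, second)
    # Assumed to be the classification result from the previous run.
    second_labels = {"a": False, "b": False, "c": False, "x": True}
    print(generate_blacklist(first_labels, second_labels))  # {'x'}
```

In this example the baseline median is 11.5, so the threshold is 57.5: both `x` and `y` are flagged in the first period, but only `x` was also flagged earlier, so only `x` is blacklisted. A median is used rather than a mean so that a single heavy crawler in the earlier period does not inflate the baseline.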

Description

Technical field

[0001] The present disclosure relates to big data and machine learning in the technical field of artificial intelligence, and in particular to methods and devices for generating and identifying a web crawler blacklist.

Background

[0002] Web crawler traffic refers to traffic generated by programs that automatically grab World Wide Web information according to certain rules. This traffic does not come from genuine users, so it can also be called cheating traffic. Identification of web crawler traffic usually relies on a web crawler blacklist.

[0003] In the prior art, the web crawler blacklist is usually generated manually: for example, technicians determine criteria for judging web crawler traffic based on their experience, and the blacklist is generated according to those criteria.

[0004] However, manually generating the web crawler blacklist makes it vulnerable to human subjective factors, resulting in a technical problem of low accur...
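For contrast, the manual approach of [0003] amounts to hard-coding criteria chosen by a technician. The sketch below is illustrative only: the thresholds, keywords, and the `manual_blacklist` function are hypothetical and do not come from the disclosure. The point is that every constant is a human judgment call, which is the subjectivity the background section identifies.

```python
# Hypothetical, hand-tuned criteria: each value reflects a technician's
# experience rather than anything learned from the traffic itself.
MAX_REQUESTS_PER_MINUTE = 120
SUSPICIOUS_AGENT_KEYWORDS = ("bot", "spider", "crawl")


def manual_blacklist(stats):
    """Build a blacklist from fixed rules.

    `stats` maps client IP -> (requests_per_minute, user_agent).
    A client is blacklisted if it exceeds the rate limit or its
    user agent contains a suspicious keyword (case-insensitive).
    """
    blacklist = set()
    for ip, (rpm, agent) in stats.items():
        agent = agent.lower()
        too_fast = rpm > MAX_REQUESTS_PER_MINUTE
        looks_automated = any(word in agent for word in SUSPICIOUS_AGENT_KEYWORDS)
        if too_fast or looks_automated:
            blacklist.add(ip)
    return blacklist


if __name__ == "__main__":
    stats = {
        "1.1.1.1": (30, "Mozilla/5.0"),
        "2.2.2.2": (500, "Mozilla/5.0"),
        "3.3.3.3": (5, "ExampleSpider/1.0"),
    }
    print(sorted(manual_blacklist(stats)))  # ['2.2.2.2', '3.3.3.3']
```

Shifting `MAX_REQUESTS_PER_MINUTE` or the keyword list changes the blacklist directly, with no data-driven check on whether the new values are better.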


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62; G06F16/951
CPC: G06F16/951; G06F18/241
Inventor: 董奕
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD