Web crawler interception method and device, electronic equipment and readable storage medium

A web crawler interception technique applied in the field of network security. It addresses the cumbersome operation steps and heavy maintenance workload of upgrading protection software, and achieves a simple process, improved network security, and fast updates.

Pending Publication Date: 2022-02-11
CHINANETCENT TECH

AI Technical Summary

Problems solved by technology

[0004] However, a CDN network has many edge nodes, each independent of the others, so upgrading and maintaining the protection software is a heavy workload and the operation steps are cumbersome.

Examples

Detailed Description of Embodiments

[0037] To make the purpose, technical solution, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.

[0038] At present, the robot protocol is an agreement between a website and web crawlers, usually expressed as a simple plain-text (.txt) file. In protection based on the robot protocol, this text file tells the web crawler which accesses it is permitted to make. When a web crawler visits a site, it first checks whether a robots.txt file exists in the root directory of the site. If the file exists, the web crawler determines its scope of access according to the contents of the file; if it does not exist, the crawler is able to access all pages on the website that are not password protected.
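The robots.txt check described in [0038] can be sketched with Python's standard `urllib.robotparser` module. This is only an illustration of how a compliant crawler might behave, not part of the application: the rule text, the `ExampleBot` user agent, and the `example.com` URLs are assumptions, and a missing robots.txt file is modelled here as an empty rule list.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules supplied as a list of text lines
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content: every crawler is barred from /private/.
RULES = ["User-agent: *", "Disallow: /private/"]

# A compliant crawler consults the rules before each request.
can_read_private = allowed(RULES, "ExampleBot", "https://example.com/private/data")
can_read_index = allowed(RULES, "ExampleBot", "https://example.com/index.html")

# With no rules at all (standing in for a missing file), nothing is restricted.
can_read_without_rules = allowed([], "ExampleBot", "https://example.com/private/data")
```

The check is entirely voluntary on the crawler's side, so a crawler that skips it is not restricted by the file in any way.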

[0039] However, some malicious web crawlers do not abide by the robot protocol, and the aforementioned robot p...

Abstract

The invention discloses a web crawler interception method and device, electronic equipment, and a readable storage medium. In the method, each time an edge node receives an access request from a first terminal device, it generates an access log for that request and sends the access log to a buffer. A computing cluster reads a first access log from the message queue in real time, uses the domain name contained in the first access log to read a plurality of access logs containing that domain name from the message queue, and then determines from those access logs whether the access request corresponding to the first access log is a malicious request. Because the access logs in the message queue come from the edge nodes of the whole network, the computing cluster analyzes network-wide data, so malicious web crawlers can be identified quickly and accurately. If a new web crawler appears, only the analysis model on the computing cluster needs to be updated; the protection software on each edge node does not need to be upgraded, so the update is fast and the process is simple.
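The flow in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `AccessLog` fields, the in-memory list standing in for the message queue, and the request-count rule in `is_malicious` (including the `max_requests` threshold) are all hypothetical, since the excerpt does not describe the actual analysis model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessLog:
    client_ip: str  # assumed field: address of the requesting terminal
    domain: str     # domain name contained in the access log
    path: str       # assumed field: requested path


# Hypothetical stand-in for the message queue fed by every edge node.
message_queue = []


def edge_node_handle(client_ip, domain, path):
    """Edge node side: build an access log for one request and publish it."""
    log = AccessLog(client_ip, domain, path)
    message_queue.append(log)
    return log


def is_malicious(first_log, queue, max_requests=3):
    """Cluster side: gather logs for the same domain, then apply an assumed rule.

    The rule (too many requests from one client to one domain) is a
    placeholder for whatever analysis model the cluster actually runs.
    """
    same_domain = [log for log in queue if log.domain == first_log.domain]
    per_client = Counter(log.client_ip for log in same_domain)
    return per_client[first_log.client_ip] > max_requests


# Simulated traffic: one client sends five requests, another sends one.
for i in range(5):
    edge_node_handle("203.0.113.7", "example.com", f"/item/{i}")
single_visit = edge_node_handle("198.51.100.2", "example.com", "/")
```

Because the decision lives in `is_malicious` on the cluster side, replacing the rule changes detection for every edge node at once, which is the maintenance benefit the abstract claims.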

Description

Technical field

[0001] The present application relates to the technical field of network security, and in particular to a web crawler interception method and device, electronic equipment, and a readable storage medium.

Background

[0002] A web crawler, also called a web spider, usually searches for the Uniform Resource Locator (URL) of a web page according to the address of the web page, and then crawls the content of the website according to the URL.

[0003] To prevent web crawlers from crawling website content, the industry regulates the behavior of web crawlers through robot protocols, which are also called crawler protocols, robots protocols, and the like. However, some malicious web crawlers do not abide by robot protocols, and traditional robot protocols cannot block such malicious web crawlers. To this end, in a Content Delivery Network (CDN), protection software is deployed on edge nodes, and the edge nodes are used to detect ...

Application Information

IPC(8): G06F16/951, G06F16/955
CPC: G06F16/951, G06F16/9566
Inventors: 吴伟彬, 黄林城
Owner CHINANETCENT TECH