Web crawler identification method and web crawler identification device

A web crawler identification method and device that address the large limitations and the high false-negative and false-positive rates of existing crawler identification methods, with a good identification effect.

Active Publication Date: 2018-08-21

TENCENT TECH (SHENZHEN) CO LTD

Problems solved by technology

[0005] The purpose of the present invention is to provide a method and device for identifying web crawlers, so as to solve the technical problems that existing crawler identification methods have relatively large limitations and high false-negative and false-positive rates.

Examples

Embodiment 1

[0034] This embodiment is described from the perspective of the web crawler identification device. Referring to Figure 1b, the web crawler identification method provided by this embodiment of the present invention may include:

[0035] S101. Generate a crawler identification instruction.

[0036] In this embodiment, the trigger condition for generating the crawler identification instruction can be determined according to actual needs: it can be a specified time or a specified amount of data. The specified time and the specified amount of data can be set by the user, or can be the server's factory default settings. Specifically, when the trigger condition is a specified time, the server is triggered to generate a crawler identification instruction when the specified time is reached. When the trigger condition is the specified amount of data, it is necessary to c...
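The two trigger conditions in S101 can be illustrated with a minimal Python sketch. The names `TriggerConfig` and `should_generate_instruction` are hypothetical, and because the source text is cut off for the data-amount case, the sketch assumes that the specified amount is compared against a count of stored access records.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TriggerConfig:
    """Hypothetical trigger settings (user-set or defaults); unset fields are ignored."""
    specified_time: Optional[datetime] = None    # trigger once this time is reached
    specified_data_amount: Optional[int] = None  # trigger once this many records exist


def should_generate_instruction(config: TriggerConfig,
                                now: datetime,
                                stored_records: int) -> bool:
    """Return True when either configured trigger condition is met."""
    if config.specified_time is not None and now >= config.specified_time:
        return True
    # Assumption: "amount of data" is measured as a number of stored records.
    if (config.specified_data_amount is not None
            and stored_records >= config.specified_data_amount):
        return True
    return False
```

A server loop could call `should_generate_instruction` periodically and start the identification steps whenever it returns `True`.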

Embodiment 2

[0103] Based on the method described in Embodiment 1, a further detailed example is given below.

[0104] This embodiment is described in detail using an example in which the web crawler identification device is integrated in a server, together with a first terminal and a second terminal.

[0105] As shown in Figure 2a and Figure 2b, the specific process of a web crawler identification method may be as follows:

[0106] S201. The server acquires and parses the user access request sent by the first terminal, obtains the user ID, access address and access time of the current user, and stores them in a first preset database.

[0107] For example, there are roughly two paths for acquiring the user's access request: in one, traffic is copied from the switch by a bypass operation through an optical splitting device; in the other, the web server reports real-time traffic data through a data sending queue. The u...
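Step S201 (parse the request, extract the user ID, access address and access time, then store them) might look like the following minimal sketch. It assumes, purely for illustration, that each access request arrives as a JSON string with `user_id`, `url` and `timestamp` fields, and it uses an in-memory SQLite table as a stand-in for the "first preset database". None of these names or formats come from the patent text.

```python
import json
import sqlite3
from typing import Tuple


def open_first_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open the stand-in 'first preset database' and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS access_log ("
        "user_id TEXT NOT NULL, "
        "access_address TEXT NOT NULL, "
        "access_time REAL NOT NULL)"
    )
    return conn


def parse_access_request(raw: str) -> Tuple[str, str, float]:
    """Extract (user ID, access address, access time) from a hypothetical JSON request."""
    data = json.loads(raw)
    return str(data["user_id"]), str(data["url"]), float(data["timestamp"])


def store_access_request(conn: sqlite3.Connection, raw: str) -> Tuple[str, str, float]:
    """Parse one request and persist its three fields; returns the stored record."""
    record = parse_access_request(raw)
    conn.execute("INSERT INTO access_log VALUES (?, ?, ?)", record)
    conn.commit()
    return record
```

Whichever acquisition path delivers the traffic, each request would pass through `store_access_request` so that later steps can query access times per user.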

Embodiment 3

[0142] According to the methods described in Embodiment 1 and Embodiment 2, this embodiment will be further described from the perspective of a web crawler identification device, and the web crawler identification device may be integrated in a server.

[0143] Referring to Figure 3a, the web crawler identification device provided by the third embodiment of the present invention may include a generation module 10, an acquisition module 20, a calculation module 30 and an identification module 40, wherein:

[0144] (1) Generation module 10

[0145] The generation module 10 is configured to generate a crawler identification instruction.

[0146] In this embodiment, the trigger condition for generating the crawler identification instruction can be determined according to actual needs: it can be a specified time or a specified amount of data. The specified time and the specified amount of data can be set by the user, or can be ...
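As a rough illustration of how the four modules could be composed, the sketch below wires them together as injected callables. The class name `CrawlerIdentificationDevice`, the type aliases, and the one-to-one mapping of modules 20, 30 and 40 onto the acquire, calculate and identify steps are assumptions made for this example, not details confirmed by the text.

```python
from typing import Callable, Dict, List, Optional, Set

AccessTimes = Dict[str, List[float]]  # user ID -> access times, in seconds
Intervals = Dict[str, List[float]]    # user ID -> interval lengths, in seconds


class CrawlerIdentificationDevice:
    """Hypothetical composition of the four modules named in the embodiment."""

    def __init__(self,
                 generate: Callable[[], Optional[dict]],
                 acquire: Callable[[dict], AccessTimes],
                 calculate: Callable[[AccessTimes], Intervals],
                 identify: Callable[[Intervals], Set[str]]) -> None:
        self.generate = generate    # generation module 10
        self.acquire = acquire      # acquisition module 20
        self.calculate = calculate  # calculation module 30
        self.identify = identify    # identification module 40

    def run(self) -> Set[str]:
        """Run one pass; returns the user IDs identified as crawlers."""
        instruction = self.generate()
        if instruction is None:  # no trigger condition met, nothing to do
            return set()
        access_times = self.acquire(instruction)
        intervals = self.calculate(access_times)
        return self.identify(intervals)
```

Passing the modules in as callables keeps the identification rule replaceable without touching the rest of the device.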


Abstract

The invention discloses a web crawler identification method and a web crawler identification device. The method comprises: generating a crawler identification instruction; acquiring, according to the crawler identification instruction, a user identification set and, for each user identification in the set, a corresponding access time set within a preset time period; calculating the interval between each two adjacent access times in the access time set to obtain an interval length set; and identifying the web crawler from the user identification set according to the interval length set. The web crawler can thereby be identified accurately, the false-negative and false-positive rates are reduced, and the identification effect is good.
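The interval calculation in the abstract can be sketched in Python as below. The abstract does not state how the interval length set is turned into a decision, so the rule used here is only an assumed stand-in: a user is flagged when its intervals are highly regular, meaning a coefficient of variation at or below `max_cv` across at least `min_intervals` intervals. The function names and both thresholds are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Set


def interval_lengths(access_times: List[float]) -> List[float]:
    """Intervals between each two adjacent access times (times sorted first)."""
    ordered = sorted(access_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def identify_crawlers(access_time_sets: Dict[str, List[float]],
                      min_intervals: int = 5,
                      max_cv: float = 0.1) -> Set[str]:
    """Flag user IDs whose request timing is suspiciously regular.

    Assumed rule, not taken from the patent: low coefficient of variation
    (population standard deviation / mean) of the interval lengths.
    """
    crawlers: Set[str] = set()
    for user_id, times in access_time_sets.items():
        intervals = interval_lengths(times)
        if len(intervals) < min_intervals:
            continue  # too little data to judge this user
        avg = mean(intervals)
        # All-zero intervals (simultaneous requests) count as perfectly regular.
        cv = 0.0 if avg == 0 else pstdev(intervals) / avg
        if cv <= max_cv:
            crawlers.add(user_id)
    return crawlers
```

For instance, a user requesting a page exactly every two seconds yields identical intervals and is flagged under this rule, while a user with widely varying gaps is not.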

Description

Technical field

[0001] The invention relates to the field of computer technology, and in particular to a method and device for identifying a web crawler.

Background

[0002] A web crawler is a program that automatically obtains the content of web pages. For a website, a large number of requests from malicious crawlers consumes server performance and can even cause the server to crash. Moreover, in industries such as literature, film and television, and e-commerce, malicious crawlers can easily be used to pull and copy public or semi-public information in batches, which seriously affects the security of the website server.

[0003] Existing web crawlers can be divided into high-frequency script crawlers and collector crawlers according to differences in crawling targets, countermeasures, and performance requirements. Among them, high-frequency script crawlers aim to obtain, in the shortest possible time, the site's updated content and full amount of...


Application Information

IPC(8): H04L29/06, G06F17/30

CPC: G06F16/951, G06F16/958, H04L63/10, H04L63/1416, H04L63/1466

Inventor: 唐文韬, 郑云文, 胡珀, 郑兴, 郭晶, 张强, 范宇河, 王放, 杨勇

Owner: TENCENT TECH (SHENZHEN) CO LTD
