Crawler recognition method, device and equipment and computer storage medium

A crawler identification technique based on a sequence pattern mining algorithm, applied in the field of computer security. It addresses the low accuracy of existing crawler identification, including outright failure to identify some crawlers, and improves identification accuracy.

Active Publication Date: 2020-11-27
RIVER INFORMATION TECH SHANGHAI CO LTD

AI Technical Summary

Problems solved by technology

Existing crawler identification methods have very low recognition accuracy for crawlers that simulate user operations, or fail to recognize them at all.

Examples

Embodiment 1

[0075] Figure 1 shows an exemplary system architecture to which the method or apparatus for identifying crawlers in the embodiments of the present invention can be applied.

[0076] As shown in Figure 1, the system architecture may include a terminal device 101, a network 102 and a server 103. The network 102 serves as the medium that provides a communication link between the terminal device 101 and the server 103. The network 102 may include various connection types, such as wired links, wireless communication links, or fiber optic cables, among others.

[0077] A user can use the terminal device 101 to interact with the server 103 through the network 102. Various applications may be installed on the terminal device 101, such as voice interaction applications, web browser applications, communication applications, and the like.

[0078] The terminal device 101 may be any terminal device, including but not limited to a smart phone, a smart tablet, a notebook computer, a PC, a smart wearable devic...

Embodiment 2

[0081] Figure 2 is a flow chart of a method provided by an embodiment of the present application. The method may be executed by a device for identifying crawlers. The device may be an application located on the server side, or a functional unit such as a plug-in or software development kit (SDK) in an application located on the server side, or it may be located in a terminal device with relatively strong computing capabilities; this is not particularly limited in this embodiment of the present invention. As shown in Figure 2, the method may include the following steps:

[0082] In 201, an access path sequence of a user within a preset period is acquired.

[0083] After the user visits the website, the server may record the user's access log, which may include information such as the user's access path and access time. In this step, the user's access paths during the preset time p...
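As an illustration of step 201, the following is a minimal sketch that pulls one user's access path sequence for a preset period out of an access log. The record layout (`LogRecord`), the function name `access_path_sequence`, and the way the period is expressed are all assumptions made for this example; the source only states that the log may contain the access path and access time.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple


class LogRecord(NamedTuple):
    """Hypothetical access-log record; the field names are illustrative."""
    user_id: str
    access_time: datetime
    access_path: str


def access_path_sequence(
    records: Iterable[LogRecord],
    user_id: str,
    period_end: datetime,
    period: timedelta,
) -> List[str]:
    """Return the user's access paths within the preset period, ordered by time."""
    period_start = period_end - period
    hits = [
        r for r in records
        if r.user_id == user_id and period_start <= r.access_time <= period_end
    ]
    # Sort by access time so the result is a sequence, not just a set of paths.
    hits.sort(key=lambda r: r.access_time)
    return [r.access_path for r in hits]
```

Sorting by access time matters because the later steps work on the order of the paths, not only on which paths were visited.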

Embodiment 3

[0132] Figure 3 is a schematic structural diagram of the device for identifying crawlers provided by an embodiment of the present application. As shown in Figure 3, the device may include a path acquisition unit 01, a sequence division unit 02, a pattern mining unit 03 and a crawler identification unit 04, and may further include a preprocessing unit 05. The main functions of each component unit are as follows:

[0133] The path acquisition unit 01 is configured to acquire the user's access path sequence within a preset period of time.

[0134] The preprocessing unit 05 is configured to preprocess the access path sequence, and provide the preprocessed access path sequence to the sequence division unit 02 .

[0135] The above-mentioned preprocessing includes at least one of the following:

[0136] Delete non-actively requested access paths in the access path sequence;

[0137] Merge adjacent identical access paths in the access path sequence. ...
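The two preprocessing operations listed above can be sketched as follows. How a "non-actively requested" path is recognized is not specified in the text available here, so the sketch assumes, purely for illustration, that requests for static resources are the ones not actively requested; `PASSIVE_SUFFIXES`, `is_passive` and `preprocess` are hypothetical names.

```python
from itertools import groupby
from typing import List

# Assumption for illustration only: static-resource requests are treated
# as "non-actively requested" paths. The source does not define the rule.
PASSIVE_SUFFIXES = (".css", ".js", ".png", ".jpg", ".ico")


def is_passive(path: str) -> bool:
    """Return True if the path is assumed not to be actively requested."""
    return path.lower().endswith(PASSIVE_SUFFIXES)


def preprocess(paths: List[str]) -> List[str]:
    """Delete passive paths, then merge runs of adjacent identical paths."""
    active = [p for p in paths if not is_passive(p)]
    # groupby collapses each run of equal neighbours into a single key.
    return [path for path, _run in groupby(active)]
```

In this sketch deletion runs before merging, so two identical paths separated only by a deleted request are also merged into one. Whether the device applies the operations in that order is not stated in the excerpt.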

Abstract

The invention discloses a crawler identification method, device and equipment, and a computer storage medium. The method comprises: obtaining a user's access path sequence within a preset time period; dividing the access path sequence into more than one sub-sequence; using a sequence pattern mining algorithm to perform frequent sequence pattern mining, with the sub-sequences obtained from the division taken as the sequence data set, to obtain frequent sequences; and judging whether the obtained frequent sequences conform to crawler characteristics and, if so, determining that the user is a crawler. The method can effectively identify crawlers that simulate user operations but cyclically access certain path sequences, thereby improving crawler identification accuracy.
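The flow in the abstract can be sketched end to end as below. This is a simplified illustration under several assumptions, not the patented implementation: the sequence is divided into fixed-size windows, frequent patterns are found by counting contiguous runs of paths (a stand-in for a full sequence pattern mining algorithm such as PrefixSpan), and the "crawler characteristics" test is a hypothetical rule requiring a pattern of at least three paths in at least 80% of the sub-sequences. All function names, parameters and thresholds are invented for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def split_into_subsequences(paths: List[str], size: int) -> List[List[str]]:
    """Divide the sequence into consecutive windows of at most `size` paths."""
    return [paths[i:i + size] for i in range(0, len(paths), size)]


def frequent_sequences(
    subsequences: List[List[str]], min_support: int, min_length: int = 2
) -> Dict[Pattern, int]:
    """Count in how many sub-sequences each contiguous pattern occurs."""
    support: Counter = Counter()
    for sub in subsequences:
        seen = set()  # each pattern counts once per sub-sequence
        for start in range(len(sub)):
            for end in range(start + min_length, len(sub) + 1):
                seen.add(tuple(sub[start:end]))
        support.update(seen)
    return {p: c for p, c in support.items() if c >= min_support}


def looks_like_crawler(
    frequent: Dict[Pattern, int],
    n_subsequences: int,
    min_pattern_length: int = 3,
    min_ratio: float = 0.8,
) -> bool:
    """Hypothetical rule: a long pattern repeats in most sub-sequences."""
    return any(
        len(p) >= min_pattern_length and c / n_subsequences >= min_ratio
        for p, c in frequent.items()
    )


def is_crawler(paths: List[str], size: int = 4, min_support: int = 2) -> bool:
    """Run division, mining and the characteristic check in order."""
    subs = split_into_subsequences(paths, size)
    if not subs:
        return False
    return looks_like_crawler(frequent_sequences(subs, min_support), len(subs))
```

A visitor that loops over the same four pages is flagged by this sketch, while a short browsing session with no repeated run of pages is not. Fixed-size windows are the weakest assumption here: a loop whose length does not match the window size is only partly captured, which is one reason a real sequence pattern mining algorithm and a better-informed division rule would be preferable.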

Description

【Technical field】

[0001] The present application relates to the technical field of computer security, in particular to a method, device, equipment and computer storage medium for identifying crawlers.

【Background technique】

[0002] This section is intended to provide a background or context to the implementations of the application that are recited in the claims. The descriptions herein are not admitted to be prior art by inclusion in this section.

[0003] A crawler is any technical means of obtaining website information in batches. On the one hand, a large number of crawlers will seriously occupy server performance and bandwidth, affect normal user access, and in severe cases cause a DDoS (distributed denial of service) attack. On the other hand, a website's important information and information property must not be leaked casually; if it is easily stolen, serious losses will result. Therefore, a corresponding anti-re...

Application Information

IPC(8): H04L29/06, H04L29/08, G06F16/2458
CPC: H04L63/1416, H04L63/1425, H04L67/02, G06F16/2465, G06F16/2474, H04L67/535
Inventor 余燕, 李华君, 姜帆, 刘国平
Owner RIVER INFORMATION TECH SHANGHAI CO LTD