Low-frequency crawler identification method and device

An identification method and device, applied in the Internet field, that addresses problems such as inaccurate identification, limited identification recall rate, and update delays.

Active Publication Date: 2018-03-13
BEIJING SHU AN XINYUN TECH CO LTD
7 Cites, 18 Cited by

AI Technical Summary

Problems solved by technology

[0004] (1) The identification recall rate is limited by the coverage of the proxy IP library. There are currently hundreds of millions of Internet proxy IPs, and the mobile phone proxy IP library can cover only a small part of them;
[0005] (2) Proxy IPs are not static, so the proxy IP library must be updated frequently. Customers generally resist online updates, while offline updates suffer from update delays;
[0006] (3) Proxy IPs obtained through ADSL or community broadband disconnect-and-redial and multicast are more concealed. Such IPs are also used by many real users, so a proxy IP library faces problems such as false blocking and an inability to identify crawlers accurately.

Examples

Specific Embodiment

[0067] Collect the network application logs of each user IP in a certain month, and calculate the behavior feature vector of each user IP in this month. The behavior feature vector of each user IP is clustered to obtain two clusters.
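The step in [0067] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the log schema `(ip, referer, path, status)`, the three feature definitions, and the choice of k-means are all hypothetical, since the source names the features but does not give their formulas or the clustering algorithm.

```python
from collections import Counter, defaultdict

def behavior_features(records):
    """Build one feature vector per user IP.

    records: iterable of (ip, referer, path, status) tuples (assumed schema).
    Assumed feature definitions:
      - share of requests carrying the IP's most common Referer
      - distinct request paths divided by total requests
      - share of responses with a 2XX status code
    """
    by_ip = defaultdict(list)
    for ip, referer, path, status in records:
        by_ip[ip].append((referer, path, status))
    features = {}
    for ip, rows in by_ip.items():
        n = len(rows)
        top_referer = Counter(r for r, _, _ in rows).most_common(1)[0][1]
        distinct_paths = len({p for _, p, _ in rows})
        ok = sum(1 for _, _, s in rows if 200 <= s < 300)
        features[ip] = (top_referer / n, distinct_paths / n, ok / n)
    return features

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k=2, iterations=20):
    """Plain Lloyd's k-means, initialised from the first k vectors (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members; keep it if empty.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Synthetic log for one month: two crawler-like IPs and two human-like IPs.
records = []
for ip in ("10.0.0.1", "10.0.0.2"):
    records += [(ip, "https://example.com/", f"/item/{i}", 200) for i in range(10)]
for ip in ("10.0.0.3", "10.0.0.4"):
    records += [
        (ip, f"https://example.com/p{i % 5}", f"/page/{i % 2}", 200 if i % 5 else 404)
        for i in range(10)
    ]

features = behavior_features(records)
ips = list(features)
labels = kmeans([features[ip] for ip in ips], k=2)
clusters = defaultdict(list)
for ip, label in zip(ips, labels):
    clusters[label].append(ip)
print(dict(clusters))
```

With `k=2` the user IPs fall into two clusters, as in the embodiment; a production system would more likely use a library clustering routine and a richer feature vector.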

[0068] The inspection rules specify three target behavior characteristics: the maximum Referer similarity ratio, the request path set space ratio, and the proportion of 2XX status codes.

[0069] The judgment logic for the maximum Referer similarity ratio is "greater than", and the threshold is 95%.

[0070] The judgment logic for the request path set space ratio is "greater than", and the threshold is 50%.

[0071] The judgment logic for the proportion of 2XX status codes is "greater than", and the threshold is 50%.

[0072] Calculate the average value of each of these three target behavior characteristics over all user IPs in each of the two clusters, and the average values of these three target beha...
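The inspection rules in [0068] to [0072] can be sketched as a threshold check on per-cluster averages. The thresholds and the "greater than" logic come from the text; everything else is an assumption for illustration: the feature names, the sample values, and the reading that a cluster is flagged only when all three averages pass (the source paragraph is truncated before it states this).

```python
import operator

# (feature name, judgment logic, threshold) as given in [0069] to [0071].
# The feature names are hypothetical labels for this sketch.
RULES = [
    ("max_referer_ratio", operator.gt, 0.95),
    ("path_set_ratio", operator.gt, 0.50),
    ("status_2xx_ratio", operator.gt, 0.50),
]

def cluster_means(members):
    """Average each rule feature over all user IPs in one cluster."""
    return {
        name: sum(f[name] for f in members.values()) / len(members)
        for name, _, _ in RULES
    }

def satisfies_rules(means):
    """Assumption: a cluster passes only if every rule holds."""
    return all(op(means[name], threshold) for name, op, threshold in RULES)

def identify_crawlers(clusters):
    """Return every user IP that belongs to a cluster satisfying the rules."""
    flagged = []
    for members in clusters.values():
        if satisfies_rules(cluster_means(members)):
            flagged.extend(members)
    return sorted(flagged)

# Made-up feature values for two clusters of user IPs.
clusters = {
    0: {
        "10.0.0.1": {"max_referer_ratio": 0.98, "path_set_ratio": 0.90, "status_2xx_ratio": 0.99},
        "10.0.0.2": {"max_referer_ratio": 0.96, "path_set_ratio": 0.80, "status_2xx_ratio": 0.97},
    },
    1: {
        "10.0.0.3": {"max_referer_ratio": 0.30, "path_set_ratio": 0.20, "status_2xx_ratio": 0.80},
        "10.0.0.4": {"max_referer_ratio": 0.40, "path_set_ratio": 0.10, "status_2xx_ratio": 0.90},
    },
}

crawlers = identify_crawlers(clusters)
print(crawlers)
```

Keeping the rules as data rather than hard-coded comparisons makes it straightforward to change a threshold or swap in a different judgment operator per feature.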

Abstract

The invention discloses a low-frequency crawler identification method and device. The method comprises the following steps: computing a behavior feature vector for each user IP within a preset time period from that user IP's network application log; clustering the behavior feature vectors of the user IPs to obtain multiple clusters; determining which cluster satisfies the corresponding inspection rule; and identifying each user IP in that cluster as a crawler. The device comprises a feature computing module, a clustering module, a rule determining module, and an identification module. The invention can effectively identify low-frequency crawlers and solves the problem that traditional security products cannot identify gang threats, low-frequency threats, correlated threats, and persistent threats. It supports public cloud or private cloud deployment, and threat identification and blocking can be performed without changing the network topology or embedding any code. It supports integration with custom blocking interfaces, and the original service continues to run normally even in the extreme case where the deployment environment is completely shut down.

Description

Technical field

[0001] The invention relates to the technical field of the Internet, in particular to a low-frequency crawler identification method and device.

Background art

[0002] The Internet is filled with a large number of crawlers, and as anti-crawler measures develop, crawlers also keep evolving. Their evolution includes three stages: primary crawlers, browser crawlers, and low-frequency crawlers. A primary crawler crawls the target page without disguising itself and can be accurately identified by features such as the User-agent and request frequency. A browser crawler disguises its User-agent as that of a browser such as Firefox, Opera, or Chrome, and its behavior resembles that of normal users; browser crawlers can be identified by features such as access frequency and timeline. Low-frequency crawlers use a large number of proxy IP pools to imitate ordinary users for...

Application Information

IPC(8): H04L29/06
CPC: H04L63/1416, H04L63/1425, H04L63/145
Inventor: 胡志磊, 刘鑫琪, 陈峰, 汪海, 陈哲, 从磊
Owner: BEIJING SHU AN XINYUN TECH CO LTD