Web crawler detection method based on access log IP analysis

A web crawler detection technology, applied in network data retrieval, network data indexing, other database retrieval, etc. It addresses the problem of normal users being falsely intercepted.

Pending Publication Date: 2019-04-19
成都知道创宇信息技术有限公司 (Chengdu Knownsec Information Technology Co., Ltd.)
Cites 4 · Cited by 6

AI Technical Summary

Problems solved by technology

Moreover, this method is also prone to false interception, for example of shared company egress IPs or residential-community egress IPs. An IP does not necessarily repre...




Embodiment Construction

[0037] The present invention will be further described in detail below in conjunction with the drawings.

[0038] The process of the present invention is shown in Figure 1. Specifically:

[0039] 1. Use the feature detection method to inspect features in the access request packet and determine whether the request comes from a common crawler. If the recognition succeeds, the IP is judged to belong to a web crawler.

[0040] First obtain the UserAgent field from the access request and check whether it contains automated-program features, including python, ruby, PhantomJS, pycurl, httpunit, Wget, and Java. If any of these keywords is detected, the request is judged to come from a crawler.
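A minimal Python sketch of the keyword check in [0040], assuming case-insensitive substring matching (the source does not say whether case matters) and assuming a missing UserAgent does not match. The function name `is_common_crawler` is hypothetical.

```python
from typing import Optional

# Keywords listed in [0040], lowercased because matching below is
# case-insensitive (an assumption, not stated in the source).
CRAWLER_UA_KEYWORDS = ("python", "ruby", "phantomjs", "pycurl", "httpunit", "wget", "java")


def is_common_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the UserAgent contains any automated-program keyword."""
    if not user_agent:
        # Assumption: an empty or missing UserAgent is not flagged by this step.
        return False
    ua = user_agent.lower()
    return any(keyword in ua for keyword in CRAWLER_UA_KEYWORDS)
```

For example, `is_common_crawler("python-requests/2.31.0")` and `is_common_crawler("Wget/1.21.3")` return `True`, while a typical desktop Chrome UserAgent returns `False`. Plain substring matching is simple but can over-match (e.g. "java" would also hit any UserAgent containing "javascript"), so the keyword list would likely need the ongoing maintenance described in [0041].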

[0041] Note: The above feature keywords are collected from the UserAgent strings of common automated programs. Tools that can initiate HTTP requests are usually well known to practitioners in this field, so collecting their features is not difficult. If you encounter a new too...


Abstract

The invention discloses a web crawler detection method based on access log IP analysis. The method comprises the following steps: using a feature detection method to detect features in an access request data packet and judge whether the request comes from a common crawler; using an access behavior detection method to detect the proportion of static resources to dynamic resources requested by the IP and judge whether the IP is an advanced crawler; using a special crawler detection method to detect page views of website interfaces and judge whether the IP is a special crawler; and outputting a judgment result. Because the method recognizes IPs through three detection methods, it covers common crawlers, advanced crawlers and special crawlers, so crawlers can be recognized effectively over a wider range. The false alarm rate can be controlled by adjusting parameters in the detection process, which better meets practical requirements.
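The three-stage flow in the abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the log record layout, the list of static file extensions, the rule that a low share of static-resource requests marks an advanced crawler, the per-interface page-view rule for special crawlers, and all threshold values (`MIN_REQUESTS`, `MAX_STATIC_RATIO`, `MAX_INTERFACE_VIEWS`) are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List

CRAWLER_UA_KEYWORDS = ("python", "ruby", "phantomjs", "pycurl", "httpunit", "wget", "java")
# Hypothetical: extensions treated as static resources.
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico", ".woff", ".woff2")
# Hypothetical tunable thresholds (stand-ins for the adjustable parameters).
MIN_REQUESTS = 20          # ignore IPs with too few requests for the ratio test
MAX_STATIC_RATIO = 0.1     # static share at or below this marks an advanced crawler
MAX_INTERFACE_VIEWS = 500  # views of one interface above this mark a special crawler


@dataclass
class LogRecord:
    """Hypothetical parsed access-log entry."""
    ip: str
    path: str
    user_agent: str


def strip_query(path: str) -> str:
    return path.split("?", 1)[0]


def is_static(path: str) -> bool:
    return strip_query(path).lower().endswith(STATIC_EXTENSIONS)


def is_common_crawler(user_agent: str) -> bool:
    ua = (user_agent or "").lower()
    return any(keyword in ua for keyword in CRAWLER_UA_KEYWORDS)


def classify_ips(records: Iterable[LogRecord]) -> Dict[str, str]:
    """Label each IP with the first detection stage that matches."""
    by_ip: Dict[str, List[LogRecord]] = defaultdict(list)
    for record in records:
        by_ip[record.ip].append(record)

    results: Dict[str, str] = {}
    for ip, reqs in by_ip.items():
        # Stage 1: common crawler, any request with an automated-program UserAgent.
        if any(is_common_crawler(r.user_agent) for r in reqs):
            results[ip] = "common crawler"
            continue
        # Stage 2: advanced crawler, too few static-resource requests.
        static_count = sum(1 for r in reqs if is_static(r.path))
        if len(reqs) >= MIN_REQUESTS and static_count / len(reqs) <= MAX_STATIC_RATIO:
            results[ip] = "advanced crawler"
            continue
        # Stage 3: special crawler, one dynamic interface viewed too often.
        views = Counter(strip_query(r.path) for r in reqs if not is_static(r.path))
        if views and max(views.values()) > MAX_INTERFACE_VIEWS:
            results[ip] = "special crawler"
            continue
        results[ip] = "normal"
    return results
```

The stages run in the order given in the abstract, and an IP receives the first label that matches. Raising `MAX_STATIC_RATIO` or lowering `MAX_INTERFACE_VIEWS` makes detection more aggressive at the cost of more false alarms, which is the kind of trade-off the abstract describes as controllable by parameter adjustment.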

Description

Technical field

[0001] The invention relates to the field of web crawler detection, in particular to a web crawler detection method based on access log IP analysis.

Background art

[0002] With the development of the Internet, more and more industries display their main businesses and data to netizens through websites. Web crawlers can obtain these data automatically, allowing crawler owners to profit from them. For example, someone may write a crawler program to crawl product information from an e-commerce website and obtain the price of each product; a competitor can then use these prices as a reference to appropriately lower the prices of the same products in its own store, thereby maintaining a sales advantage. Or, for some authoritative information such as corporate credit records, the data can only be queried through government websites, and crawler writers can obtain da...


Application Information

IPC(8): G06F16/951
Inventor: 仲俊霖 (Zhong Junlin)
Owner: 成都知道创宇信息技术有限公司 (Chengdu Knownsec Information Technology Co., Ltd.)