Crawler identification method and device for website server

A crawler identification method for website servers, in the field of website server crawler identification. It addresses problems such as rising technical and business costs, rising website operation and maintenance costs, and increased technical risk. It achieves low operation and maintenance costs and technical risk, is easy to update and upgrade, and is portable.

Inactive Publication Date: 2017-08-25
成都优易数据有限公司
Cites: 5; Cited by: 22
AI Technical Summary

Problems solved by technology

However, on the one hand, this increases the technical and business costs of each website; on the other hand, such techniques are often tightly coupled with the website's specific business and are not portable.
[0004] In addition, when the anti-crawler logic is coupled with the website's business server logic, improving and upgrading the anti-crawler technology becomes cumbersome, and the site cannot respond effectively to new crawler technologies and situations.
If the server logic must be redeployed whenever the anti-crawler logic is upgraded, website operation and maintenance costs and technical risk both increase.

Embodiment Construction

[0046] All the features disclosed in this specification, except for mutually exclusive features and/or steps, can be combined in any manner.

[0047] The present invention will be described in detail below in conjunction with the drawings.

[0048] As shown in Figure 3, one implementation process of the present invention is as follows:

[0049] S10. Mark the visiting user C1 by the visiting user's IP address;

[0050] S11. Classify requests of the same type according to the complete request path and the requested resource type;

[0051] S12. Separate out the core access request CR1 according to whether the access type is a page or data, and record its access timestamp TR1;

[0052] S13. Judge the continuity of core accesses of the same type: compute the interval between two consecutive accesses, t = TR2 - TR1. If t is less than the given interval ST2, the accesses are regarded as continuous and processing proceeds to the next step; otherwise the access is directly judged as a non-crawler visit and directly pe...
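Steps S10 to S13 can be sketched as a small streaming check. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Request` fields, the `"page"`/`"data"` labels in `CORE_TYPES`, the helper names `request_key` and `ContinuityTracker`, and the choice to treat a first visit as non-continuous are all hypothetical. The class key combines the IP (S10) with the host, path and resource type (S11), following the grouping described in the abstract.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical labels for "core" resource types (S12: page or data).
CORE_TYPES = {"page", "data"}

Key = Tuple[str, str, str, str]


@dataclass
class Request:
    ip: str             # S10: the visiting user C1 is identified by IP
    host: str           # request domain or subdomain name
    path: str           # complete request path
    resource_type: str  # assumed labels such as "page", "data", "image"
    timestamp: float    # access time in seconds


def request_key(req: Request) -> Key:
    """S10 + S11: same user (IP) and same path/resource type -> same class."""
    return (req.ip, req.host, req.path, req.resource_type)


class ContinuityTracker:
    """S12 + S13: remember the previous core timestamp TR1 per request class."""

    def __init__(self, st2: float):
        self.st2 = st2  # ST2: largest interval still counted as continuous
        self.last_seen: Dict[Key, float] = {}

    def observe(self, req: Request) -> Optional[bool]:
        """Return None for non-core requests, True when t = TR2 - TR1 < ST2
        (go on to the next check), False when judged a non-crawler visit."""
        if req.resource_type not in CORE_TYPES:
            return None  # S12: only page/data requests are core accesses
        key = request_key(req)
        tr1 = self.last_seen.get(key)
        tr2 = req.timestamp
        self.last_seen[key] = tr2
        if tr1 is None:
            return False  # assumption: a first visit has no interval to test
        return (tr2 - tr1) < self.st2


tracker = ContinuityTracker(st2=2.0)
reqs = [
    Request("10.0.0.1", "example.com", "/list", "page", 0.0),
    Request("10.0.0.1", "example.com", "/list", "page", 0.5),
    Request("10.0.0.1", "example.com", "/list", "page", 5.0),
    Request("10.0.0.1", "example.com", "/logo.png", "image", 5.1),
]
print([tracker.observe(r) for r in reqs])  # [False, True, False, None]
```

Keeping only the last timestamp per class makes the check O(1) per request, which suits a filter placed in front of the business server rather than inside it.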


Abstract

The invention belongs to the field of the Internet, and particularly relates to a crawler identification method and device for a website server. The method comprises the following steps: marking each accessing user on the basis of the user's IP (Internet Protocol) address; dividing the user's access requests, so that requests sharing the same requested resource, request domain name, subdomain name and access path are classified into the same type of access request; identifying core resource access requests on the basis of the resource type of each class of access requests; judging whether the access is crawler access on the basis of the request-time continuity, request-content continuity, access duration and access frequency of the user's core resource access requests; and processing users judged to be crawlers. By combining these means, the method effectively identifies crawler behavior with low implementation cost, low coupling with website business and high portability. Because it is independent of the website's business service logic, it is easy to update and upgrade, and its operation and maintenance costs and technical risks are low.
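The four judgment criteria named in the abstract can be combined as in the sketch below. The abstract gives no formulas or thresholds, so everything here is an assumption: the `Thresholds` values are placeholders, "content continuity" is read as the share of steps that fetch a resource not requested earlier (as when paging through listings), and the hypothetical `judge_crawler` flags a user only when all four indicators cross their thresholds.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Thresholds:
    # All values are hypothetical placeholders, not taken from the patent.
    max_gap: float = 2.0            # seconds; a gap below this is continuous
    min_time_ratio: float = 0.8     # share of gaps that must be continuous
    min_content_ratio: float = 0.8  # share of steps fetching a new resource
    min_duration: float = 60.0      # seconds of sustained core access
    min_rate: float = 0.5           # core requests per second


def judge_crawler(accesses: List[Tuple[float, str]],
                  th: Optional[Thresholds] = None) -> bool:
    """accesses: one user's core accesses as (timestamp, resource_id),
    sorted by timestamp. Returns True if the user is judged a crawler."""
    th = th or Thresholds()
    if len(accesses) < 2:
        return False  # too little history to judge
    times = [t for t, _ in accesses]
    gaps = [b - a for a, b in zip(times, times[1:])]
    # 1. Request-time continuity: share of short gaps between accesses.
    time_ratio = sum(g < th.max_gap for g in gaps) / len(gaps)
    # 2. Request-content continuity (assumed reading): share of steps that
    #    request a resource not seen earlier in this sequence.
    seen = {accesses[0][1]}
    fresh = 0
    for _, res in accesses[1:]:
        if res not in seen:
            fresh += 1
            seen.add(res)
    content_ratio = fresh / len(gaps)
    # 3. Access duration and 4. access frequency.
    duration = times[-1] - times[0]
    rate = len(accesses) / duration if duration > 0 else float("inf")
    return (time_ratio >= th.min_time_ratio
            and content_ratio >= th.min_content_ratio
            and duration >= th.min_duration
            and rate >= th.min_rate)


bot = [(float(i), f"/item/{i}") for i in range(100)]  # 1 s apart, all new
human = [(0.0, "/"), (30.0, "/item/1"), (95.0, "/")]
print(judge_crawler(bot), judge_crawler(human))  # True False
```

Requiring all four conditions keeps false positives low at the cost of missing slower crawlers; a weighted score or a "k of 4" rule would be an equally plausible design under the same assumptions.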

Description

Technical field

[0001] The invention relates to the Internet field, in particular to a method and device for identifying crawlers on a website server.

Background art

[0002] Internet applications are currently flourishing. Web crawlers occupy valuable bandwidth and computing resources of the website servers they crawl. Moreover, with the rise of big data technology, website data and content resources have increasingly become the core assets of website service providers. How to effectively identify the data-crawling behavior of crawler robot programs among a large number of ordinary user access requests has become one of the main technical problems that major websites urgently need to solve.

[0003] At present, anti-crawler work is usually done by each website developer separately, building the anti-crawler logic into the business server according to the site's own business characteristics. However, on the one hand, it increases the technical and business...

Application Information

Patent type & authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/958
Inventors: 夏珺峥, 乔宏利
Owner: 成都优易数据有限公司