Method and device for web crawler identification

A web crawler identification method and device, applied in the fields of network data indexing and network data retrieval. It addresses the low reliability and accuracy of existing crawler identification techniques and their negative effect on how smoothly normal users can browse web pages. The approach achieves high identification reliability, makes crawlers consume more resources, and reduces how often crawlers access web pages.

Active Publication Date: 2017-01-04
ALIBABA GRP HLDG LTD
Cites: 13; cited by: 11

AI Technical Summary

Problems solved by technology

[0004] However, existing techniques for identifying web crawlers suffer from low reliability and accuracy, and they degrade how smoothly normal users can browse web pages.

Method used

After rendering a web page, the client sends a picture of the rendered page together with the page's URL. The server obtains a sample picture according to the URL and identifies whether the client is a web crawler by comparing the similarity between the two pictures with a preset threshold.

Embodiment Construction

[0032] Embodiments of the present application are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary, and are only for explaining the present application, and should not be construed as limiting the present application. On the contrary, the embodiments of the present application include all changes, modifications and equivalents falling within the spirit and scope of the appended claims.

[0033] Figure 1 is a flowchart of an embodiment of the web crawler identification method of the present application. As shown in Figure 1, the web crawler identification method may include:

[0034] Step 101: receiving the picture of the web page and the URL of the web page, sent by the client after the rendering of th...
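Step 101 has the server receive, from the client, a picture of the rendered page together with the page's URL. The text above does not say how this is transmitted. The following Python sketch assumes a hypothetical JSON body with a `url` field and a base64-encoded `picture` field. The field names, the `Submission` type and `parse_submission` are illustrative only, not part of the patent.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class Submission:
    """What the client reports after rendering a page (hypothetical shape)."""
    url: str
    picture: bytes  # raw image bytes of the rendered page


def parse_submission(body: str) -> Submission:
    """Parse a hypothetical JSON body {"url": ..., "picture": <base64>} for step 101."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("submission must be a JSON object")
    url = data.get("url")
    encoded = data.get("picture")
    if not isinstance(url, str) or not url:
        raise ValueError("missing or invalid 'url'")
    if not isinstance(encoded, str) or not encoded:
        raise ValueError("missing or invalid 'picture'")
    # validate=True rejects characters outside the base64 alphabet;
    # the resulting binascii.Error is a subclass of ValueError.
    picture = base64.b64decode(encoded, validate=True)
    return Submission(url=url, picture=picture)
```

A client that never rendered the page has no genuine picture to send, so malformed or missing submissions can simply be rejected at this stage.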

Abstract

The application discloses a method and device for web crawler identification. The method comprises the following steps: receiving a picture of a web page and the URL of the web page, sent by a client after it has finished rendering the web page; obtaining a sample picture according to the URL; and identifying whether the client is a web crawler by comparing a similarity degree with a preset threshold, where the similarity degree is the similarity between the picture of the web page and the sample picture. The application identifies web crawlers with high reliability and does not affect how smoothly normal users browse web pages. Even if a web crawler cracks the identification method, it must still consume considerable resources to pass it, which reduces the frequency at which web crawlers can access web pages.
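The abstract describes three steps: receive the picture and URL, obtain a sample picture by URL, and compare a similarity score with a preset threshold. The Python sketch below shows that flow under several assumptions that the text does not specify. Pictures are assumed to be equal-sized grayscale pixel grids. The similarity metric is assumed to be 1 minus the mean absolute pixel difference divided by 255. Sample pictures are assumed to come from a dictionary keyed by URL. The 0.9 threshold is arbitrary. The rule that a score below the threshold means "crawler" is an inference from the method's logic, not a quoted rule.

```python
from typing import Dict, List

# Assumed representation: grayscale pixels in 0..255, row-major.
Image = List[List[int]]


def similarity(a: Image, b: Image) -> float:
    """Assumed metric: 1 - mean absolute pixel difference / 255, in [0, 1]."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 0.0  # different sizes: treat as completely dissimilar
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    if count == 0:
        return 0.0  # empty pictures carry no evidence of rendering
    return 1.0 - total / (255 * count)


def is_crawler(picture: Image, url: str, samples: Dict[str, Image],
               threshold: float = 0.9) -> bool:
    """Flag the client as a crawler when its picture is not similar enough.

    `samples` is a hypothetical URL -> sample-picture store; `threshold`
    stands in for the patent's unspecified preset threshold.
    """
    sample = samples.get(url)
    if sample is None:
        raise KeyError(f"no sample picture for {url}")
    return similarity(picture, sample) < threshold
```

For example, with `samples = {"https://example.com/a": [[10, 20], [30, 40]]}`, submitting the identical grid gives a similarity of 1.0 and is not flagged. An all-white grid scores about 0.098 and is flagged. A practical system would probably need a comparison that tolerates rendering differences across browsers and devices (a perceptual hash is one option). The text above does not specify how the sample picture is obtained, for instance by server-side rendering or from a cache.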

Description

Technical field

[0001] The present application relates to the technical field of the Internet, and in particular to a method and device for identifying web crawlers.

Background

[0002] At present, web pages are accessed both by normal users browsing through clients such as browsers and by web crawlers. A web crawler is a computer program that automatically crawls web pages.

[0003] A web crawler does not need to render the page. It only needs to obtain the content of the file and the uniform resource locators (URLs) contained in the file, so it can access a web server at a very high frequency. This affects normal users' access to web pages, and some web pages are not meant to be crawled at all. It is therefore necessary to identify whether the current visitor to a web page is a crawler or a normal user, so as to prevent crawlers from accessing or reduce the crawler's v...

Application Information

IPC (8th edition): G06F17/30
CPC: G06F16/951
Inventor: 周高明 (Zhou Gaoming)
Owner: ALIBABA GRP HLDG LTD