A method and a device for detecting the running state of a network crawler

A web crawler running-status detection technology, applied in the Internet field, that addresses problems such as incomplete and inaccurate crawl results and the inability to continue crawling effectively, so as to improve work efficiency and ensure data integrity.

Active Publication Date: 2019-02-01
BEIJING GRIDSUM TECH CO LTD
9 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

However, existing methods cannot keep crawling a website's content effectively once the crawler has been banned, so the final crawled data is incomplete: the website data collected by the web crawler has gaps and is not sufficiently precise.

Embodiment Construction

[0072] Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that the present invention can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.

[0073] Embodiments of the present invention provide a method for detecting the running status of a web crawler, as shown in Figure 1. The method detects in real time whether the website has disabled the web crawler and replaces the crawling strategy in time, ensuring the integrity and accuracy of the crawled network data. The embodiment of the present invention provides the following specific steps:

[0074] 101. ...

Abstract

The invention discloses a method and device for detecting the running state of a web crawler. It relates to the technical field of the Internet and can detect the running state of the web crawler in real time, ensuring the integrity and accuracy of the crawled network data. The main technical scheme is as follows. First, it is judged whether the current web page crawled by the web crawler is abnormal. If it is, the first page content information of a comparison page is crawled, using the page address information of the comparison page held in the preset comparison library for the website that the current web page belongs to. The preset comparison library stores the comparison pages set for the respective websites, and each comparison page entry includes the page address information of the comparison page and the second page content information of the comparison page. Finally, according to the result of crawling the page content information of the comparison page, it is determined whether the website corresponding to the current web page has disabled the web crawler. The invention is mainly used when a web crawler crawls network data.
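The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `ComparisonPage`, `is_abnormal`, `crawler_is_banned`, and the `fetch` callable are hypothetical, the abnormality test (a failed fetch or an empty body) is assumed, and so is the decision rule that a failed or mismatching crawl of the comparison page means the crawler has been disabled. The source only states that the decision is made "according to the crawling result".

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class ComparisonPage:
    """One entry of the preset comparison library (hypothetical structure)."""
    url: str               # page address information of the comparison page
    expected_content: str  # "second page content information", stored in advance


# Hypothetical fetcher type: returns the page body, or None if the fetch failed.
Fetcher = Callable[[str], Optional[str]]


def is_abnormal(content: Optional[str]) -> bool:
    """Assumed abnormality test: the fetch failed or the body is empty."""
    return content is None or not content.strip()


def crawler_is_banned(
    current_url: str,
    current_content: Optional[str],
    library: Dict[str, ComparisonPage],
    fetch: Fetcher,
) -> bool:
    """Decide whether the website appears to have disabled the crawler."""
    # Step 1: only investigate when the current page looks abnormal.
    if not is_abnormal(current_content):
        return False

    # Step 2: look up the comparison page registered for this website.
    site = urlparse(current_url).netloc
    page = library.get(site)
    if page is None:
        return False  # no comparison page registered, so no decision is possible

    # Step 3: crawl the comparison page ("first page content information").
    fetched = fetch(page.url)

    # Step 4 (assumed rule): a failed or mismatching crawl suggests a ban.
    return fetched is None or fetched != page.expected_content


# Stub fetchers standing in for real HTTP requests.
library = {"example.com": ComparisonPage("https://example.com/about", "About us")}
blocked_fetch: Fetcher = lambda url: None
healthy_fetch: Fetcher = lambda url: "About us"
```

With these stubs, `crawler_is_banned("https://example.com/item/1", "", library, blocked_fetch)` reports a ban, while the same call with `healthy_fetch` does not, which points to a problem with that one page rather than with the crawler. A real comparison would probably need to tolerate dynamic page content instead of requiring exact equality.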

Description

Technical field

[0001] The invention relates to the technical field of the Internet, in particular to a method and a device for detecting the running state of a web crawler.

Background

[0002] With the advent of the era of big data, the importance of information data is self-evident. People can grab the resource content of different websites through web crawlers and then integrate it into network information databases for scientific research in various technical fields. Web crawlers are also called web spiders, web robots, or web page chasers. They are programs or scripts that automatically grab information on the World Wide Web, generally adopting breadth-first or depth-first strategies. For example, search engines use them in the process of crawling web data.

[0003] At present, when web crawlers crawl network data, because the crawling speed is too fast, for example, excessive visits to websites within one minut...

Application Information

IPC(8): G06F11/30
CPC: G06F11/302, G06F11/3055
Inventor 孙德彬
Owner BEIJING GRIDSUM TECH CO LTD