
Single-computer crawler grabbing method and system

Web crawler and stand-alone (single-machine) technology, applied in special data processing applications, network data retrieval, instruments, etc. It addresses problems such as low working efficiency, short crawling time, and the inability to crawl multiple types of websites at the same time, with the effect of improving working efficiency and extending crawling time.

Active Publication Date: 2014-12-31
BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD +1
Cites: 5; Cited by: 17

AI Technical Summary

Problems solved by technology

[0004] Based on this, in view of the technical problems of prior-art stand-alone web crawler mechanisms, namely low working efficiency, short crawling time, and the inability to crawl multiple types of websites at the same time, it is necessary to provide a stand-alone crawler crawling method and system.




Detailed Description of the Embodiments

[0019] The present invention will be described in further detail below in conjunction with the accompanying drawings and specific embodiments.

[0020] As shown in Figure 1, the workflow of the single-machine crawler crawling method of the present invention includes:

[0021] Step 11: obtain at least one seed comprising a URL, a website number, and a type; take the seed's URL as the current URL, its website number as the current website number, and its type as the current type;

[0022] Step 12: obtain at least one strategy, and determine at least one crawler crawling parameter according to the strategy;

[0023] Step 13: acquire the rule corresponding to the current type;

[0024] Step 14: crawl webpage data from the current URL according to the crawler crawling parameters, and parse the webpage data according to the rule to obtain parsed data.
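Steps 11 to 14 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the excerpt does not say what a strategy or a rule looks like, so here a strategy is assumed to be a dictionary of parameter overrides and a rule is assumed to be a regular expression with named groups, registered per seed type. The names `Seed`, `CrawlParams`, `params_from_strategies`, `crawl_seed`, and `fake_fetch` are all hypothetical, and the fetcher is injectable so the demo runs offline.

```python
import re
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable


@dataclass
class Seed:
    """Step 11: a seed carries a URL, a website number and a type."""
    url: str
    site_number: int
    site_type: str


@dataclass
class CrawlParams:
    """Hypothetical crawler crawling parameters derived from strategies."""
    timeout: float = 10.0
    user_agent: str = "example-crawler/0.1"
    delay_seconds: float = 0.0


def params_from_strategies(strategies: list[dict[str, object]]) -> CrawlParams:
    """Step 12 (assumed form): fold each strategy's settings into one parameter set.

    Later strategies override earlier ones; keys that are not parameters are ignored.
    """
    params = CrawlParams()
    for strategy in strategies:
        for key, value in strategy.items():
            if hasattr(params, key):
                setattr(params, key, value)
    return params


def default_fetch(url: str, params: CrawlParams) -> str:
    """Download a page with the standard library, honouring timeout and User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": params.user_agent})
    with urllib.request.urlopen(request, timeout=params.timeout) as response:
        return response.read().decode("utf-8", errors="replace")


Fetcher = Callable[[str, CrawlParams], str]


def crawl_seed(seed: Seed, strategies: list[dict[str, object]],
               rules: dict[str, re.Pattern],
               fetch: Fetcher = default_fetch) -> list[dict[str, object]]:
    # Step 11: the seed's fields become the current URL, website number and type.
    current_url, current_site, current_type = seed.url, seed.site_number, seed.site_type
    # Step 12: determine the crawling parameters from the strategies.
    params = params_from_strategies(strategies)
    # Step 13: look up the parsing rule registered for the current type.
    rule = rules[current_type]
    # Step 14: crawl the current URL, then parse the page with the rule.
    if params.delay_seconds > 0:
        time.sleep(params.delay_seconds)
    html = fetch(current_url, params)
    return [{"site_number": current_site, **match.groupdict()}
            for match in rule.finditer(html)]


# Demo with a stub fetcher so nothing touches the network.
def fake_fetch(url: str, params: CrawlParams) -> str:
    return ('<li class="item"><a href="/p/1">Phone</a></li>'
            '<li class="item"><a href="/p/2">Laptop</a></li>')


rules = {"product_list": re.compile(r'<a href="(?P<link>[^"]+)">(?P<title>[^<]+)</a>')}
seed = Seed(url="https://example.com/list", site_number=1, site_type="product_list")
rows = crawl_seed(seed, [{"timeout": 5.0}, {"user_agent": "demo"}], rules, fetch=fake_fetch)
print(rows)
```

Keying rules by the seed's type is what would let one stand-alone crawler serve several kinds of websites: supporting a new site type means registering another rule rather than changing the crawl loop.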

[0025] The strategy in step 12 is used to ...



Abstract

The invention discloses a single-computer crawler crawling method and system. The method includes: acquiring at least one seed comprising a URL (uniform resource locator), a website number, and a type, and taking the seed's URL, website number, and type as the current URL, current website number, and current type; acquiring at least one strategy and determining at least one crawler crawling parameter according to the strategy; acquiring the rule corresponding to the current type; and crawling webpage data from the current URL according to the crawler crawling parameters and parsing the webpage data according to the rule to obtain parsed data. Because the crawling parameters are determined by the strategies, problems that arise during crawling can be handled, which improves working efficiency, extends crawling time, and makes the method and system suitable for many types of websites.

Description

Technical field

[0001] The invention relates to web crawler technology, in particular to a single-machine crawler crawling method and system.

Background

[0002] The Internet holds massive amounts of data and information. Converting that data into the information one needs, and then analyzing and processing it, is difficult. Web crawlers emerged to address this problem.

[0003] At present, most crawler devices simply implement the function of crawling web pages, but they perform poorly with respect to repeated crawling, falling into endless loops, and countering anti-crawling measures (which limits how long crawling can continue). In addition, current stand-alone crawlers have poor compatibility across websites and cannot meet the crawling needs of multiple websites at the same time.

Contents of the invention

[0004] Based on this, it is necessary to provide a stand-alone crawler crawling method for the technical problems that the existing st...
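The three weaknesses listed in [0003] (repeated crawling, endless loops, and being cut off by anti-crawling measures) are commonly handled with a visited set, a depth and page budget, and a per-host delay between requests. The sketch below shows those generic techniques in Python; it is not the strategy mechanism of this patent, whose details are not in this excerpt. The names `normalize`, `bounded_crawl`, and the `get_links` callback are hypothetical, and `get_links` stands in for "fetch the page and extract its links" so the example runs offline.

```python
import time
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse


def normalize(base: str, href: str) -> str:
    """Resolve a link against its page and drop the #fragment, so that
    '/a' and '/a#top' count as the same page."""
    return urldefrag(urljoin(base, href)).url


def bounded_crawl(start_url: str,
                  get_links: Callable[[str], Iterable[str]],
                  max_depth: int = 2,
                  max_pages: int = 100,
                  min_delay: float = 1.0,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> list[str]:
    visited: set[str] = set()        # repeated crawling: a URL is fetched at most once
    order: list[str] = []            # pages in the order they were crawled
    last_hit: dict[str, float] = {}  # per-host time of the previous request
    queue = deque([(normalize(start_url, ""), 0)])
    while queue and len(order) < max_pages:  # page budget also stops runaway crawls
        url, depth = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        host = urlparse(url).netloc
        if host in last_hit:
            wait = min_delay - (clock() - last_hit[host])
            if wait > 0:
                sleep(wait)  # spacing requests makes rate-based blocking less likely
        last_hit[host] = clock()
        order.append(url)
        if depth >= max_depth:  # depth limit: cycles and link traps cannot recurse forever
            continue
        for href in get_links(url):
            link = normalize(url, href)
            if link not in visited:
                queue.append((link, depth + 1))
    return order


# Offline demo: a tiny link graph containing a cycle (/a links back to /).
graph = {
    "https://example.com/": ["/a", "/b", "#top"],
    "https://example.com/a": ["/", "/b"],
    "https://example.com/b": ["/c"],
    "https://example.com/c": ["/d"],
}
pages = bounded_crawl("https://example.com/", lambda u: graph.get(u, []), min_delay=0.0)
print(pages)
```

With the default `max_depth=2`, the crawl visits `/`, `/a`, `/b`, and `/c` once each, ignores the cycle back to `/` and the `#top` self-link, and never reaches `/d`. Injecting `clock` and `sleep` keeps the politeness delay testable without real waiting.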


Application Information

IPC(8): G06F17/30
CPC: G06F16/951; G06F16/9535; G06F16/9566
Inventor: 廖耀华 (Liao Yaohua)
Owner: BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD