Distributed network crawler system and catching method thereof

A distributed web crawler technology, applied in the field of distributed web crawler systems and their crawling methods, that addresses the problems caused by vertical channels crawling different types of data with resource content of different sizes.

Status: Inactive
Publication date: 2013-04-10
人民搜索网络股份公司 (People Search Network Co., Ltd.)
Cites: 6 · Cited by: 12

AI Technical Summary

Problems solved by technology

[0003] Different vertical channel products capture different types of data with resource content of different sizes, and each product prefers to handle its own capture tasks independently, without interference from other businesses. The crawling and processing tasks of each vertical channel must therefore be independent, yet different vertical channels may need to access the same site. This creates a conflict between the search engine's overall bandwidth resources and its multiple vertical channels.




Embodiment Construction

[0027] The distributed crawler system and its crawling method of the present invention will be further described in detail below in conjunction with the accompanying drawings and the embodiments of the present invention.

[0028] Figure 1 is a schematic diagram of the module relationships in the distributed web crawler system of the present invention. As shown in Figure 1, the system includes a parameterized control capture module, several vertical channel capture customization modules, a unified capture scheduling module, a general storage and calculation module for capture results, and a capture result distribution module, in which:

[0029] The parameterized control capture module: to make better use of resources such as bandwidth and connections, this module configures the resources to be captured through parameters, based on fields such as the content to be captured, the capture type, and the UserAgent in use. Each capture unit (which can be understood as a...
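As a rough illustration of parameterized capture configuration, the sketch below keys each capture unit on the three fields named in [0029] (content, capture type, UserAgent) and groups URLs into units by that key. The class and function names, the example field values, and the proportional connection-splitting policy are all assumptions made for illustration; the available text does not say how capture units are defined or how bandwidth and connections are divided among them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical names: the patent text shown here does not define these interfaces.

@dataclass(frozen=True)  # frozen makes instances hashable, so they can key a dict
class CaptureParams:
    content: str     # what is captured, e.g. "page" or "image" (illustrative values)
    crawl_type: str  # how it is captured, e.g. "full" or "incremental" (illustrative)
    user_agent: str  # UserAgent header this capture unit sends


def assign_units(requests: List[Tuple[str, CaptureParams]]) -> Dict[CaptureParams, List[str]]:
    """Group URLs into capture units: one unit per distinct parameter combination."""
    units: Dict[CaptureParams, List[str]] = defaultdict(list)
    for url, params in requests:
        units[params].append(url)
    return dict(units)


def allocate_connections(units: Dict[CaptureParams, List[str]],
                         total: int) -> Dict[CaptureParams, int]:
    """Split a connection budget across units in proportion to their URL counts,
    rounding down (an assumed policy); every unit gets at least one connection."""
    n = sum(len(urls) for urls in units.values())
    return {p: max(1, total * len(urls) // n) for p, urls in units.items()}


# Usage: two page URLs share one unit; the image URL gets its own unit.
page = CaptureParams("page", "incremental", "ExampleBot/1.0")
image = CaptureParams("image", "full", "ExampleBot-Image/1.0")
units = assign_units([("http://a.example/1", page),
                      ("http://a.example/2", page),
                      ("http://a.example/logo.png", image)])
budget = allocate_connections(units, total=9)
```

Keying units on an immutable parameter tuple means that requests for the same kind of content with the same UserAgent fall into one unit and share its connection budget, whichever channel issued them.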



Abstract

The invention discloses a distributed web crawler system and a crawling method for it. The system and method comprise: a parameterized control capture module, which performs parameterized allocation of the resources to be crawled according to the crawl content, the crawl type, and the user agent in use; a vertical channel capture customization module, which manages and specifies the crawl behavior of each vertical channel and provides statistics; a unified capture scheduling module, which merges the crawl requests of all vertical channels and schedules crawling uniformly, subject to politeness control and the load on the target sites; a general storage and calculation module, which serves, through unified arrangement, the common storage and computation requests that different vertical channels make on crawled results; and a capture result distribution module, which sends results to the destinations specified by each vertical channel's capture customization module. The system and method support multi-channel crawling and make effective use of the search engine's existing overall bandwidth and other resources to better serve its various sub-channels.
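The unified capture scheduling step can be sketched as follows, assuming that "merging" means deduplicating URLs across channels and that politeness control means a minimum interval between successive fetches to the same host. Both readings, the per-host delay override used to model a heavily loaded site, and all function names are assumptions; the abstract does not give the actual algorithm. Times are simulated start offsets in seconds rather than real sleeps, which keeps the example deterministic.

```python
import heapq
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical names: a sketch of merge-then-schedule, not the patented method.

def merge_requests(channel_requests: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Merge all channels' URLs; each unique URL maps to the channels that asked for it."""
    merged: Dict[str, List[str]] = {}
    for channel, urls in channel_requests.items():
        for url in urls:
            wanted_by = merged.setdefault(url, [])
            if channel not in wanted_by:
                wanted_by.append(channel)
    return merged


def polite_schedule(urls: Iterable[str], default_delay: float = 1.0,
                    host_delay: Optional[Dict[str, float]] = None) -> List[Tuple[float, str]]:
    """Return (start_time, url) pairs in nondecreasing time order, such that fetches
    to the same host are at least that host's delay apart; different hosts may
    be fetched concurrently."""
    host_delay = host_delay or {}
    queues: Dict[str, Deque[str]] = defaultdict(deque)
    for url in urls:
        queues[urlsplit(url).hostname or ""].append(url)

    heap = [(0.0, host) for host in queues]  # (earliest allowed start time, host)
    heapq.heapify(heap)
    schedule: List[Tuple[float, str]] = []
    while heap:
        t, host = heapq.heappop(heap)
        schedule.append((t, queues[host].popleft()))
        if queues[host]:
            # A longer per-host delay models backing off from a loaded site (assumption).
            heapq.heappush(heap, (t + host_delay.get(host, default_delay), host))
    return schedule


# Usage: two channels overlap on one URL; a.example is treated as heavily loaded.
merged = merge_requests({
    "news": ["http://a.example/1", "http://a.example/2", "http://b.example/x"],
    "video": ["http://a.example/2", "http://a.example/3"],
})
plan = polite_schedule(merged, host_delay={"a.example": 2.0})
```

Each merged entry keeps the list of channels that requested the URL, which is the information a result distribution step would need to deliver one fetched copy to every requesting channel. A min-heap keyed on each host's next allowed time lets different hosts proceed in parallel while no single host is fetched faster than its configured interval.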

Description

Technical field

[0001] The invention relates to search engine technology, and in particular to a distributed web crawler system and a crawling method thereof, which can be used in the web crawler module of a search engine.

Background art

[0002] Many current search engines cover a variety of vertical search channels, and more than 90% of the data sources of each vertical channel must be actively crawled by web crawlers.

[0003] Different vertical channel products capture different types of data with resource content of different sizes, and each product prefers to handle its own capture tasks independently, without interference from other businesses. The crawling and processing tasks of each vertical channel must therefore be independent, yet different vertical channels may need to access the same site. This creates a conflict between the overall bandwidth resources and the multiple vertical channels within the search en...


Application Information

Patent type & authority: Applications (China)
IPC(8): H04L29/08; G06F17/30
Inventor: 高立闯 (Gao Lichuang)
Owner: 人民搜索网络股份公司 (People Search Network Co., Ltd.)