Web page crawler cooperating method

A web crawler cooperation technique, applied in the field of information networks, that addresses the heavy collection bandwidth consumption, high cost, and massive web page storage of existing collection methods. It saves bandwidth and storage costs and solves the time sustainability problem of P2P collection.

Inactive Publication Date: 2014-09-10
INST OF ACOUSTICS CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0007] The cluster collection method described above consumes large amounts of collection bandwidth and funding and requires massive web page storage, while P2P collection faces a sustainability problem because it relies on altruistic participation.

Examples

Embodiment 1

[0035] Referring to Figure 4, a schematic diagram of a web crawler collaborative collection system, the web crawler cooperation method provided in this embodiment includes the following steps:

[0036] 1) After a computing device goes online, it registers with the management server; thereafter, the management server polls each computing device at regular intervals (for example, every 30 seconds) to check its online status;

[0037] 2) The management server divides the computing devices into several collection groups according to device information (such as the network a device is on and its online history). For example, taking one week (7 days, i.e. 168 hours) as the cycle, the number of collection groups = 168 / the online duration of a collection group in hours;

[0038] 3) The management server sends the information of each collection group to the computing devices, and the computing devices form a network based on this information;

[0039] 4) Each collection group is responsible for the coll...
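
Steps 1) and 2) above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class and field names (Device, ManagementServer, online_start_hour, is_alive) are hypothetical, and the rule that places a device in the group whose time slot contains its usual online start hour is an assumption, since the source only says grouping uses information such as network and online history.

```python
from dataclasses import dataclass, field

HOURS_PER_CYCLE = 168  # one week (7 days), as in the example of step 2


@dataclass
class Device:
    device_id: str
    network: str            # hypothetical field: network the device is on
    online_start_hour: int  # hypothetical field: hour of the week (0-167) it usually comes online
    online: bool = True


@dataclass
class ManagementServer:
    group_online_hours: int  # online duration covered by each collection group
    devices: dict = field(default_factory=dict)

    def register(self, device: Device) -> None:
        """Step 1: a device registers after going online."""
        self.devices[device.device_id] = device

    def poll(self, is_alive) -> None:
        """Step 1: one polling round; a caller would repeat this every ~30 seconds."""
        for device in self.devices.values():
            device.online = bool(is_alive(device))

    def group_count(self) -> int:
        """Step 2: number of groups = 168 / online duration of a group."""
        return HOURS_PER_CYCLE // self.group_online_hours

    def assign_groups(self) -> dict:
        """Step 2 (assumed rule): group by the time slot of the usual online start hour."""
        count = self.group_count()
        groups = {gid: [] for gid in range(count)}
        for device in self.devices.values():
            slot = (device.online_start_hour % HOURS_PER_CYCLE) // self.group_online_hours
            # Clamp so a duration that does not divide 168 still maps to a valid group.
            groups[min(slot, count - 1)].append(device.device_id)
        return groups
```

With an online duration of 8 hours per group, group_count() gives 168 / 8 = 21 groups. The liveness check is passed in as a callable so the sketch stays independent of any particular network transport.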

Abstract

The invention discloses a web crawler cooperation method comprising the following steps: crawler nodes form a number of collecting groups according to their online time periods, so that the collecting groups together provide continuous online coverage over one period; the collecting groups then collect web pages while coordinating through an information exchange method; finally, all the collecting groups cooperate to store the collected web pages. Each collecting group obtains its own ID number, either generated automatically or configured. The information exchange method works as follows: each collecting group forms a routing network, and nodes transmit signaling or information to another collecting group according to a routing information table. The routing protocol in the routing network may be a routing protocol from IP network routing or one of the DHT (Distributed Hash Table) protocols used in peer-to-peer networks; alternatively, a central controller may govern the information exchange. The method solves the bandwidth problem caused by centralizing collection devices and the mass storage problem of web pages, and it also solves the time sustainability problem of P2P collection.
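
The information exchange between collecting groups can be illustrated with a small sketch. It assumes, purely for illustration, that each URL is assigned to a collecting group by hashing it onto the group ID space (a DHT-like rule that the abstract does not spell out) and that every node keeps a routing information table mapping a destination group ID to a next-hop address. The names group_for_url, Node, and next_hop are hypothetical.

```python
import hashlib


def group_for_url(url: str, group_count: int) -> int:
    """Map a URL to a collecting-group ID (assumed hash-mod rule, DHT-like)."""
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % group_count


class Node:
    def __init__(self, group_id: int, routing_table: dict):
        self.group_id = group_id
        # routing_table: destination group ID -> next-hop node address (illustrative)
        self.routing_table = routing_table

    def next_hop(self, url: str, group_count: int):
        """Return None if this node's own group handles the URL, else the next-hop address."""
        target = group_for_url(url, group_count)
        if target == self.group_id:
            return None
        return self.routing_table[target]
```

A node that discovers a URL outside its own group's responsibility would forward it to the address returned by next_hop. A centrally controlled variant, which the abstract also allows, would replace the per-node table with a lookup at the controller.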

Description

Technical field

[0001] The invention relates to the technical field of information networks, and in particular to a web crawler cooperation method.

Background

[0002] Internet search engines such as Google, Baidu, Sogou, and People's Search have become indispensable tools in daily life. People use them to look up information, learn, troubleshoot, advertise, and more, so the search engine business has penetrated every aspect of life. In a search engine, the crucial step is the collection and acquisition of relevant information, which typically means collecting web pages on the Internet.

[0003] The collection of web pages has gone through several stages. The first stage is a single-host collection stage, which is a typical central processing method; the second stage is a cluster collection stage in which multiple hosts cooperate, which is characterized by the fact that these hosts are in an IDC computer room or a high-speed interconne...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): H04L29/08, H04L29/06, G06F17/30
Inventor: 王劲林, 王玲芳, 邓峰, 齐向东
Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI