Volunteer computing based multi-tenant professional cloud crawler

A multi-tenant cloud crawler technology based on volunteer computing, applicable to computing, special data processing applications, network data retrieval, and similar fields, that addresses problems such as idle bandwidth and server resources.

Active Publication Date: 2016-03-30
HANGZHOU ABMATRIX TECH CO LTD
5 Cites, 22 Cited by

AI Technical Summary

Problems solved by technology

[0007] Internet information collection is a common need. Internet companies as large as Baidu, Tencent, and Alibaba, and companies as small as the developers of weather forecast apps, all need to crawl data. At present, the solutions for Internet information collection a


Examples


Embodiment Construction

[0075] The technical solutions of the present invention will be further described below in conjunction with the accompanying drawings, but the present invention is not limited to these embodiments.

[0076] The crawler platform of the present invention adopts a distributed system structure based on volunteer computing and is composed of a crawler server, a crawler collection client, and a user management client, as shown in Figure 1. That is, the present invention consists of the following parts:

[0077] 1. User management client

[0078] The user management client is the portal for user management, providing users with a web interface and REST API services. Through the management portal, users can define crawler tasks, submit crawler tasks, set crawler parameters, check crawler running status, and obtain crawled data. The user submits a collection task to the platform, and the task is scheduled, run, and its result returned, as shown in Figure 2. The steps are:

[0079] 1) Fir...
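The portal operations named in [0078] (define a task, submit it, set parameters, check status, obtain data) can be pictured with a minimal in-memory sketch. Everything in it is an assumption made for illustration: the class names `CrawlTask` and `ManagementPortal`, the field names, the status strings, and the `complete` helper that stands in for the crawler server returning results. The text available here does not specify the actual web or REST interface.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class CrawlTask:
    """A crawler task as a tenant might define it (all fields are assumed)."""
    tenant: str
    seed_urls: list
    params: dict = field(default_factory=dict)
    status: str = "defined"
    results: list = field(default_factory=list)


class ManagementPortal:
    """Hypothetical in-memory stand-in for the user management client."""

    def __init__(self):
        self._tasks = {}
        self._ids = count(1)

    def submit(self, task):
        # Register the task and hand back an identifier for later calls.
        task_id = next(self._ids)
        task.status = "submitted"
        self._tasks[task_id] = task
        return task_id

    def set_param(self, task_id, name, value):
        self._tasks[task_id].params[name] = value

    def status(self, task_id):
        return self._tasks[task_id].status

    def complete(self, task_id, records):
        # Simulates the server side delivering crawled data for a task.
        task = self._tasks[task_id]
        task.results.extend(records)
        task.status = "finished"

    def fetch_data(self, task_id):
        return list(self._tasks[task_id].results)


portal = ManagementPortal()
task_id = portal.submit(CrawlTask(tenant="acme", seed_urls=["https://example.com/"]))
portal.set_param(task_id, "max_depth", 2)
```

In a real deployment each method would presumably map to a web page action or a REST call; the sketch only shows the task lifecycle the paragraph describes.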


Abstract

The present invention relates to the field of network information acquisition and provides a multi-tenant professional cloud crawler based on volunteer computing. The crawler comprises three parts: a user management client for defining crawler tasks, submitting them, setting crawler parameters, checking crawler running status, and acquiring crawled data; a crawler server for scheduling crawler tasks and processing crawled data; and a crawler acquisition client for acquiring Internet site data and collecting information on the network bandwidth rate of the host it runs on. The crawler server comprises distributed scheduling and distributed processing. Distributed scheduling handles the scheduling of crawler tasks, the management of crawler client resources, and the reception of data returned from crawler clients. In distributed processing, a distributed data processor consumes data from a message queue in real time and processes it in a streaming manner. The crawler provided by the present invention improves the utilization of users' idle resources, saves users the cost of developing their own crawler system, and realizes fair sharing of resources.
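The streaming consumption described in the abstract, where a data processor takes records off a message queue as they arrive, can be sketched with Python's standard `queue` and `threading` modules. This is an illustrative single-process stand-in, not the patent's distributed implementation: the message shape, the `process_stream` function, the end-of-stream sentinel, and the "processing" step (recording the body length) are all assumptions.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker, used only in this sketch


def process_stream(messages, sink):
    """Consume messages one at a time as they arrive and store a processed record."""
    while True:
        msg = messages.get()  # blocks until a crawler client result is available
        if msg is SENTINEL:
            break
        # Placeholder processing step: keep the URL and the size of the page body.
        sink.append({"url": msg["url"], "length": len(msg["body"])})


messages = queue.Queue()
processed = []
worker = threading.Thread(target=process_stream, args=(messages, processed))
worker.start()

# Simulated results returned by crawler clients.
for page in [
    {"url": "https://example.com/a", "body": "<html>a</html>"},
    {"url": "https://example.com/b", "body": "<html>bb</html>"},
]:
    messages.put(page)

messages.put(SENTINEL)
worker.join()
```

Each record is handled as soon as it is dequeued rather than after the whole crawl finishes, which is the property the abstract attributes to the distributed data processor.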

Description

Technical field

[0001] The invention relates to the field of network information collection, in particular to a multi-tenant professional cloud crawler based on volunteer computing.

Background technique

[0002] We have entered an era of data explosion. With the development of Internet and mobile Internet technology, the Web has become a platform for data sharing. As a result, it is becoming more and more difficult for people to find the information they need in this massive volume of data.

[0003] Under such circumstances, general search engines (Google, Bing, Baidu, etc.) have become the best way for everyone to quickly find target information. When users are relatively clear about their needs, it is very convenient to use a general search engine to find the information they need through keyword searches. However, general search engines cannot fully meet users' needs for information discovery. That is because in many cases, firstly, general search engin...


Application Information

IPC(8): G06F17/30
CPC: G06F16/951
Inventor: 徐精忠, 刘凯枫
Owner: HANGZHOU ABMATRIX TECH CO LTD