Distributed web crawler system and scheduling method based on VPN

A VPN-based distributed web crawler system and scheduling technology, applied in the field of distributed web crawling

Active Publication Date: 2017-10-10
南方电网互联网服务有限公司

AI Technical Summary

Problems solved by technology

This poses a great challenge to the coverage a




Example Embodiment

[0061] The present invention will be further described below in conjunction with the drawings and embodiments.

[0062] Most organizations access the Internet through a single IP address, which leads to incomplete data crawling and poor coverage of social networking sites. To solve this problem, the present invention proposes a VPN-based distributed web crawler system.

[0063] Figure 1 shows the deployment structure of the VPN-based distributed web crawler system of the present invention. The system is deployed in an organization's local area network and accesses the Internet through a router connected to an operator's network. It consists of a crawling settings client, a crawling master control node, crawling nodes, a URL index server, a data center, and end users.

[0064] The crawling settings client is used to configure data sources, keywords, craw...


Abstract

The invention provides a VPN-based distributed web crawler system and scheduling method. First, the system connects to a remote VPN proxy server by VPN dial-up and switches VPN connections to obtain different public IP addresses, which solves the problem of a local area network having only a single IP address. Second, although accessing remote VPN proxy servers yields multiple public IP addresses, these addresses remain a scarce resource relative to the update frequency of social and news websites. To collect as much data as possible per IP address, the system interleaves the collection of URLs from multiple target data sources. This avoids sending dense bursts of requests to a single server, which would cause the server to refuse access, and so improves the coverage and completeness of data collected from social media platforms. Finally, unlike existing load-balancing approaches that allocate network connections, the loads of the crawler nodes are balanced by adjusting keywords.
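The two scheduling ideas in the abstract, interleaving URLs from multiple data sources and balancing node load by moving keywords, can be illustrated with a minimal Python sketch. This is not the patent's implementation. The function names `interleave_urls` and `rebalance_keywords` are hypothetical, the interleaving is assumed to be a simple round-robin, and node load is assumed to be the number of keywords a node holds, since the excerpt does not say how load is measured.

```python
from itertools import zip_longest


def interleave_urls(urls_by_source):
    """Round-robin URLs across data sources (assumed interleaving policy).

    Consecutive requests go to different servers wherever possible, so no
    single server receives a dense burst of requests from one IP address.
    """
    schedule = []
    for batch in zip_longest(*urls_by_source.values()):
        # zip_longest pads exhausted sources with None; skip the padding.
        schedule.extend(url for url in batch if url is not None)
    return schedule


def rebalance_keywords(assignments):
    """Move keywords from the busiest node to the idlest node.

    Load is assumed to be the keyword count per node (an illustration only).
    Stops when the largest and smallest counts differ by at most one.
    """
    nodes = {node: list(keywords) for node, keywords in assignments.items()}
    if not nodes:
        return nodes
    while True:
        busiest = max(nodes, key=lambda n: len(nodes[n]))
        idlest = min(nodes, key=lambda n: len(nodes[n]))
        if len(nodes[busiest]) - len(nodes[idlest]) <= 1:
            return nodes
        nodes[idlest].append(nodes[busiest].pop())


if __name__ == "__main__":
    # Hypothetical data sources and URLs, for illustration only.
    urls = {
        "forum": ["forum/1", "forum/2", "forum/3"],
        "news": ["news/1"],
        "blog": ["blog/1", "blog/2"],
    }
    print(interleave_urls(urls))
    print(rebalance_keywords({"node1": ["k1", "k2", "k3", "k4"], "node2": []}))
```

A real scheduler would probably weight each keyword by the number of URLs it produces rather than counting keywords, and would add per-server delays on top of the interleaving. Both are left out to keep the sketch short.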

Description

Technical field

[0001] The present invention relates to a distributed web crawler system and a scheduling method, and more specifically to a VPN-based distributed web crawler system and scheduling method.

Background

[0002] With the advent of the era of big data, the information carried on the Internet is becoming more and more abundant. It includes policy websites that guide the development of the industry, news websites that introduce the latest technological developments in related fields and competitors' product information, and blogs, forums, Weibo, and other social networking sites that reflect users' opinions and evaluations of products. Effective access to and application of external network data provides information support for decision-making, planning, cost management, sales operations, and after-sales services for enterprises of all levels and types, and opens a window for enterprises to better understand themselves and control the market. The web cr...


Application Information

IPC(8): H04L29/06, H04L29/08, G06F17/30, G06Q50/00
CPC: H04L63/0236, H04L63/0272, H04L63/0281, H04L63/10, H04L63/1458, G06F16/9566, G06Q50/01, H04L67/1001
Inventor: 魏墨济, 杨子江, 朱世伟, 李晨, 李宪毅, 杨爱芹, 于俊凤, 张铭君, 董婷, 李思思, 徐蓓蓓, 刘翠琴
Owner: 南方电网互联网服务有限公司