Distributed Crawler System

A crawler system, distributed technology, applied in the field of distributed crawler systems

Active Publication Date: 2019-03-26
HENAN INST OF ENG
Cites: 6; Cited by: 0

AI Technical Summary

Problems solved by technology

The existing method does not organize or classify the information in web page data.




Detailed Description of the Embodiments

[0017] The technical solutions of the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0018] 1. Operating environment

[0019] Operating system: Linux

[0020] Python version: Python 2.7.12

[0021] Server program: Apache 2.4

[0022] Database: Redis (non-relational database)

[0023] Character encoding: UTF-8
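
The environment listed above could be captured in a small settings object. The following is only an illustrative sketch: the names `CrawlerSettings`, `SETTINGS`, `decode_page` and the `redis_url` value are assumptions made for this example and do not come from the patent text, which specifies only the component versions and the character encoding.

```python
from collections import namedtuple

# Hypothetical settings container; the field names are illustrative only.
CrawlerSettings = namedtuple(
    "CrawlerSettings",
    ["python_version", "server", "database", "encoding", "redis_url"],
)

SETTINGS = CrawlerSettings(
    python_version=(2, 7, 12),              # from paragraph [0020]
    server="Apache 2.4",                    # from paragraph [0021]
    database="redis",                       # from paragraph [0022]
    encoding="utf-8",                       # from paragraph [0023]
    redis_url="redis://localhost:6379/0",   # assumed address, not in the source
)


def decode_page(raw_bytes, settings=SETTINGS):
    """Decode downloaded bytes using the configured character encoding."""
    # Undecodable bytes are replaced rather than raising an exception.
    return raw_bytes.decode(settings.encoding, errors="replace")
```

Keeping the encoding in one place means every module decodes pages the same way; whether the patented system does this is not stated in the source.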

[0024] 2. System module design:

[0025] The functional modules of the distributed crawler system are divided as shown in Figure 1.

[0026] The distributed crawler system as a whole is divided into two main functional modules: distributed crawling and automatic web page structuring.

[0027] The distributed crawling module is composed of the crawler core module, the crawling rule module and the task management module. The crawler core module is responsible for the specific execution of tasks, including downloading web pages, parsing data according to rul...
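
The three-part composition of the distributed crawling module can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class names `CrawlRule`, `TaskManager` and `CrawlerCore` are hypothetical, the rules are assumed to be regular expressions, the task queue is an in-memory stand-in for whatever shared store the system uses, and the downloader is passed in as a plain function so the sketch runs without network access.

```python
import re
from collections import deque


class CrawlRule:
    """Crawling rule module (sketch): maps field names to regex patterns."""

    def __init__(self, patterns):
        self.patterns = {name: re.compile(p, re.S) for name, p in patterns.items()}

    def parse(self, html):
        # Extract the first capture group of each pattern, or None if absent.
        result = {}
        for name, pattern in self.patterns.items():
            match = pattern.search(html)
            result[name] = match.group(1).strip() if match else None
        return result


class TaskManager:
    """Task management module (sketch): de-duplicated FIFO queue of URLs."""

    def __init__(self):
        self._queue = deque()
        self._seen = set()

    def add(self, url):
        # Return True only when the URL has not been queued before.
        if url in self._seen:
            return False
        self._seen.add(url)
        self._queue.append(url)
        return True

    def next(self):
        return self._queue.popleft() if self._queue else None


class CrawlerCore:
    """Crawler core module (sketch): downloads a page and parses it by rule."""

    def __init__(self, tasks, rule, download):
        self.tasks = tasks
        self.rule = rule
        self.download = download  # assumed callable: url -> page text

    def run_once(self):
        url = self.tasks.next()
        if url is None:
            return None  # no pending tasks
        return self.rule.parse(self.download(url))
```

A short usage example with a fake downloader:

```python
tasks = TaskManager()
tasks.add("http://example.com/a")
rule = CrawlRule({"title": r"<title>(.*?)</title>"})
core = CrawlerCore(tasks, rule, lambda url: "<html><title> Hi </title></html>")
record = core.run_once()
```

In a distributed deployment, several `CrawlerCore` workers would presumably share one task store; how the patent coordinates them is not visible in the truncated text.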


Abstract

The invention discloses a distributed crawler system comprising a distributed crawling module and an automatic web page structuring module. The distributed crawling module is composed of a crawler core module, a crawling rule module and a task management module. The automatic web page structuring module is composed of a web page structuring module and a template training module.

Description

Technical field

[0001] The invention belongs to the technical field of computers, and in particular relates to a distributed crawler system.

Background

[0002] The Internet is a channel for companies to publish information and a tool for individuals to share and obtain information; it also provides the government with a large amount of valuable information for monitoring companies and individuals. The government can use Internet information to track trends in public opinion, establish a credit reporting system, and detect criminal behavior. However, this valuable information is scattered across every corner of the Internet, which prevents it from being used effectively.

[0003] A crawler system collects large amounts of scattered Internet data and is the foundation of a search engine system. Big data has developed rapidly in recent years and attracted wide attention, not only because of the large capacity of data, but also because ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/951
CPC: Y02D10/00
Inventors: 程浩, 王慧娜, 田大钊, 马士振, 陈旭升, 何园园
Owner: HENAN INST OF ENG