Web crawler and data transfer technology-based data acquisition system and method

A data acquisition system and web crawler technology, applied in network data retrieval, network data indexing, database management systems, etc. It addresses the problems of inconvenient management, inability to adapt flexibly to business needs, and low efficiency, and it achieves unified scheduling and management, improved development and usage efficiency, and flexible integration.

Status: Inactive; Publication Date: 2018-06-08
SICHUAN JIUZHOU ELECTRIC GROUP
7 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

[0006] To solve the technical problems of the prior art, namely the inability to adapt flexibly to changing business requirements, inconvenient management, and low efficiency, the present invention proposes a data collection system and method based on web crawler and data transfer technology. It provides users with an online and offline data acquisition system that integrates multiple crawler components and data transfer components, and it solves technical problems such as the integration, coordination, and unified invocation of web crawlers and other tools.

Examples

Embodiment

[0034] The invention mainly targets online network data and large-scale offline database data, and proposes a data acquisition system based on web crawler and data transfer technology. As shown in Figure 1, the system integrates crawler component 1, crawler component 2, ..., crawler component n (n >= 1) as a unified online collection module, and integrates a data transfer tool (such as Sqoop) as an offline data collection module. The collection modules provide data input to the upper layer through the collection business interface. The data then passes through a log collection system (such as Flume), a distributed publish-subscribe messaging system (such as Kafka), a distributed file system (such as HDFS), and a data warehouse system (such as an ETL tool) for further processing, and finally provides unified data support for the back-end system. In addition, the above collection services are managed and scheduled uniformly by the task management module of the system. The task management module co...
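The unified management of online and offline collection described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Task`, `TaskManager`, `register`, and `dispatch`, as well as the "online"/"offline" labels, are hypothetical, and the collectors are plain callables standing in for real crawler components or a data transfer tool such as Sqoop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A collector takes a source (URL or database identifier) and returns records.
Collector = Callable[[str], List[str]]


@dataclass
class Task:
    kind: str       # "online" or "offline" (hypothetical labels)
    component: str  # name of a registered collection component
    source: str     # e.g. a seed URL or a database identifier


class TaskManager:
    """Hypothetical task management module: one registry, one dispatch path."""

    def __init__(self) -> None:
        self._components: Dict[Tuple[str, str], Collector] = {}

    def register(self, kind: str, name: str, collector: Collector) -> None:
        # Integrate a crawler component or data transfer tool under a name.
        self._components[(kind, name)] = collector

    def dispatch(self, task: Task) -> List[str]:
        # Select the component requested by the task and run it.
        collector = self._components.get((task.kind, task.component))
        if collector is None:
            raise KeyError(f"no {task.kind} component named {task.component!r}")
        return collector(task.source)


# Stub collectors for illustration only; real ones would call a crawler
# or launch a data transfer job.
manager = TaskManager()
manager.register("online", "crawler-1", lambda url: [f"page from {url}"])
manager.register("offline", "transfer", lambda db: [f"rows from {db}"])

online_result = manager.dispatch(Task("online", "crawler-1", "http://example.test/"))
offline_result = manager.dispatch(Task("offline", "transfer", "orders_db"))
```

Keeping every component behind the same callable signature is one way to let new crawler components be added without changing the dispatch logic; the source does not say how the actual system achieves this.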

Abstract

The invention discloses a web crawler and data transfer technology-based data acquisition system and method. The system integrates a plurality of crawler components and data transfer components to carry out online and offline data acquisition. For different data acquisition tasks, the system and method provide flexible integration, automatic selection, and uniform scheduling and management of each web crawler component, so that users can acquire the required service data in a one-stop manner. The system and method effectively avoid the heavy, repetitive work that developers face when configuring and developing different crawler software and other acquisition systems, and they improve both development efficiency and usage efficiency.

Description

Technical field

[0001] The invention relates to the field of computer data collection and processing, in particular to a data collection system and method based on web crawler and data transfer technology.

Background technique

[0002] With the advent of the big data era, data acquisition has once again attracted attention, and web crawler technology, as an important and principal means of acquiring network data, has received more and more research.

[0003] The basic workflow of a web crawler is as follows:
1. Select a set of carefully chosen seed URLs.
2. Put these URLs into the queue of URLs to be crawled.
3. Take a URL from the queue of URLs to be crawled, resolve its DNS to obtain the host's IP, download the web page corresponding to the URL, and store it in the downloaded web page library. Then put the URL into the queue of crawled URLs.
4. Analyze the URLs in the queue of crawled URLs, analyze othe...
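The crawler workflow in [0003] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from the patent: `crawl`, `LinkExtractor`, and the `fetch` parameter are hypothetical names, DNS resolution and downloading are assumed to happen inside the caller-supplied `fetch` function, and because the source text is truncated at step 4, the link-extraction step shown here is the conventional continuation rather than a quotation.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags in a downloaded page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) returns HTML text or None on failure."""
    to_crawl = deque(seed_urls)  # steps 1-2: seeds go into the to-crawl queue
    crawled = set()              # URLs that have already been taken out
    page_store = {}              # the "downloaded web page library"
    while to_crawl and len(page_store) < max_pages:
        url = to_crawl.popleft()  # step 3: take the next URL
        if url in crawled:
            continue
        crawled.add(url)
        html = fetch(url)         # DNS lookup and download live inside fetch
        if html is None:
            continue
        page_store[url] = html
        parser = LinkExtractor()  # assumed step 4: extract further URLs
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in crawled:
                to_crawl.append(link)
    return page_store


# Usage with an in-memory stand-in for the network (hypothetical pages).
site = {
    "http://a.test/": '<a href="/b">b</a><a href="http://a.test/c">c</a>',
    "http://a.test/b": '<a href="/">home</a>',
    "http://a.test/c": "",
}
pages = crawl(["http://a.test/"], site.get)
```

Passing `fetch` in as a parameter keeps the sketch runnable offline; a real crawler would supply a function that performs the HTTP request and honors politeness rules such as robots.txt.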

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/215, G06F16/25, G06F16/951
Inventor: 杨岸桢, 李东旭, 吴新勇, 邱吉刚
Owner: SICHUAN JIUZHOU ELECTRIC GROUP