Web crawler and data transfer technology-based data acquisition system and method
A data acquisition system and web crawler technology, applied in network data retrieval, network data indexing, database management systems, etc. It addresses problems such as inconvenient management, inability to adapt flexibly to business needs, and low efficiency, and achieves unified scheduling and management, flexible integration, and more efficient development and use.
Embodiment
[0034] The invention mainly targets online network data and large-scale offline database data, and proposes a data acquisition system based on web crawler and data transfer technology. As shown in Figure 1, the system integrates crawler component 1, crawler component 2, ..., crawler component n (n >= 1) into a unified online collection module, and integrates a data transfer tool (e.g., Sqoop) as an offline data collection module. Data is supplied to the upper layer through the collection service interface and is then further processed by, respectively, a log collection system (e.g., Flume), a distributed publish-subscribe messaging system (e.g., Kafka), a distributed file system (e.g., HDFS), and a data warehouse system (e.g., ETL), ultimately providing unified data support for the back-end system. In addition, the above collection services are managed and scheduled uniformly by the task management module of the system. The task management module co...
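The module layout described in this embodiment can be illustrated with a minimal Python sketch. It is not the patent's implementation: every class and function name below (`Collector`, `CrawlerComponent`, `OnlineCollectionModule`, `OfflineCollectionModule`, `TaskManager`, `fetch`, `export`, `sink`) is a hypothetical stand-in, the crawlers and the data transfer tool are replaced by plain callables rather than real Sqoop or crawler invocations, and the `sink` callable merely stands in for the downstream Flume/Kafka/HDFS stage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Protocol

Record = Dict[str, str]


class Collector(Protocol):
    """Hypothetical common interface shared by all collection modules."""

    def collect(self) -> Iterable[Record]: ...


@dataclass
class CrawlerComponent:
    """Stand-in for one crawler component; `fetch` is a placeholder callable."""
    name: str
    fetch: Callable[[], Iterable[Record]]

    def collect(self) -> Iterable[Record]:
        # Tag each record with the crawler that produced it.
        for record in self.fetch():
            yield {**record, "source": self.name}


@dataclass
class OnlineCollectionModule:
    """Unifies crawler components 1..n (n >= 1) behind one interface."""
    crawlers: List[CrawlerComponent]

    def __post_init__(self) -> None:
        if not self.crawlers:
            raise ValueError("at least one crawler component is required (n >= 1)")

    def collect(self) -> Iterable[Record]:
        for crawler in self.crawlers:
            yield from crawler.collect()


@dataclass
class OfflineCollectionModule:
    """Wraps a data transfer tool; `export` stands in for a real transfer job."""
    export: Callable[[], Iterable[Record]]

    def collect(self) -> Iterable[Record]:
        for record in self.export():
            yield {**record, "source": "offline"}


@dataclass
class TaskManager:
    """Runs every registered collection module through one entry point."""
    sink: Callable[[Record], None]  # assumed downstream stage (e.g. a queue producer)
    tasks: Dict[str, Collector] = field(default_factory=dict)

    def register(self, task_id: str, collector: Collector) -> None:
        self.tasks[task_id] = collector

    def run_all(self) -> int:
        """Run all tasks in registration order; return the number of records forwarded."""
        count = 0
        for collector in self.tasks.values():
            for record in collector.collect():
                self.sink(record)
                count += 1
        return count


# Usage: two placeholder crawlers plus one placeholder offline export.
collected: List[Record] = []
manager = TaskManager(sink=collected.append)
manager.register("online", OnlineCollectionModule([
    CrawlerComponent("crawler-1", lambda: [{"url": "https://example.com/a"}]),
    CrawlerComponent("crawler-2", lambda: [{"url": "https://example.com/b"}]),
]))
manager.register("offline", OfflineCollectionModule(lambda: [{"row": "1"}]))
total = manager.run_all()
```

Giving the online and offline modules the same `collect()` interface is one way to let a single task manager schedule both, which mirrors the unified scheduling the embodiment describes; a real system would replace the callables with actual crawler and transfer-tool jobs.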


