
A data collection method based on Flume and Alluxio

A data collection technology applied in the cloud computing platform and information technology fields. It addresses problems such as increased hardware investment costs and the limited performance improvement of data sink components.

Active Publication Date: 2020-10-27
SHANDONG LANGCHAO YUNTOU INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

Even if data is sunk to a file system backed by SSDs (solid-state drives), hardware investment costs rise sharply while the performance gain of the data sink component remains limited.




Embodiment Construction

[0047] To illustrate the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings used in the embodiments are briefly introduced below. The drawings described here show only some embodiments of the present invention. Those of ordinary skill in the art can obtain other drawings from them without creative effort.

[0048] Referring to Figure 2, the present invention provides a data collection device 1 based on Flume and Alluxio. The data collection device 1 includes a data extraction terminal 2, an agent terminal 3, an Alluxio terminal 4, and a data storage terminal 5.

[0049] The data extraction terminal 2 collects a large number of original data units.

[0050] The agent terminal 3 converts the original data units extracted by the data extraction terminal 2 to obtain data in a...
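The four components named in [0048] can be pictured as a simple chain. The sketch below is only an illustration under stated assumptions, not the patented implementation: the class names, the `extract` helper, and the UTF-8 encoding used as the conversion step are all hypothetical, and the target format of the agent's conversion is not specified in the text available here. The Alluxio stand-in simply buffers writes in memory and flushes them to the storage stand-in later.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class StorageTerminal:
    """Hypothetical stand-in for the data storage terminal 5."""
    persisted: List[bytes] = field(default_factory=list)

    def persist(self, block: bytes) -> None:
        self.persisted.append(block)


@dataclass
class AlluxioTerminal:
    """Hypothetical stand-in for the Alluxio terminal 4.

    Writes land in an in-memory buffer first and are pushed to the
    storage terminal only when flush() is called.
    """
    storage: StorageTerminal
    buffer: List[bytes] = field(default_factory=list)

    def write(self, block: bytes) -> None:
        self.buffer.append(block)

    def flush(self) -> None:
        for block in self.buffer:
            self.storage.persist(block)
        self.buffer.clear()


@dataclass
class AgentTerminal:
    """Hypothetical stand-in for the agent terminal 3."""
    convert: Callable[[str], bytes]  # placeholder conversion step
    sink: AlluxioTerminal

    def handle(self, raw_units: Iterable[str]) -> int:
        """Convert each raw unit, hand it to the sink, return the count."""
        count = 0
        for unit in raw_units:
            self.sink.write(self.convert(unit))
            count += 1
        return count


def extract() -> List[str]:
    """Hypothetical stand-in for the data extraction terminal 2."""
    return ["event-1", "event-2", "event-3"]


storage = StorageTerminal()
alluxio = AlluxioTerminal(storage)
agent = AgentTerminal(convert=lambda unit: unit.encode("utf-8"), sink=alluxio)

written = agent.handle(extract())  # data now sits in the Alluxio buffer
alluxio.flush()                    # data reaches the storage terminal
```

Separating `write` from `flush` mirrors the idea that the agent does not have to wait for the persistent store before accepting more data.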


Abstract

Alluxio serves as the target storage file system into which data collected by Flume is sunk, and a purpose-built data sink component, flume-alluxio-sink, implements this. By exploiting Alluxio's asynchronous writes and tiered storage, the flume-alluxio-sink component reduces hardware investment costs, improves data sinking efficiency, and improves Flume's data collection performance. A configurable distribution strategy spreads data as evenly as possible across the nodes in the cluster, which avoids data skew to a certain degree.
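The abstract does not say which distribution strategy is configured. As one possible illustration, the sketch below uses a least-loaded rule: each block goes to the node that has received the fewest bytes so far. The class name `LeastLoadedPolicy`, the node names, and the block sizes are assumptions made for this example; it does not call any Flume or Alluxio API.

```python
from typing import Dict, List


class LeastLoadedPolicy:
    """Hypothetical policy: send each block to the least-loaded node."""

    def __init__(self, nodes: List[str]) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        # Running total of bytes assigned to each node.
        self.bytes_per_node: Dict[str, int] = {node: 0 for node in nodes}

    def choose(self, block_size: int) -> str:
        """Pick the node with the fewest bytes; break ties by node name."""
        node = min(
            self.bytes_per_node,
            key=lambda name: (self.bytes_per_node[name], name),
        )
        self.bytes_per_node[node] += block_size
        return node


policy = LeastLoadedPolicy(["worker-a", "worker-b", "worker-c"])

# One large block followed by several small ones (sizes are made up).
placements = [policy.choose(size) for size in (400, 100, 100, 100, 100, 100)]
```

After the large first block lands on `worker-a`, the smaller blocks alternate between the other two workers, so no single node keeps absorbing the stream. A plain round-robin rule would balance block counts instead of bytes, which can still leave one node heavier when block sizes vary.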

Description

Technical field

[0001] The present invention relates to the field of information technology, in particular to a method and device for fast data collection based on Flume and Alluxio in the field of cloud computing platform technology.

Background technique

[0002] In the era of cloud computing, traditional ETL (Extraction-Transformation-Loading) tools cannot cope with massive data, mainly because their data conversion overhead is too high and their performance cannot meet the collection requirements of massive data. To improve the performance of massive data collection, various mature and effective massive data collection components have emerged, such as Flume, the widely used open source component of the Apache Foundation. Flume is a distributed, reliable and highly available massive data aggregation system, which supports the collection of data from different types of data sources in the system, and at the same time, pr...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/13, G06F16/172, G06F16/174
CPC: G06F16/13, G06F16/172, G06F16/1748
Inventor: 苑晓龙, 王绍成
Owner: SHANDONG LANGCHAO YUNTOU INFORMATION TECH CO LTD