Method for importing data into multiple Hadoop components simultaneously

A data-transfer technology applied in the field of rapid transfer and processing of large amounts of data. It addresses the problem that Sqoop does not provide support for exporting to Kafka or to multiple Hadoop components at once, and it offers wide application prospects with a simple structure.

Active Publication Date: 2017-07-04
SHANDONG LANGCHAO YUNTOU INFORMATION TECH CO LTD


Problems solved by technology

Sometimes data must be imported from a relational database into Kafka, but Sqoop, as a data-transfer tool, does not support this. In addition, the same batch of data may be needed by multiple tasks, while the original Sqoop supports only one export target per command: exporting to multiple Hadoop components requires writing separate commands and, more importantly, reading the same batch of data multiple times.

Method used




Embodiment Construction

[0019] The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments. The following embodiments are explanations of the present invention, but the present invention is not limited to the following embodiments.

[0020] As shown in Figure 1, the method for simultaneously importing data into multiple Hadoop components provided by this embodiment includes the following steps:

[0021] Step 1: extend Sqoop's import tool to add an import service for Kafka;

[0022] Step 2: configure the import parameters of each component according to the database, and write a parameter verification program;

[0023] Step 3: extend Sqoop's import tool to add a service that exports data to HDFS, Hive, HBase, and Kafka simultaneously.
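The read-once, fan-out behavior that steps 1 to 3 describe can be illustrated with a short sketch. All names here (Sink, export_rows, read_table) are hypothetical stand-ins, not Sqoop's actual API or the classes the patent modifies:

```python
# Minimal sketch of the read-once, fan-out export described in steps 1-3.
# All names (Sink, export_rows, read_table) are illustrative, not Sqoop's API.

class Sink:
    """Generic export target: a stand-in for HDFS, Hive, HBase, or Kafka."""
    def __init__(self, name):
        self.name = name
        self.rows = []

    def write(self, row):
        self.rows.append(row)

def export_rows(read_table, sinks):
    """Read each database row once and write it to every enabled sink."""
    count = 0
    for row in read_table():          # single pass over the source table
        for sink in sinks:            # one write per user-specified component
            sink.write(row)
        count += 1
    return count

def read_table():
    """Stand-in for Sqoop's database read."""
    yield from [(1, "a"), (2, "b"), (3, "c")]

sinks = [Sink("hdfs"), Sink("kafka")]
print(export_rows(read_table, sinks))  # 3 -- three rows, each read only once
```

The point of the sketch is the loop order: the outer loop over rows runs once, so adding another component adds only a write per row, not another full scan of the database.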

[0024] The implementation of step 1 includes: modifying Sqoop's BaseSqoopTool class code and ImportTool class code, designing a MapReduce task that imports data into Kafka, and def...
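The patent names the classes it modifies but does not show the map task itself. The following is a hypothetical sketch of the per-record logic such a Kafka-import mapper might perform; MockProducer stands in for a real Kafka producer client, and every name here is an assumption, not the patent's code:

```python
# Hypothetical per-record logic for a Kafka-import map task (step 1).
# MockProducer is a stand-in for a real Kafka producer client.

class MockProducer:
    def __init__(self):
        self.sent = []

    def send(self, topic, key, value):
        """Record a (topic, key, value) message in place of a real publish."""
        self.sent.append((topic, key, value))

def map_record(producer, topic, record):
    """Serialize one database row and publish it to the target topic."""
    key = str(record[0])                        # first column as message key
    value = ",".join(str(c) for c in record)    # CSV-encode the row
    producer.send(topic, key, value)

producer = MockProducer()
for row in [(1, "alice"), (2, "bob")]:
    map_record(producer, "db_import", row)
print(producer.sent[0])  # ('db_import', '1', '1,alice')
```

In a real MapReduce job this logic would sit in the mapper's per-record method, with the topic and broker list supplied through the job configuration that step 2's parameter verification checks.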



Abstract

The invention relates to a method for importing data into multiple Hadoop components simultaneously. The method comprises the following steps: (1) extending Sqoop's import tool to add a service for importing data into Kafka; (2) configuring the import parameters of each component according to the database, and writing a parameter verification program; and (3) extending Sqoop's import tool to add a service for simultaneously exporting data to HDFS, Hive, HBase, and Kafka. On the basis of Sqoop's original functions of connecting to a database and reading data, the method adds the ability to export data to multiple components at once: the database is read a single time, and multiple user-specified export modules are started simultaneously, achieving efficient and convenient data import. This avoids writing multiple export tasks for the same data on the one hand, and avoids repeatedly reading the same data on the other, thereby improving efficiency.

Description

Technical Field

[0001] The invention belongs to the technical field of rapid transfer and processing of large amounts of data, and in particular relates to a method for simultaneously importing data into multiple Hadoop components.

Background Technique

[0002] With today's rapid social development, every industry generates a large amount of data daily. Data sources include any type of data that can be captured around us, such as websites, social media, transactional business data, and data created in other business environments. As cloud providers adopt this framework and more users move datasets between Hadoop and traditional databases, tools that facilitate data transfer become increasingly important. In this environment, the Apache framework Hadoop came into being: an increasingly general distributed computing environment, mainly used to process big data. Apache Sqoop is a data transfer tool, mainly used for data transfer between Hadoop and traditi...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/30
CPC: G06F16/258; G06F16/284
Inventors: 尚平平, 臧勇真
Owner: SHANDONG LANGCHAO YUNTOU INFORMATION TECH CO LTD