Looking for breakthrough ideas for innovation challenges? Try Patsnap Eureka!

A way to import data into multiple hadoop components simultaneously

A component and data technology, which is applied in the field of rapid transfer processing of large amounts of data, can solve the problem of non-supply, etc., and achieve the effect of broad application prospects, prominent substantive features, and simple structure

Active Publication Date: 2020-09-25
SHANDONG LANGCHAO YUNTOU INFORMATION TECH CO LTD
View PDF4 Cites 0 Cited by
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

Sometimes it is necessary to import data from a relational database to Kafka. However, as a data transfer tool, Sqoop does not provide support for this. The same batch of data may be used by multiple tasks, while the original Sqoop only supports one task at a time. If you want to export to multiple Hadoop components, you need to write commands separately, and more importantly, you need to read the same batch of data multiple times

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
View more

Image

Smart Image Click on the blue labels to locate them in the text.
Viewing Examples
Smart Image
  • A way to import data into multiple hadoop components simultaneously

Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0019] The present invention will be described in detail below with reference to the drawings and specific embodiments. The following embodiments are an explanation of the present invention, and the present invention is not limited to the following embodiments.

[0020] Such as figure 1 As shown, the method for simultaneously importing data into multiple Hadoop components provided in this embodiment includes the following steps:

[0021] Step 1: Extend the import tool of Sqoop and add to the import service of Kafka;

[0022] Step 2: Import the configuration parameters of each component according to the database, and write the parameter verification program;

[0023] Step 3: Extend the import tool of Sqoop to add services that export HDFS, Hive, Hbase, and Kafka at the same time.

[0024] The implementation process of step 1 includes: modify the BaseSqoopTool class code and ImportTool class code of Sqoop, design the MapReduce task for importing data to Kafka, and define the configuratio...

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to View More

PUM

No PUM Login to View More

Abstract

The invention relates to a method for importing data into multiple Hadoop components simultaneously, and is characterized in that the method comprises the following steps of (1) extending an import tool of the Sqoop, and adding service for importing data to the Kafka; (2) importing configuration parameters of the components according to a database, and writing parameter verification program; and (3) extending the import tool of the Sqoop, and adding service for simultaneously exporting the data to the HDFS, Hive, Hbase and Kafka. On basis of original functions of connecting the database and reading the data of the Sqoop, the function of simultaneously exporting the data to the multiple components is added, the database data are read for one time, multiple user-specified export modules are started simultaneously, and efficient and convenient data import is achieved. On one hand, multiple export tasks are prevented from being written for the same data, on the other hand, the same data are prevented from being repeatedly read, and therefore efficiency is improved.

Description

Technical field [0001] The invention belongs to the technical field of rapid transfer processing of large amounts of data, and specifically relates to a method for simultaneously importing data into multiple Hadoop components. Background technique [0002] With the rapid development of society today, various industries generate a large amount of data every day. The data sources include any type of data that can be captured around us, websites, social media, transactional business data, and data created in other business environments. As cloud providers use this framework, more users transfer data sets between Hadoop and traditional databases, and tools that can help data transfer become more important. In this environment, the Apache framework Hadoop came into being, it is an increasingly common distributed computing environment, mainly used to process big data. Apache Sqoop is a data transfer tool, which is mainly used to transfer data between Hadoop and traditional databases. ...

Claims

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to View More

Application Information

Patent Timeline
no application Login to View More
Patent Type & Authority Patents(China)
IPC IPC(8): G06F16/27
CPCG06F16/258G06F16/284
Inventor 尚平平臧勇真
Owner SHANDONG LANGCHAO YUNTOU INFORMATION TECH CO LTD
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Patsnap Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Patsnap Eureka Blog
Learn More
PatSnap group products