Method for importing multi-source heterogeneous data into data lake

A technology for importing multi-source heterogeneous data into a data lake, applied in the collection, management and application of multi-source heterogeneous data. It addresses the heterogeneity of data coming from multiple sources, reduces dumping steps, and helps ensure import speed and security.

Pending Publication Date: 2020-06-02
中云开源数据技术(上海)有限公司

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to provide a method for importing multi-source heterogeneous data into a data lake. By connecting the external data source with the local data lake server, various types of external data can be stored in the databases of the local data lake, which solves the multi-source heterogeneity problem of the data to be saved.



Examples


Embodiment Construction

[0050] To make the present invention clearer and easier to understand, it is further described below with reference to the accompanying drawings and specific embodiments.

[0051] The method of the present invention for importing multi-source heterogeneous data into a data lake includes: obtaining access interface information of an external data source, connecting the local data lake server to the external data source, importing data from the external data source, and saving the imported data as data files in the distributed file system of the local data lake server. The external data source includes an external database and an external streaming data source. As shown in Figure 1, the external data sources of the data lake in this embodiment may be IT data (existing data), open data (such as data from various networks) and OT data (such as data in the process of being generated).
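The steps in [0051] can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: `AccessInfo`, `import_table_to_lake` and `demo` are hypothetical names, an SQLite file stands in for the external database, a local directory stands in for the distributed file system of the data lake server, and CSV is an assumed data file format.

```python
import csv
import sqlite3
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class AccessInfo:
    """Hypothetical access interface information for an external database."""
    dsn: str    # for SQLite, the path of the database file
    table: str  # name of the table to import


def import_table_to_lake(info: AccessInfo, lake_root: Path) -> Path:
    """Connect to the external source, read one table, save it as a data file."""
    lake_root.mkdir(parents=True, exist_ok=True)
    target = lake_root / f"{info.table}.csv"
    conn = sqlite3.connect(info.dsn)
    try:
        # The table name is assumed to come from trusted configuration.
        cursor = conn.execute(f'SELECT * FROM "{info.table}"')
        header = [col[0] for col in cursor.description]
        with target.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            writer.writerows(cursor)
    finally:
        conn.close()
    return target


def demo():
    """Build a tiny external database, import it, and return the file lines."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "external.db"
        conn = sqlite3.connect(str(db_path))
        conn.execute("CREATE TABLE sensors (id INTEGER, reading REAL)")
        conn.executemany(
            "INSERT INTO sensors VALUES (?, ?)", [(1, 20.5), (2, 21.0)]
        )
        conn.commit()
        conn.close()
        info = AccessInfo(dsn=str(db_path), table="sensors")
        out = import_table_to_lake(info, Path(tmp) / "lake")
        return out.read_text(encoding="utf-8").splitlines()


if __name__ == "__main__":
    print(demo())
```

In a real deployment the write step would target the data lake's distributed file system rather than a local directory; the sketch only shows the order of the steps (obtain access information, connect, import, save as a data file).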

[0052] Exemplarily, obtaining the a...


Abstract

The invention discloses a method for importing multi-source heterogeneous data into a data lake, which comprises one of the following: obtaining an access interface address of external file-type data, importing the file-type data and storing it in the distributed file system of a local data lake server; or obtaining access interface information of an external data source, connecting the external data source to the local data lake server, importing data from the external data source and storing it in the distributed file system in the form of data files; or converting non-relational data of the external data source into relational data and then storing it in a relational database; or directly importing relational data of the external data source and storing it in the relational database; or importing non-relational data of the external data source and storing it in a document database. The method solves the multi-source heterogeneity problem of the data to be stored, facilitates the collection, management, application and expansion of multi-source heterogeneous data, meets the various requirements of an organization, and ensures data access security and data import flexibility.
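The alternatives listed in the abstract amount to choosing a storage target per kind of data. The sketch below illustrates that choice under stated assumptions: the names `Target`, `choose_target` and `flatten`, the kind labels, and the two boolean options are all hypothetical, and flattening nested keys is only one possible way to convert non-relational data into relational form, not necessarily the one the patent uses.

```python
from enum import Enum
from typing import Any, Dict


class Target(Enum):
    DISTRIBUTED_FS = "distributed file system"
    RELATIONAL_DB = "relational database"
    DOCUMENT_DB = "document database"


def choose_target(kind: str,
                  store_as_file: bool = False,
                  convert_to_relational: bool = False) -> Target:
    """Pick a storage target for one import, following the abstract's options.

    kind is an assumed label: "file", "relational" or "non_relational".
    """
    if kind not in ("file", "relational", "non_relational"):
        raise ValueError(f"unknown data kind: {kind!r}")
    # File-type data, or any source data saved as data files, goes to the
    # distributed file system.
    if kind == "file" or store_as_file:
        return Target.DISTRIBUTED_FS
    if kind == "relational":
        return Target.RELATIONAL_DB
    # Non-relational data: either convert it first or keep it as documents.
    if convert_to_relational:
        return Target.RELATIONAL_DB
    return Target.DOCUMENT_DB


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn a nested document into one flat row (an assumed conversion)."""
    row: Dict[str, Any] = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}_"))
        else:
            row[name] = value
    return row


if __name__ == "__main__":
    print(choose_target("non_relational", convert_to_relational=True))
    print(flatten({"id": 7, "pos": {"x": 1, "y": 2}}))
```

Keeping the decision in one small function makes it easy to see that every source kind has at least one destination, and that non-relational data has two.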

Description

Technical field

[0001] The invention relates to the field of collection, management and application of multi-source heterogeneous data, in particular to a method for importing multi-source heterogeneous data into a data lake.

Background art

[0002] Database technology is the foundation and core of modern computer information systems and computer application systems, and is an important part of information systems. When developing a database application system, it is usually necessary to export database data for system backup or for data sharing and exchange with other systems.

[0003] The concept of a data lake or data hub was originally proposed by big data vendors. On the surface, data is carried on cheap storage hardware based on the scalable HDFS (Hadoop Distributed File System). But the larger the data volume, the more different kinds of storage are required. Ultimately, all enterprise data can be considered big data, but not all enterprise data is suitable for storage on ch...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/182, G06F16/16, G06F16/176
Inventor: 陈刚
Owner: 中云开源数据技术(上海)有限公司