Autonomous data lake construction system and method based on associated data

A technology based on associated data, applied in database management systems, relational databases, database indexes, and related fields, that improves the semantic richness and utilization of data, ensures real-time updates and integrity, and improves data processing.

Active Publication Date: 2020-03-31
南京润辰科技有限公司
12 Cites, 14 Cited by

AI Technical Summary

Problems solved by technology

However, how to effectively automate the fusion of continuous heterogeneous data sources, deeply analyze the instance data in the data lake, automatically establish internal semantic ass




Embodiment Construction

[0036] As shown in Figure 2, the implementation environment of this embodiment includes an external user interface, a data import interface, a data persistence layer, a data encapsulation export interface, and the autonomous data lake construction system based on linked data shown in Figure 1, wherein: the external user interface provides a visual operation interface for the autonomous data lake construction system, through which external users can intuitively and conveniently import, delete, and retrieve various types of data; the data import interface receives data import requests sent to the autonomous data lake construction system by external users and/or various heterogeneous data sources, including structured relational databases, semi-structured JSON files, and unstructured scanned pictures of tables; the data encapsulation export interface is used to provide the autonomous data lake construction system wi...
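The data import interface described above accepts three kinds of sources. A minimal sketch of how an import request might be routed by source kind follows. The names `SourceKind`, `ImportRequest`, and `classify`, the URL schemes, and the file suffixes are all hypothetical illustrations and are not taken from the patent.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePath


class SourceKind(Enum):
    STRUCTURED = "structured"            # e.g. a relational database
    SEMI_STRUCTURED = "semi-structured"  # e.g. a JSON file
    UNSTRUCTURED = "unstructured"        # e.g. a scanned picture of a table


@dataclass
class ImportRequest:
    origin: str    # who sent the request: an external user or a data source
    location: str  # connection URL or file path (hypothetical convention)


# Assumed conventions for this sketch only.
DATABASE_SCHEMES = {"postgresql", "mysql", "sqlite"}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tiff"}


def classify(request: ImportRequest) -> SourceKind:
    """Decide which kind of source an import request refers to."""
    location = request.location.lower()
    # Database connections are recognized by their URL scheme.
    if "://" in location and location.split("://", 1)[0] in DATABASE_SCHEMES:
        return SourceKind.STRUCTURED
    # Files are recognized by their suffix.
    suffix = PurePath(location).suffix
    if suffix == ".json":
        return SourceKind.SEMI_STRUCTURED
    if suffix in IMAGE_SUFFIXES:
        return SourceKind.UNSTRUCTURED
    raise ValueError(f"unsupported data source: {request.location}")
```

In this sketch, a request such as `ImportRequest("user", "orders.json")` would be classified as semi-structured, and anything unrecognized is rejected rather than guessed at.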


Abstract

The invention discloses an autonomous data lake construction system and method based on associated data. The system comprises a data source input module, a heterogeneous data preprocessing module, a metadata discovery and extraction module, a metadata fusion and association module, a meta-model optimization and construction module, an instance knowledge extraction module, a knowledge packaging module, a knowledge correction and fusion module, an instance concept extraction module, and a meta-model verification and evolution module. Based on the associated data, a directory index updated in real time and an instance knowledge graph that can be quickly located through the directory are generated while the data lake is constructed. Through the internal structural and semantic associations between the directory index and the instance knowledge graph, a data lake with autonomous capability is finally obtained, so that the data lake is easy for external users to manage and retrieve, and more requirements are met.
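The relationship between the directory index and the instance knowledge graph can be illustrated with a small sketch. This is not the patented implementation: the class `DataLakeCatalog`, its methods, and the in-memory dictionaries are hypothetical stand-ins that only show the idea of locating graph instances through a directory of concepts.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DataLakeCatalog:
    # Directory index: concept name -> ids of the instances filed under it.
    directory: dict = field(default_factory=lambda: defaultdict(set))
    # Instance knowledge graph: instance id -> set of (relation, target id).
    graph: dict = field(default_factory=lambda: defaultdict(set))

    def add_instance(self, concept: str, instance_id: str) -> None:
        """File an instance under a concept, updating the directory at once."""
        self.directory[concept].add(instance_id)

    def link(self, source_id: str, relation: str, target_id: str) -> None:
        """Record a semantic association between two instances."""
        self.graph[source_id].add((relation, target_id))

    def locate(self, concept: str) -> dict:
        """Use the directory to jump to a concept's instances and their edges."""
        return {
            instance_id: sorted(self.graph.get(instance_id, ()))
            for instance_id in sorted(self.directory.get(concept, ()))
        }


catalog = DataLakeCatalog()
catalog.add_instance("Order", "order:1")
catalog.add_instance("Customer", "customer:7")
catalog.link("order:1", "placed_by", "customer:7")
```

Because every `add_instance` call updates the directory immediately, a lookup by concept reaches the relevant part of the graph without scanning all instances, which is the property the abstract attributes to the directory index.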

Description

Technical field

[0001] The present invention relates to a technology in the field of big data information processing, in particular to an autonomous data lake construction system and method for multi-source heterogeneous data based on associated data.

Background

[0002] In the era of big data, all data has potential value. A data lake is a centralized repository that allows all structured, semi-structured, and unstructured data to be stored at any scale. A data lake does not require pre-defined data structures and can store data in its raw form. After a long period of accumulation, a data lake without governance becomes a "data swamp" that no one can clean up, and its data becomes difficult to understand and use. It is therefore very important to build a data lake that can integrate data fully automatically, update its model and directory in real time, and be easily managed and used from outside, that is, an autonomous data lake. At present, there ...


Application Information

IPC(8): G06F16/22, G06F16/2457, G06F16/25, G06F16/28
CPC: G06F16/22, G06F16/24573, G06F16/25, G06F16/288
Inventor: 蔡鸿明, 黄佳卉, 张贝格, 于晗, 雷连松, 姜丽红
Owner: 南京润辰科技有限公司