Big data acquisition and governance quick retrieval system based on data lake

A data lake based system for big data acquisition, governance, and fast retrieval, applied in the field of big data. It addresses problems such as low retrieval efficiency, wasted resources, and excessive storage space, and avoids both huge storage requirements and data fragmentation.

Status: Inactive
Publication Date: 2020-07-28
TIANJIN 712 COMM & BROADCASTING CO LTD

AI Technical Summary

Problems solved by technology

[0004] However, the existing data lake architecture still faces problems and challenges: different data sources cause data fragmentation and low retrieval efficiency, and much of the data in the data lake is never deleted, which requires huge storage space and wastes resources.
[0005] For example, consider three existing Chinese patents: "A data governance method and device based on data lake" (patent application number 201910570011.6); "A data lake system open and shared for all data forms" (patent application number 201810963494.1); and "Industrial data lake system" (patent application number 201910944246.7). These three patents only describe the implementation of a data system based on the idea of a data lake. None of them addresses how to avoid huge storage requirements, waste of storage resources, fragmentation of data resources, or low retrieval efficiency.


Examples

Embodiment Construction

[0039] It should be noted that, in the case of no conflict, the embodiments of the present invention and the features in the embodiments can be combined with each other.

[0040] The present invention will be described in detail below with reference to the accompanying drawings and examples.

[0041] The data lake based fast retrieval system for big data acquisition and governance in this embodiment, as shown in Figure 1, includes a data acquisition front-end module, a data association metadata extraction module, a data resource pool module, a data governance module, a data service module, and a data business module. Together they cover the processing and monitoring of the whole data life cycle: data collection, data storage, data governance, data relationship management, fast data query and retrieval, and data application services.

[0042] Data acquisition front-end module:

[0043] Complete the collection of multi-source heterogeneous data, and store the collected data i...


Abstract

The invention provides a big data acquisition and governance quick retrieval system based on a data lake. The system comprises a data acquisition front-end module, a data association metadata extraction module, a data resource pool module, a data governance module, a data service module, and a data business module. The data association metadata extraction module extracts metadata from multi-source heterogeneous data and stores the extracted information in a distributed full-text retrieval database of the data service module. The data service module also automatically clears cold data. The system makes full use of the characteristics of big data technology, data governance technology, and different types of databases, so that fragmentation of data resources and low data retrieval efficiency are effectively avoided, as are huge storage requirements and waste of storage resources.
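The two mechanisms named in the abstract, indexing extracted metadata for full-text retrieval and automatically clearing cold data, can be illustrated with a minimal Python sketch. The text above does not name the retrieval database or say how cold data is identified, so everything here is an assumption for illustration: the `Record` fields, the in-memory inverted index standing in for the distributed full-text retrieval database, and the idle-time threshold used to decide that a record is cold.

```python
from __future__ import annotations

import re
import time
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    """Metadata extracted from one data item (hypothetical shape)."""
    record_id: str
    source: str         # origin of the item, e.g. "sensor" or "log"
    description: str    # free text that should be searchable
    last_access: float  # Unix timestamp of the most recent retrieval


class MetadataIndex:
    """In-memory stand-in for the distributed full-text retrieval database."""

    def __init__(self) -> None:
        self.records: dict[str, Record] = {}
        # Inverted index: token -> ids of records containing that token.
        self.postings: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _tokens(text: str) -> set[str]:
        return set(re.findall(r"\w+", text.lower()))

    def _record_tokens(self, record: Record) -> set[str]:
        return self._tokens(f"{record.source} {record.description}")

    def add(self, record: Record) -> None:
        """Store extracted metadata and index its tokens."""
        self.records[record.record_id] = record
        for tok in self._record_tokens(record):
            self.postings[tok].add(record.record_id)

    def search(self, query: str, now: float | None = None) -> list[str]:
        """Return ids matching every query term and mark them as recently used."""
        terms = self._tokens(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        stamp = time.time() if now is None else now
        for rid in hits:
            self.records[rid].last_access = stamp
        return sorted(hits)

    def clear_cold(self, max_idle_seconds: float, now: float | None = None) -> list[str]:
        """Remove records idle for longer than the threshold (assumed policy)."""
        stamp = time.time() if now is None else now
        cold = [rid for rid, rec in self.records.items()
                if stamp - rec.last_access > max_idle_seconds]
        for rid in cold:
            rec = self.records.pop(rid)
            for tok in self._record_tokens(rec):
                self.postings[tok].discard(rid)
                if not self.postings[tok]:
                    del self.postings[tok]
        return sorted(cold)
```

Retrieval only touches the index, not the raw data, and each successful search refreshes `last_access`, so records that keep being queried stay in place while untouched ones become candidates for clearing. A real deployment would run `clear_cold` on a schedule and would need a policy for what happens to the underlying raw data, which this sketch leaves out.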

Description

Technical field

[0001] The invention belongs to the technical field of big data, and in particular relates to a data lake based fast retrieval system for big data acquisition and governance.

Background technique

[0002] A data lake classifies raw data and stores it in different data pools, then integrates and transforms the data within each pool into a unified storage format that is easy to analyze. This approach makes it much easier for users to analyze and use data, thereby generating economic benefits.

[0003] A common way to implement a data lake is to use Hadoop technology. The data lake stores the original data by category, and within each data pool the data can be converted into a unified format that can be extracted directly. This approach has great commercial value and has contributed greatly to big data analysis.

[0004] However, there are still some problems and challenges in the existing data lake architecture, such as...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/901, G06F16/903
CPC: G06F16/901, G06F16/90335
Inventor: 李光, 李延波, 张建军, 俞光日, 夏连杰, 刘金栋, 李延勇
Owner: TIANJIN 712 COMM & BROADCASTING CO LTD