Cache-based big data processing dimension table storage and calculation system and method thereof

A big data processing and data storage module technology, applied in the field of big data. It addresses the problems that existing dimension table solutions cannot support large data volumes or highly concurrent reads and writes, have limited data read and write performance, and hinder unification of the technology stack and maintenance of code and tasks. It reduces data processing delay, improves data processing speed and throughput, and simplifies reading and writing of cached data.

Pending Publication Date: 2022-05-27

CLOUD WISDOM BEIJING TECH

Problems solved by technology

Dimension table storage services commonly used in the industry include HBase, MySQL, and Redis. Of these, HBase and MySQL are officially supported by Flink, which provides SQL-style data writing and association for them. MySQL's advantages as a dimension table are its complete data format and flexible association semantics, but its limited read and write performance means it only supports data association for small data volumes and cannot support large data volumes or highly concurrent reads and writes. HBase uses the LSM data structure with HDFS as the underlying storage; it performs well when writing massive data, but its read performance is not very high. Redis is an in-memory, distributed key-value database written in C. Storing data in memory avoids slow disk operations, so Redis reads and writes are very fast, and its distributed nature lets it support concurrent access to large amounts of data. However, because Redis is not officially supported by Flink, most of the industry uses Redis as a dimension table in one of two ways: writing Java code that calls the Redis API to read and write data, or writing a UDF and calling it from Flink SQL.

Both of these solutions require Java coding and do not support code reuse. They are cumbersome to operate and hinder unification of the technology stack and maintenance of code and tasks.

Embodiment Construction

[0032] The specific technical solutions of the present invention are further described below with reference to the accompanying drawings, so that scholars and those skilled in the art can better understand the present invention. This description does not limit the rights of the present invention.

[0033] The cache-based big data processing dimension table storage and calculation method provided by the embodiment of the present invention mainly comprises two steps: dimension table data writing and dimension table association calculation. Dimension table data writing obtains data from batch or streaming data sources, cleans and converts it, and stores it in the Redis cache service. Dimension table association calculation is applied to data completion: business data is obtained from batch or streaming data sources, and the corresponding extended data is read from the Redis cache according to the index key to complete the business. Completio...
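The two steps in paragraph [0033] can be illustrated with a minimal sketch. It is not the patent's implementation: a plain Python dict stands in for the Redis cache service, and the function names, the `table:key` index key layout, and the sample fields (`user_id`, `city`, `order_id`) are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator

# Stand-in for the Redis cache service (assumption): index key -> hash of fields.
# A real deployment would use a Redis client instead of this dict.
cache: Dict[str, Dict[str, str]] = {}


def write_dimension_rows(rows: Iterable[dict], table: str, key_field: str) -> int:
    """Step 1: clean and convert dimension rows, then store them in the cache."""
    written = 0
    for row in rows:
        key_value = row.get(key_field)
        # Cleaning: skip rows that have no usable index key.
        if key_value is None or str(key_value).strip() == "":
            continue
        index_key = f"{table}:{str(key_value).strip()}"
        # Conversion: store every other non-null field as a trimmed string.
        cache[index_key] = {
            name: str(value).strip()
            for name, value in row.items()
            if name != key_field and value is not None
        }
        written += 1
    return written


def join_with_dimension(records: Iterable[dict], table: str, key_field: str) -> Iterator[dict]:
    """Step 2: complete each business record with extended data from the cache."""
    for record in records:
        index_key = f"{table}:{record[key_field]}"
        extended = cache.get(index_key, {})  # no match: record passes through as-is
        # Extended fields are merged last, so they win on a field-name collision.
        yield {**record, **extended}


# Hypothetical sample data: one valid dimension row and one that fails cleaning.
written = write_dimension_rows(
    [{"user_id": 1, "city": " Beijing "}, {"user_id": None, "city": "unknown"}],
    table="dim_user",
    key_field="user_id",
)
orders = [{"order_id": "o1", "user_id": 1}, {"order_id": "o2", "user_id": 2}]
enriched = list(join_with_dimension(orders, table="dim_user", key_field="user_id"))
```

In this sketch a record with no matching dimension row passes through unchanged, which mirrors a left join; whether the patented method keeps or drops unmatched records is not stated in the text above.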


Abstract

The invention discloses a cache-based big data processing dimension table storage and calculation system and a method thereof. The system comprises a data storage module, a dimension table configuration module, a Redis connection management module, a data type conversion module, a primary key processing module, a common field processing module, a data writing module, a data association module, and a Flink integration module. Using Redis can significantly improve data processing speed and throughput and solves the data processing delay problem, and the Redis dimension table eliminates the memory overflow caused by excessively large associated data. In addition, Redis data can be operated on, read, and written through Flink SQL; Redis services in standalone, cluster, sentinel, and proxy modes are encapsulated; and the service interface and the big data development technology stack are unified, making big data development more convenient.
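As a rough illustration of how the dimension table configuration, primary key processing, common field processing, and data type conversion modules named in the abstract might fit together, the sketch below derives a cache key from the configured primary key columns and converts the remaining fields to and from strings. The abstract does not describe these modules' internals, so every name here (`DimTableConfig`, `build_key`, `encode_fields`, `decode_fields`), the `:` key separator, and the set of supported types are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class DimTableConfig:
    """Hypothetical dimension table configuration."""
    name: str
    primary_keys: Tuple[str, ...]   # columns that form the cache key
    field_types: Dict[str, str]     # common (non-key) field -> type name


# Type name -> (encode to string, decode from string). Assumed type set.
_CODECS: Dict[str, Tuple[Callable[[Any], str], Callable[[str], Any]]] = {
    "string": (str, str),
    "int": (str, int),
    "double": (repr, float),
    "boolean": (lambda v: "true" if v else "false", lambda s: s == "true"),
}


def build_key(cfg: DimTableConfig, row: Dict[str, Any], sep: str = ":") -> str:
    """Primary key processing: table name plus primary key values, in config order."""
    return sep.join([cfg.name] + [str(row[k]) for k in cfg.primary_keys])


def encode_fields(cfg: DimTableConfig, row: Dict[str, Any]) -> Dict[str, str]:
    """Common field processing on write: typed values -> strings; null fields are skipped."""
    return {
        field: _CODECS[type_name][0](row[field])
        for field, type_name in cfg.field_types.items()
        if row.get(field) is not None
    }


def decode_fields(cfg: DimTableConfig, stored: Dict[str, str]) -> Dict[str, Any]:
    """Common field processing on read: stored strings -> typed values."""
    return {
        field: _CODECS[cfg.field_types[field]][1](text)
        for field, text in stored.items()
        if field in cfg.field_types
    }


# Hypothetical table with a composite primary key.
cfg = DimTableConfig(
    name="dim_product",
    primary_keys=("region", "sku"),
    field_types={"title": "string", "stock": "int", "price": "double", "active": "boolean"},
)
row = {"region": "cn", "sku": "A1", "title": "Cable", "stock": 7, "price": 9.5, "active": True}
key = build_key(cfg, row)
stored = encode_fields(cfg, row)
restored = decode_fields(cfg, stored)
```

Keeping the key layout and type mapping in one configuration object means the write path and the association path agree on both, which is one plausible reading of why the abstract lists configuration, key processing, and type conversion as separate modules.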

Description

Technical field

[0001] The invention relates to the technical field of big data, in particular to a cache-based big data processing dimension table storage and calculation system and a method thereof.

Background technique

[0002] With the vigorous development of information technology and the general trend of digital transformation in all walks of life, enterprise data keeps growing in scale, and traditional data processing technologies can no longer meet the demand for real-time processing of massive data. Thanks to the continuous development and evolution of big data processing technologies such as Hadoop, Spark, and Flink, data is processed with higher throughput and lower processing delay; fast processing speed is the key to maximizing the value extracted from data. Technological iteration has also made data processing easier and more convenient: big data development technology has moved from Java and Scala to Python and SQL. At pres...

Application Information

IPC(8): G06F3/06, G06F16/22, G06F16/242, G06F16/2455

CPC: G06F3/0611, G06F3/0613, G06F16/2282, G06F16/2433, G06F16/24552, Y02D10/00

Inventor: 赵永振, 马上坤, 高驰涛

Owner: CLOUD WISDOM BEIJING TECH
