
Cache-based big data processing dimension table storage and calculation system and method thereof

A big data processing and data storage technology in the field of big data. It addresses the problems that existing dimension table stores cannot support large data volumes or high-concurrency reads and writes, that they hinder unification of the technology stack and maintenance of code and tasks, and that their data read/write performance is limited. The claimed effects are reduced data processing delay, higher data processing speed and throughput, and simpler reading and writing of cached data.

Pending Publication Date: 2022-05-27
CLOUD WISDOM BEIJING TECH

AI Technical Summary

Problems solved by technology

Commonly used dimension table storage services in the industry include HBase, MySQL, and Redis. HBase and MySQL are officially supported by Flink, which provides SQL-style data writing and association (join) solutions for them. The advantage of MySQL as a dimension table lies in its complete data format and flexible association semantics, but its read and write performance limits it to data association over small data volumes; it cannot support large data volumes or high-concurrency reads and writes. HBase adopts the LSM data structure and uses HDFS for the underlying storage; it performs well when writing massive data, but its read performance is not very high. Redis is a memory-based, distributed key-value database developed in C. Because it stores data in memory and avoids slow disk operations, its reads and writes are very fast, and its distributed design allows it to support concurrent access to large amounts of data. However, since Redis is not officially supported by Flink, most of the industry uses Redis as a dimension table in one of two ways: calling the Redis API from Java code to read and write data, or writing a UDF and calling it from Flink SQL.
Both of these solutions require Java coding, do not support code reuse, and are cumbersome to operate, which is not conducive to unifying the technology stack or maintaining code and tasks.



Examples


Embodiment Construction

[0032] The specific technical solutions of the present invention are further described below with reference to the accompanying drawings, so that those skilled in the art can better understand the present invention; this description does not limit the rights of the present invention.

[0033] The cache-based big data processing dimension table storage and calculation method provided by the embodiment of the present invention mainly includes two steps: dimension table data writing and dimension table association calculation. Dimension table data writing obtains data from batch or streaming data sources, cleans and converts it, and stores it in the Redis cache service. Dimension table association calculation is applied to data information completion: business data is obtained from batch or streaming data sources, and the corresponding extended data is read from the Redis cache according to the index key to complete the business. Completio...
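The two steps described in [0033] can be sketched as follows. This is a minimal illustration, not the patented implementation: `InMemoryHashStore` is a hypothetical stand-in for the Redis cache service (it only mimics the shape of the Redis HSET and HGETALL commands), and the names `build_key`, `write_dimension_rows`, and `enrich`, as well as the colon-separated key layout, are assumptions made for this example.

```python
from typing import Dict, Iterable, Iterator, List


class InMemoryHashStore:
    """Hypothetical stand-in for a Redis hash store (not a real Redis client)."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}

    def hset(self, name: str, mapping: Dict[str, str]) -> None:
        # Merge the given fields into the hash stored under `name`.
        self._data.setdefault(name, {}).update(mapping)

    def hgetall(self, name: str) -> Dict[str, str]:
        # Return a copy of the hash, or an empty dict if the key is absent.
        return dict(self._data.get(name, {}))


def build_key(table: str, row: Dict[str, object], key_fields: List[str]) -> str:
    """Join the table name and primary-key values into one index key (assumed layout)."""
    return ":".join([table] + [str(row[f]) for f in key_fields])


def write_dimension_rows(store: InMemoryHashStore, table: str,
                         rows: Iterable[Dict[str, object]],
                         key_fields: List[str]) -> int:
    """Step 1: clean and convert each row, then store it under its index key."""
    count = 0
    for row in rows:
        # Cleaning: drop null fields; conversion: store every value as a trimmed string.
        cleaned = {k: str(v).strip() for k, v in row.items() if v is not None}
        store.hset(build_key(table, row, key_fields), mapping=cleaned)
        count += 1
    return count


def enrich(store: InMemoryHashStore, table: str,
           events: Iterable[Dict[str, object]],
           key_fields: List[str]) -> Iterator[Dict[str, object]]:
    """Step 2: look up the extended data by index key and complete each business record."""
    for event in events:
        dim = store.hgetall(build_key(table, event, key_fields))
        # Fields already present in the event win; unmatched events pass through unchanged.
        yield {**dim, **event}
```

In this sketch an event with no matching dimension row is emitted as-is, which corresponds to a left join; whether the patent's association step behaves this way is not stated in the visible text.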


Abstract

The invention discloses a cache-based big data processing dimension table storage and calculation system and a method thereof. The system comprises a data storage module, a dimension table configuration module, a Redis connection management module, a data type conversion module, a primary key processing module, a common field processing module, a data writing module, a data association module, and a Flink integration module. Using Redis significantly improves data processing speed and throughput and solves the problem of data processing delay, and using a Redis dimension table eliminates the memory overflow caused by overly large associated data. In addition, Redis data can be operated on, read, and written through Flink SQL; Redis services in standalone, cluster, sentinel, and proxy modes are encapsulated; the service interface and the big data development technology stack are unified; and big data development becomes more convenient.
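As a rough illustration of how a dimension table configuration, the Redis deployment mode, and the data type conversion described in the abstract might fit together, consider the sketch below. Everything in it is an assumption made for illustration: the class `DimTableConfig`, the enum `RedisMode`, the type names (`int`, `float`, `bool`, `string`), and the validation rules are hypothetical and are not taken from the patent text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class RedisMode(Enum):
    """The four deployment modes named in the abstract."""
    STANDALONE = "standalone"
    CLUSTER = "cluster"
    SENTINEL = "sentinel"
    PROXY = "proxy"


@dataclass
class DimTableConfig:
    """Hypothetical dimension table configuration."""
    table: str
    mode: RedisMode
    nodes: List[str]                      # "host:port" strings
    key_fields: List[str]                 # primary-key fields used to build the index key
    field_types: Dict[str, str] = field(default_factory=dict)


# Converters from the string form stored in the cache back to typed values.
_CONVERTERS: Dict[str, Callable[[str], object]] = {
    "int": int,
    "float": float,
    "bool": lambda s: s.lower() == "true",
    "string": str,
}


def validate(config: DimTableConfig) -> None:
    """Reject configurations that this sketch cannot use."""
    if not config.nodes:
        raise ValueError("at least one node is required")
    if config.mode is RedisMode.STANDALONE and len(config.nodes) != 1:
        raise ValueError("standalone mode expects exactly one node")
    if not config.key_fields:
        raise ValueError("at least one primary-key field is required")
    unknown = set(config.field_types.values()) - set(_CONVERTERS)
    if unknown:
        raise ValueError(f"unsupported field types: {sorted(unknown)}")


def decode_row(config: DimTableConfig, raw: Dict[str, str]) -> Dict[str, object]:
    """Convert cached string values to the declared types; undeclared fields stay strings."""
    return {
        name: _CONVERTERS[config.field_types.get(name, "string")](value)
        for name, value in raw.items()
    }
```

A connection manager built on such a configuration would pick the client type from `mode`; that part is omitted here because it depends on the client library in use.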

Description

Technical field

[0001] The invention relates to the technical field of big data, in particular to a cache-based big data processing dimension table storage and calculation system and a method thereof.

Background art

[0002] With the vigorous development of information technology and the general trend of digital transformation in all walks of life, the data scale of enterprises is growing larger and larger, and traditional data processing technologies can no longer meet the demands of real-time processing of massive data. Thanks to the continuous development and evolution of big data processing technologies such as Hadoop, Spark, and Flink, data is processed with higher throughput and lower processing delay, and fast processing speed is the key to maximizing the value extracted from data. Technological iteration has also made data processing easier and more convenient. Big data development technology has developed from Java and Scala to Python and SQL. At pres...


Application Information

IPC(8): G06F3/06, G06F16/22, G06F16/242, G06F16/2455
CPC: G06F3/0611, G06F3/0613, G06F16/2282, G06F16/2433, G06F16/24552, Y02D10/00
Inventor: 赵永振, 马上坤, 高驰涛
Owner: CLOUD WISDOM BEIJING TECH