Distributed data flow hierarchical cache automatic migration algorithm based on Kafka

A Kafka-based technology for automatic data migration of data streams, applied in the field of big data storage. It addresses the problem that Kafka does not support hierarchical storage, and it improves stream-processing performance, reduces storage costs, and extends the system's functionality.

Active Publication Date: 2020-12-08

NORTHEASTERN UNIV

Cites: 17 | Cited by: 1


AI Technical Summary

Problems solved by technology

[0005] To address the problem that Kafka does not support hierarchical storage, the present invention proposes an automatic data migration algorithm, HHF-Migrate, that combines access heat and migration frequency. By collecting statistics on the hot data information of the log, it calculates the access heat and migration frequency of each TopicPartition, automatically migrates cold data with low heat and low frequency from SSD to HDD, and automatically migrates hot data with high heat and high frequency from HDD to SSD, thereby realizing hierarchical caching of data.
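A minimal sketch of the migration rule described in [0005] follows, written in Python for brevity (the embodiment itself uses Java/Scala). The source does not say how access heat and migration frequency are computed or where "high" and "low" begin, so the field names `heat` and `frequency`, the threshold values, and the functions `plan_migration` and `plan_all` are hypothetical. Only the rule itself comes from the text: demote low-heat, low-frequency data from SSD and promote high-heat, high-frequency data from HDD.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional, Tuple


class Tier(Enum):
    SSD = "ssd"
    HDD = "hdd"


@dataclass
class PartitionStats:
    """Statistics for one TopicPartition (hypothetical field names)."""
    topic: str
    partition: int
    tier: Tier          # where the partition's log currently lives
    heat: float         # access heat, however the system defines it
    frequency: float    # migration frequency, however the system defines it


# Hypothetical thresholds: the patent text gives no concrete values.
HEAT_THRESHOLD = 100.0
FREQ_THRESHOLD = 5.0


def plan_migration(stats: PartitionStats) -> Optional[Tier]:
    """Return the tier a partition should move to, or None to leave it in place."""
    high_heat = stats.heat >= HEAT_THRESHOLD
    high_freq = stats.frequency >= FREQ_THRESHOLD
    if stats.tier is Tier.SSD and not high_heat and not high_freq:
        return Tier.HDD  # cold data: low heat and low frequency, demote to HDD
    if stats.tier is Tier.HDD and high_heat and high_freq:
        return Tier.SSD  # hot data: high heat and high frequency, promote to SSD
    return None          # signals disagree: keep the current placement


def plan_all(all_stats: Iterable[PartitionStats]) -> Dict[Tuple[str, int], Tier]:
    """Collect the migrations to perform in one scheduling round."""
    plan = {}
    for s in all_stats:
        target = plan_migration(s)
        if target is not None:
            plan[(s.topic, s.partition)] = target
    return plan
```

In this sketch, a partition whose two signals disagree stays where it is, so neither signal alone is enough to trigger a move.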



Embodiment Construction

[0033] The specific implementation of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0034] This embodiment is carried out in a cluster of three nodes. The software environment is Ubuntu 16.04, the programming languages are Java and Scala, and the hierarchical storage of each node is built from a Samsung solid-state drive (SSD, 250 GB) and a Seagate mechanical hard disk drive (HDD, 1 TB). The working parameters of the cluster are as follows: the topic replication factor is 2, the number of brokers (servers) is 3, the number of partitions is also 3, and the number of producers and consumers is 6. Producers publish messages to the hierarchical cache system, and consumers read messages from it. The log of a TopicPartition in the Kafka cluster is read and written through the read() and append() functions of the log module, as shown in Figure 1.
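Because every log read and write passes through read() and append(), those two functions are a natural place to collect per-TopicPartition hot-data statistics. The sketch below is a hypothetical Python illustration of that idea only: `HotDataTracker`, `InstrumentedLog`, and the in-memory message list are assumptions and do not reproduce Kafka's own log classes.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

TopicPartitionKey = Tuple[str, int]  # (topic name, partition number)


class HotDataTracker:
    """Hypothetical per-TopicPartition access counter."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self.reads: Dict[TopicPartitionKey, int] = defaultdict(int)
        self.appends: Dict[TopicPartitionKey, int] = defaultdict(int)
        self.last_access: Dict[TopicPartitionKey, float] = {}

    def record(self, tp: TopicPartitionKey, kind: str) -> None:
        """Count one access of the given kind and remember when it happened."""
        if kind == "read":
            self.reads[tp] += 1
        elif kind == "append":
            self.appends[tp] += 1
        else:
            raise ValueError(f"unknown access kind: {kind}")
        self.last_access[tp] = self._clock()

    def accesses(self, tp: TopicPartitionKey) -> int:
        """Total reads plus appends observed for one TopicPartition."""
        return self.reads.get(tp, 0) + self.appends.get(tp, 0)


class InstrumentedLog:
    """Toy in-memory log whose read()/append() report to a tracker (hypothetical)."""

    def __init__(self, tp: TopicPartitionKey, tracker: HotDataTracker):
        self.tp = tp
        self.tracker = tracker
        self._messages: List[str] = []

    def append(self, message: str) -> int:
        self.tracker.record(self.tp, "append")
        self._messages.append(message)
        return len(self._messages) - 1  # offset assigned to the new message

    def read(self, offset: int, max_messages: int = 1) -> List[str]:
        self.tracker.record(self.tp, "read")
        return self._messages[offset:offset + max_messages]
```

Counting at this layer keeps the statistics independent of which producer or consumer issued the request; how raw counts are turned into an access heat is left open by the source.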

...


Abstract

The invention belongs to the field of big data storage and relates to a Kafka-based automatic migration algorithm for hierarchical caching of distributed data streams. Based on the characteristics of Kafka data access, a hot data storage structure is designed that both reduces storage space and allows the metadata of hot data to be managed. Building on this structure, an automatic data migration algorithm (HHF-Migrate) combining access heat and migration frequency is provided: the system calculates the access heat and migration frequency of all TopicPartition data using a data identification algorithm, automatically migrates cold data with low heat and low frequency from SSD to HDD, and automatically migrates hot data with high heat and high frequency from HDD to SSD, thereby achieving hierarchical caching. The system designed in the invention improves Kafka's throughput, provides relatively low latency, and reduces storage cost.

Description

Technical field

[0001] The invention belongs to the field of big data storage and relates to a Kafka-based automatic migration algorithm for hierarchical caching of distributed data streams.

Background art

[0002] Kafka is currently a very popular distributed messaging system. Messages in Kafka are classified by topic: producers produce messages and consumers consume them, both on a per-topic basis. In Kafka, a topic is a logical concept, while a partition is a physical one. A topic may be split into multiple partitions for storage, so a TopicPartition identifies the log object being recorded by its topic name and partition number. Kafka keeps multiple replicas of each partition: one leader replica and one or more follower replicas.

[0003] The storage devices in everyday use are mainly hard disk drives (HDD) and solid-state disks (SSD). The storage capaci...
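To make the topic, partition, and replica vocabulary of [0002] concrete, here is a small hypothetical Python model: a TopicPartition is identified by topic name and partition number, and each partition gets a replica list whose first entry is treated as the leader. The round-robin placement in `assign_replicas` is a simplification chosen for illustration, not Kafka's actual replica-assignment algorithm. The `topic-partition` string form mirrors how Kafka names partition log directories.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TopicPartition:
    topic: str
    partition: int

    def __str__(self) -> str:
        return f"{self.topic}-{self.partition}"


def assign_replicas(topic: str, num_partitions: int, brokers: List[int],
                    replication_factor: int) -> Dict[TopicPartition, List[int]]:
    """Simplified round-robin placement; replicas[0] is the leader, the rest are followers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        # Start each partition on a different broker and put followers on the next brokers.
        replicas = [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        assignment[TopicPartition(topic, p)] = replicas
    return assignment
```

With the embodiment's settings (3 brokers, 3 partitions, replication factor 2), this placement makes each broker the leader of one partition and a follower of another.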


Application Information


Patent Type & Authority: Application (China)

IPC (8): G06F3/06; H04L29/08

CPC: G06F3/0647; G06F3/0656; G06F3/0604; H04L67/568

Inventors: 付国 (Fu Guo), 杨慧丽 (Yang Huili), 张岩峰 (Zhang Yanfeng), 张一奇 (Zhang Yiqi)

Owner: NORTHEASTERN UNIV
