
Distributed data flow hierarchical cache automatic migration algorithm based on Kafka

An automatic migration technology for data streams, applied in the field of big data storage. It addresses the lack of hierarchical storage support in the Kafka system, and it improves stream processing performance, reduces storage costs, and extends Kafka's functionality.

Active Publication Date: 2020-12-08
NORTHEASTERN UNIV

AI Technical Summary

Problems solved by technology

[0005] To address the fact that the Kafka system does not support hierarchical storage, the present invention proposes an automatic data migration algorithm (HHF-Migrate) that combines access heat and migration frequency. It calculates the access heat and migration frequency of each TopicPartition by collecting hot-data statistics from the log. Cold data with low heat and low frequency is automatically migrated from SSD to HDD, and hot data with high heat and high frequency on HDD is automatically migrated to SSD, thus realizing hierarchical caching of data.
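The migration rule described above can be illustrated with a minimal sketch. The summary does not give the formulas for access heat or migration frequency, nor any threshold values, so the `PartitionStats` fields, the four threshold parameters, and the function name `plan_migration` are all hypothetical; only the direction of each move (cold data from SSD to HDD, hot data from HDD to SSD) comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartitionStats:
    """Per-TopicPartition statistics (field names are illustrative assumptions)."""
    topic: str
    partition: int
    tier: str         # current storage tier: "SSD" or "HDD"
    heat: float       # access heat, in whatever units the system measures it
    frequency: float  # the "migration frequency" statistic named in the text


def plan_migration(stats: PartitionStats,
                   cold_heat: float, cold_freq: float,
                   hot_heat: float, hot_freq: float) -> Optional[str]:
    """Return the target tier, or None if the partition should stay put.

    Assumed rule: both statistics must be below the cold thresholds to
    demote, and both above the hot thresholds to promote.
    """
    if stats.tier == "SSD" and stats.heat < cold_heat and stats.frequency < cold_freq:
        return "HDD"  # cold data leaves the fast tier
    if stats.tier == "HDD" and stats.heat > hot_heat and stats.frequency > hot_freq:
        return "SSD"  # hot data is promoted to the fast tier
    return None


cold = PartitionStats("orders", 0, "SSD", heat=1.0, frequency=0.5)
hot = PartitionStats("clicks", 2, "HDD", heat=90.0, frequency=12.0)
print(plan_migration(cold, 5.0, 1.0, 50.0, 10.0))  # HDD
print(plan_migration(hot, 5.0, 1.0, 50.0, 10.0))   # SSD
```

Keeping separate cold and hot thresholds leaves a band in which a partition stays where it is, which is one simple way to avoid moving the same data back and forth; whether the patented algorithm does this is not stated in the summary.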



Examples


Embodiment Construction

[0033] The specific implementation of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0034] This embodiment runs in a cluster of three nodes. The software environment is Ubuntu 16.04, the programming languages are Java and Scala, and the hierarchical storage system of each node is built from a Samsung solid-state drive (SSD, 250 GB) and a Seagate mechanical hard disk (HDD, 1 TB). The cluster's working parameters are as follows: the topic replication factor is 2, the number of brokers (servers) is 3, the number of partitions is also 3, and the number of producers and consumers is 6. Producers publish messages to the hierarchical cache system, and consumers read messages from it. Reads and writes to the log of a TopicPartition in the Kafka cluster go through the read() and append() functions of the log module, as shown in Figure 1.
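Because every log access goes through read() and append(), those two calls are a natural place to collect access statistics. The sketch below is only an illustration in Python, not the patent's Java/Scala implementation: the class `AccessCounter`, its `on_read` and `on_append` hooks, and the weighted-sum definition of heat are assumptions, since the excerpt does not say how access heat is computed.

```python
from collections import defaultdict


class AccessCounter:
    """Counts log accesses per (topic, partition); a hypothetical helper."""

    def __init__(self):
        self.reads = defaultdict(int)
        self.appends = defaultdict(int)

    def on_read(self, topic: str, partition: int) -> None:
        # Would be called from the log module's read() path.
        self.reads[(topic, partition)] += 1

    def on_append(self, topic: str, partition: int) -> None:
        # Would be called from the log module's append() path.
        self.appends[(topic, partition)] += 1

    def heat(self, topic: str, partition: int,
             read_weight: float = 1.0, append_weight: float = 1.0) -> float:
        """Assumed heat measure: a weighted sum of read and append counts."""
        key = (topic, partition)
        return (read_weight * self.reads.get(key, 0)
                + append_weight * self.appends.get(key, 0))


counter = AccessCounter()
for _ in range(3):
    counter.on_read("orders", 0)
counter.on_append("orders", 0)
print(counter.heat("orders", 0))  # 4.0
print(counter.heat("orders", 1))  # 0.0
```

A real system would also need to age or reset these counts over time so that heat reflects recent activity; that policy is left out here because the excerpt does not describe it.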

...


Abstract

The invention belongs to the field of big data storage and relates to a Kafka-based automatic migration algorithm for hierarchical caching of distributed data streams. Based on the characteristics of Kafka data access, a hot data storage structure is designed that both reduces storage space and allows the metadata of hot data to be managed. Building on this structure, an automatic data migration algorithm (HHF-Migrate) combining access heat and migration frequency is provided. The system calculates the access heat and migration frequency of all TopicPartition data using the data identification algorithm; cold data with low heat and low frequency is automatically migrated from SSD to HDD, and hot data with high heat and high frequency on HDD is automatically migrated to SSD. Hierarchical caching is thereby achieved. The system designed by the invention improves the throughput of Kafka, provides relatively low latency, and reduces storage cost.

Description

Technical field

[0001] The invention belongs to the field of big data storage and relates to a Kafka-based automatic migration algorithm for hierarchical caching of distributed data streams.

Background

[0002] Kafka is currently a very popular distributed messaging system. Messages in Kafka are classified by topic: producers produce messages and consumers consume messages, and both operate on topics. In Kafka, a topic is a logical concept, while a partition is a physical concept. A topic may be split into multiple partitions for storage, so a TopicPartition identifies the log object being recorded by its topic name and partition number. Kafka keeps multiple replicas of each partition: one leader replica and multiple follower replicas.

[0003] The storage devices in everyday use are mainly the mechanical hard disk (Hard Disk Drive, HDD) and the solid-state disk (Solid State Disk, SSD). The storage capaci...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F3/06, H04L29/08
CPC: G06F3/0647, G06F3/0656, G06F3/0604, H04L67/568
Inventor: 付国, 杨慧丽, 张岩峰, 张一奇
Owner NORTHEASTERN UNIV