Kafka-based automatic migration method of distributed data stream hierarchical cache

An automatic data migration technology for data streams, applied in the field of big data storage. It addresses the Kafka system's lack of support for hierarchical storage, and achieves improved stream processing performance, reduced storage costs, and extended functionality.

Active Publication Date: 2021-10-22
NORTHEASTERN UNIV LIAONING

AI Technical Summary

Problems solved by technology

[0005] To address the problem that the Kafka system does not support hierarchical storage, the present invention proposes an automatic data migration method (HHF-Migrate) that combines access heat and migration frequency. The method calculates the access heat and migration frequency of each TopicPartition by collecting hot-data statistics on the log. Cold data with low heat and low frequency is automatically migrated from SSD to HDD, and hot data with high heat and high frequency on HDD is automatically migrated to SSD, thereby realizing hierarchical caching of data.
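The migration rule described in [0005] can be illustrated with a minimal Java sketch. This is not the patented HHF-Migrate implementation: the excerpt does not give the formulas for access heat or migration frequency, so both are treated here as precomputed scores, and the class name `HhfMigrateSketch`, the method `decide`, and the two threshold parameters are all hypothetical.

```java
// A hedged sketch of the tier decision only; how heat and frequency are
// computed is NOT specified in this excerpt and is assumed to happen elsewhere.
final class HhfMigrateSketch {
    enum Tier { SSD, HDD }

    // Returns the tier a TopicPartition should live on after this evaluation.
    // The thresholds are hypothetical tuning parameters, not values from the patent.
    static Tier decide(Tier current, double heat, double frequency,
                       double heatThreshold, double frequencyThreshold) {
        boolean hot = heat >= heatThreshold && frequency >= frequencyThreshold;
        boolean cold = heat < heatThreshold && frequency < frequencyThreshold;
        if (current == Tier.SSD && cold) {
            return Tier.HDD; // low heat and low frequency: demote to HDD
        }
        if (current == Tier.HDD && hot) {
            return Tier.SSD; // high heat and high frequency: promote to SSD
        }
        return current;      // mixed signals: leave the data where it is
    }

    public static void main(String[] args) {
        System.out.println(decide(Tier.SSD, 0.1, 0.2, 0.5, 0.5)); // prints HDD
        System.out.println(decide(Tier.HDD, 0.9, 0.8, 0.5, 0.5)); // prints SSD
    }
}
```

Requiring both signals to agree before moving data is one reading of "low heat and low frequency" and "high heat and high frequency"; a partition with mixed signals stays on its current tier in this sketch.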

Examples

Embodiment Construction

[0033] The specific implementation of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0034] This embodiment is carried out in a cluster of three nodes. The software environment is Ubuntu 16.04, the programming language is Java/Scala, and the hierarchical storage system of each node is built on a Samsung solid-state disk (SSD, 250 GB) and a Seagate mechanical hard disk (HDD, 1 TB). The working parameters of the cluster are as follows: the replication factor of the topic is 2, the number of brokers (servers) is 3, the number of partitions is also 3, and the number of producers and consumers is 6. Producers publish messages to the hierarchical cache system, and consumers read messages from it. As shown in the log module of Figure 1, the log of a TopicPartition in the Kafka cluster is read and written through the read() and append() functions.
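Since reads and writes on a TopicPartition log pass through read() and append(), those two calls are a natural place to collect the access statistics that the method relies on. The following Java sketch shows one way such counters could be kept. It is an assumption for illustration, not the hot-data structure designed in the patent: the class `AccessStatsSketch` and its methods `onRead`, `onAppend`, `readCount`, and `appendCount` are hypothetical names, and the key format simply mirrors Kafka's `<topic>-<partition>` log directory naming.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-TopicPartition access counters; not the patent's hot-data structure.
final class AccessStatsSketch {
    private final Map<String, LongAdder> reads = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> appends = new ConcurrentHashMap<>();

    // Key mirrors Kafka's "<topic>-<partition>" log directory naming.
    static String key(String topic, int partition) {
        return topic + "-" + partition;
    }

    // Would be called from a hook around the log's read() path (assumed hook).
    void onRead(String topic, int partition) {
        reads.computeIfAbsent(key(topic, partition), k -> new LongAdder()).increment();
    }

    // Would be called from a hook around the log's append() path (assumed hook).
    void onAppend(String topic, int partition) {
        appends.computeIfAbsent(key(topic, partition), k -> new LongAdder()).increment();
    }

    long readCount(String topic, int partition) {
        LongAdder counter = reads.get(key(topic, partition));
        return counter == null ? 0L : counter.sum();
    }

    long appendCount(String topic, int partition) {
        LongAdder counter = appends.get(key(topic, partition));
        return counter == null ? 0L : counter.sum();
    }
}
```

`LongAdder` inside a `ConcurrentHashMap` keeps the counters cheap under concurrent producers and consumers, which matters because the hooks would sit on the hot read and write paths.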

...

Abstract

The invention belongs to the field of big data storage and relates to a Kafka-based automatic migration method for a distributed data stream hierarchical cache. Based on the characteristics of Kafka data access, a storage structure for hot data is designed, which both reduces storage space and manages the metadata of hot data. Building on this structure, an automatic data migration method (HHF-Migrate) combining access heat and migration frequency is proposed. The system calculates the access heat and migration frequency of all TopicPartition data according to this data identification method. Cold data with low heat and low frequency is automatically migrated from SSD to HDD, while hot data with high heat and high frequency on HDD is automatically migrated to SSD, thereby realizing hierarchical caching. The system designed by the invention improves the throughput of Kafka, provides lower latency, and reduces storage cost.

Description

Technical field

[0001] The invention belongs to the field of big data storage and relates to a Kafka-based automatic migration method for a distributed data stream hierarchical cache.

Background technique

[0002] Kafka is currently a very popular distributed messaging system. Messages in Kafka are classified by topic: producers produce messages and consumers consume messages, and both operate on topics. In Kafka, a topic is a logical concept, while a partition is a physical concept. A topic may be split into multiple partitions for storage, so a TopicPartition identifies a log object by its topic name and the corresponding partition number. Kafka keeps multiple replicas of each partition, consisting of one leader replica and the follower replicas.

[0003] The storage devices in everyday use are mainly mechanical hard disks (Hard Disk Drive, HDD) and solid-state disks (Solid State Disk, SSD). The storage...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F3/06, H04L29/08
CPC: G06F3/0647, G06F3/0656, G06F3/0604, H04L67/568
Inventor 付国杨慧丽张岩峰张一奇
Owner NORTHEASTERN UNIV LIAONING