Kafka-based automatic migration method of distributed data stream hierarchical cache

An automatic migration and data stream technology, applied in the field of big data storage, that addresses the Kafka system's lack of support for hierarchical storage, among other problems, and achieves improved stream processing performance, reduced storage costs, and expanded functionality.

Active Publication Date: 2021-10-22

NORTHEASTERN UNIV LIAONING


AI Technical Summary

Problems solved by technology

[0005] To address the problem that the Kafka system does not support hierarchical storage, the present invention proposes an automatic data migration method (HHF-Migrate) that combines access heat and migration frequency. It calculates the access heat and migration frequency of each TopicPartition by collecting hot-data statistics from the log, automatically migrates cold data with low heat and low frequency from SSD to HDD, and automatically migrates hot data with high heat and high frequency from HDD to SSD, thus realizing hierarchical caching of data.
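The migration rule described in [0005] can be illustrated with a minimal Java sketch. It assumes that heat and frequency have already been computed for each TopicPartition and that each is compared against a single threshold. The class names, the thresholds, and the example partition names are hypothetical, and the patent's actual formulas for heat and frequency are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only. The thresholds and the "both low / both high" rule
// are assumptions; the patent's heat and frequency formulas are not shown here.
final class TierMigrationSketch {

    enum Tier { SSD, HDD }

    // Statistics for one TopicPartition; the values are supplied by the caller.
    static final class PartitionStats {
        final String topicPartition;      // e.g. "orders-0" (hypothetical name)
        final Tier tier;                  // tier that currently holds the data
        final double accessHeat;
        final double migrationFrequency;

        PartitionStats(String topicPartition, Tier tier,
                       double accessHeat, double migrationFrequency) {
            this.topicPartition = topicPartition;
            this.tier = tier;
            this.accessHeat = accessHeat;
            this.migrationFrequency = migrationFrequency;
        }
    }

    // One planned migration: which TopicPartition moves to which tier.
    static final class Move {
        final String topicPartition;
        final Tier target;

        Move(String topicPartition, Tier target) {
            this.topicPartition = topicPartition;
            this.target = target;
        }
    }

    // Cold data (low heat and low frequency) on SSD moves to HDD;
    // hot data (high heat and high frequency) on HDD moves to SSD;
    // everything else stays where it is.
    static List<Move> plan(List<PartitionStats> stats,
                           double heatThreshold, double frequencyThreshold) {
        List<Move> moves = new ArrayList<>();
        for (PartitionStats s : stats) {
            boolean hot = s.accessHeat >= heatThreshold
                    && s.migrationFrequency >= frequencyThreshold;
            boolean cold = s.accessHeat < heatThreshold
                    && s.migrationFrequency < frequencyThreshold;
            if (cold && s.tier == Tier.SSD) {
                moves.add(new Move(s.topicPartition, Tier.HDD));
            } else if (hot && s.tier == Tier.HDD) {
                moves.add(new Move(s.topicPartition, Tier.SSD));
            }
        }
        return moves;
    }
}
```

In this sketch a TopicPartition with mixed signals (for example high heat but low frequency) is left in place, which is one possible reading of the rule rather than a confirmed detail of HHF-Migrate.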

Embodiment Construction

[0033] The specific implementation of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0034] This embodiment is carried out in a cluster environment. The cluster includes three nodes, the software environment is Ubuntu 16.04, the programming language is Java/Scala, and the hierarchical storage system of each node is built on a Samsung solid-state drive (SSD, 250 GB) and a Seagate mechanical hard disk drive (HDD, 1 TB). The working parameters of the cluster are as follows: the replication factor of the topic is 2, the number of brokers (servers) is 3, the number of partitions is also 3, and the number of producers and consumers is 6. Producers publish messages to the hierarchical cache system, and consumers read messages from the cache system. When the log of a TopicPartition in the Kafka cluster is read or written, the read() and append() functions are used, as shown in the log module in Figure 1.
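As a rough illustration of how access statistics might be gathered at the read() and append() call sites mentioned above, the following Java sketch counts accesses per TopicPartition. It is an assumption-laden sketch: `AccessCounterSketch`, `TopicPartitionKey`, `onRead`, and `onAppend` are hypothetical names, the key class is a stand-in rather than Kafka's own class, and the patent's actual hot-data storage structure is not shown in this excerpt.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Illustrative sketch only: counts log accesses per TopicPartition.
// All names here are hypothetical and not taken from Kafka or the patent.
final class AccessCounterSketch {

    // Stand-in key: topic name plus partition number.
    static final class TopicPartitionKey {
        final String topic;
        final int partition;

        TopicPartitionKey(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof TopicPartitionKey)) return false;
            TopicPartitionKey other = (TopicPartitionKey) o;
            return partition == other.partition && topic.equals(other.topic);
        }

        @Override
        public int hashCode() {
            return Objects.hash(topic, partition);
        }
    }

    private final Map<TopicPartitionKey, LongAdder> reads = new ConcurrentHashMap<>();
    private final Map<TopicPartitionKey, LongAdder> appends = new ConcurrentHashMap<>();

    // Intended to be called wherever the log's read() is invoked (assumed hook point).
    void onRead(TopicPartitionKey tp) {
        reads.computeIfAbsent(tp, k -> new LongAdder()).increment();
    }

    // Intended to be called wherever the log's append() is invoked (assumed hook point).
    void onAppend(TopicPartitionKey tp) {
        appends.computeIfAbsent(tp, k -> new LongAdder()).increment();
    }

    long readCount(TopicPartitionKey tp) {
        LongAdder adder = reads.get(tp);
        return adder == null ? 0L : adder.sum();
    }

    long appendCount(TopicPartitionKey tp) {
        LongAdder adder = appends.get(tp);
        return adder == null ? 0L : adder.sum();
    }
}
```

`LongAdder` with `ConcurrentHashMap` is chosen here because six producers and consumers may touch the same TopicPartition concurrently; a design that derives heat from these counts would sit on top of this and is left open.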

...


Abstract

The invention belongs to the field of big data storage and relates to a Kafka-based automatic migration method for the hierarchical cache of distributed data streams. Based on the characteristics of Kafka data access, a storage structure for hot data is designed, which both reduces storage space and allows the metadata of hot data to be managed according to that structure. Building on this structure, an automatic data migration method (HHF-Migrate) that combines access heat and migration frequency is proposed. The system calculates the access heat and migration frequency of all TopicPartition data according to this data identification method. Cold data with low heat and low frequency is automatically migrated from SSD to HDD, while hot data with high heat and high frequency in HDD is automatically migrated to SSD, thereby realizing hierarchical caching. The system designed by the invention improves the throughput of Kafka, provides lower latency, and reduces storage cost.

Description

Technical field

[0001] The invention belongs to the field of big data storage and relates to a Kafka-based automatic migration method for the hierarchical cache of distributed data streams.

Background technique

[0002] Kafka is currently a very popular distributed messaging system. Messages in Kafka are classified by topic. Producers produce messages and consumers consume messages, and both are topic-oriented. In Kafka, a topic is a logical concept, while a partition is a physical concept. A topic may be split into multiple partitions for storage, so a TopicPartition identifies the topic name and the corresponding partition number of the log object being recorded. Kafka keeps multiple replicas of each partition: one leader replica and multiple follower replicas.

[0003] The storage devices in everyday use are mainly the mechanical hard disk (Hard Disk Drive, HDD) and the solid-state disk (Solid State Disk, SSD). The storage...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G06F3/06, H04L29/08

CPC: G06F3/0647, G06F3/0656, G06F3/0604, H04L67/568

Inventor: 付国, 杨慧丽, 张岩峰, 张一奇

Owner: NORTHEASTERN UNIV LIAONING
