An HDFS copy management method based on file access heat

A file access and copy management technology, applied to HDFS copy management based on file access heat, that addresses problems such as degraded cluster performance.

Inactive Publication Date: 2021-06-15
NORTHEASTERN UNIV LIAONING
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

The default HDFS copy placement method selects two nodes on the rack close to the client and one DataNode on another rack to hold the copies of a file. This method does not take into account the join-access affinity between files that arises in specific applications, nor the load that file access heat places on the nodes, and both seriously affect cluster performance.



Examples


Embodiment Construction

[0050] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.

[0051] An HDFS copy management method based on file access heat, as shown in Figure 1, includes the following steps:

[0052] Step 1: Calculate the number of copies of the file based on the file access heat.

[0053] Step 1.1: Using the file access log table, count the number of accesses to each file within a given period of time and determine the file access popularity.

[0054] The access popularity of file $f_v$ is given by formula (1):

[0055] (Formula (1) is an image in the original and is not reproduced here.)

[0056] where the count term is the number of accesses to file $f_v$ within one time interval, and $\lambda$ is the empirical critical value of the number of accesses, which can be set to the average number of accesses of all files in one time interval.
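
As a rough illustration of Step 1.1, the sketch below counts accesses per file within one time interval and sets λ to the average count, as the text suggests. The log layout, the function names, and the sample data are all hypothetical. Because formula (1) is not available here, `relative_heat` uses a simple count-to-λ ratio as an assumed stand-in; it should not be read as the patent's actual formula.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical access log: (file path, access timestamp) pairs.
ACCESS_LOG = [
    ("/data/a.csv", datetime(2021, 1, 1, 10, 0)),
    ("/data/a.csv", datetime(2021, 1, 1, 10, 5)),
    ("/data/a.csv", datetime(2021, 1, 1, 10, 9)),
    ("/data/b.csv", datetime(2021, 1, 1, 10, 2)),
    ("/data/c.csv", datetime(2021, 1, 1, 10, 3)),
    ("/data/c.csv", datetime(2021, 1, 1, 10, 7)),
    ("/data/b.csv", datetime(2021, 1, 1, 11, 30)),  # outside the interval below
]


def count_accesses(log, start, interval):
    """Count accesses per file with timestamps in [start, start + interval)."""
    end = start + interval
    return Counter(path for path, ts in log if start <= ts < end)


def access_threshold(counts):
    """Lambda: average number of accesses over the files seen in the interval.

    Files with no accesses in the interval do not appear in `counts`,
    so they are not part of this average (a simplification).
    """
    return sum(counts.values()) / len(counts) if counts else 0.0


def relative_heat(counts, lam):
    """ASSUMPTION: heat = count / lambda, a stand-in for formula (1)."""
    if lam == 0:
        return {path: 0.0 for path in counts}
    return {path: n / lam for path, n in counts.items()}


counts = count_accesses(ACCESS_LOG, datetime(2021, 1, 1, 10, 0), timedelta(minutes=10))
lam = access_threshold(counts)
heat = relative_heat(counts, lam)
```

Under this assumed ratio, a file whose value exceeds 1 was accessed more often than the average file in that interval.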

[0057] Step 1.2: Establish the time series file A of the corresponding relationship b...


Abstract

The present invention proposes an HDFS copy management method based on file access heat. The method calculates the number of copies of a file from its access heat, dynamically places copies of predicted hot files based on multi-file access correlation, and deletes copies of files as needed. It uses time series analysis to predict file access heat and provides formulas for the file access heat and for the number of copies. This supports dynamic adjustment of the number of copies of hot files, relieves the access bottleneck of hot files, and improves cluster service efficiency.

Description

Technical field

[0001] The invention belongs to the technical field of big data analysis and data mining, and in particular relates to an HDFS copy management method based on file access heat.

Background technique

[0002] The development of Web technology has produced a large amount of data. To address the storage and analysis of massive data, concepts such as cloud storage, cloud computing, big data analysis and data mining have been proposed. In the context of big data, Apache Hadoop has become a reference framework for distributed big data processing, and it can effectively improve the efficiency of massive data processing. Within the Hadoop framework, data copy management has long been both a research hotspot and a difficulty. Although a lot of research work has been carried out on HDFS data copy management, how to set the appropriate number of copies to adapt to changes in file access hea...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F11/14
CPC: G06F11/1461
Inventors: 代钰, 杨雷, 郝琪, 李学学, 张斌
Owner: NORTHEASTERN UNIV LIAONING