A method and system for creating dynamic copies of hot data files in HDFS

A technology concerning data files and hot data, applied in electrical digital data processing, special-purpose data processing applications, instruments, and related fields. It addresses the high network overhead and data-read delays caused by moving data file copies, and avoids both the read delays and the network delays.

Active Publication Date: 2017-10-27
INSPUR BEIJING ELECTRONICS INFORMATION IND
Cites: 4 | Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a method and system for creating dynamic copies of hot data files in HDFS, in order to avoid the data-read delays and high network overhead caused by large-scale movement of data file copies.

Embodiment Construction

[0037] To make the purpose, technical solution and advantages of the present invention clearer, the embodiments of the invention are described in detail below with reference to the accompanying drawings. Provided they do not conflict, the embodiments of the present application and the features within them may be combined with one another arbitrarily.

[0038] Figure 1 is a flowchart of the method for creating dynamic copies of hotspot data files in HDFS according to this embodiment.

[0039] S101: Identify hotspot data files.

[0040] The method for identifying hotspot data files may include the following steps:

[0041] Record the accessed data files in chronological order and assign a weight to each access: data files accessed earlier receive smaller weights, and data files accessed later receive larger weights;

[0042] When the preset time is reached, the sum of weights as...
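The recency weighting in [0041] can be sketched as follows. Paragraph [0042] is truncated here, so how the summed weights are finally used is not shown; this sketch assumes, hypothetically, that weights grow linearly (the k-th access in a preset time window gets weight k) and that a file counts as hot once its summed weight in that window reaches a threshold. The names `weighted_access_scores`, `identify_hot_files` and `threshold` are illustrative only.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def weighted_access_scores(accesses: Iterable[str]) -> dict[str, int]:
    """Sum recency weights per file for one preset time window.

    `accesses` must be in chronological order. The k-th access (1-based)
    gets weight k, so later accesses count more (hypothetical linear scheme).
    """
    scores: dict[str, int] = defaultdict(int)
    for weight, path in enumerate(accesses, start=1):
        scores[path] += weight
    return dict(scores)


def identify_hot_files(accesses: Iterable[str], threshold: int) -> set[str]:
    """Return files whose summed weight reaches a hypothetical threshold."""
    scores = weighted_access_scores(accesses)
    return {path for path, score in scores.items() if score >= threshold}
```

For the access log `["/a", "/b", "/a", "/c", "/a"]`, the scores are 9 for `/a`, 2 for `/b` and 4 for `/c`, so a threshold of 5 marks only `/a` as hot. Because later accesses weigh more, a file that was busy only at the start of the window scores lower than one with the same number of recent accesses.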

Abstract

The invention provides a method and a system for dynamically creating duplicates of hotspot data files in the Hadoop distributed file system (HDFS). The method comprises identifying the hotspot data files and dynamically creating duplicates of those files. When creating the duplicates, the method preferably places a duplicate of a hotspot data file on the local node from which that file is requested. The method and system thereby avoid the data-read delays and high network overhead caused by large-scale movement of data file duplicates.
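The placement preference stated in the abstract can be illustrated with a small sketch. `DataNode`, `choose_replica_target` and the fallback rule (use the node with the most free space when the requesting node cannot hold the copy) are hypothetical; the source only says that duplicates are preferably created on the requesting local node.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical node model: name, free capacity, and files it holds."""
    name: str
    free_bytes: int
    files: set[str] = field(default_factory=set)


def choose_replica_target(
    path: str, size: int, requester: DataNode, nodes: list[DataNode]
) -> DataNode | None:
    """Pick the node on which to create an extra copy of a hot file.

    Prefer the requesting node so its later reads are local. If it cannot
    hold the copy, fall back (assumed rule) to the node with the most free
    space that does not already hold the file. Returns None when no copy
    is needed or no node has room.
    """
    if path in requester.files:
        return None  # requester already reads the file locally
    if requester.free_bytes >= size:
        return requester
    candidates = [
        n for n in nodes
        if n is not requester and path not in n.files and n.free_bytes >= size
    ]
    return max(candidates, key=lambda n: n.free_bytes, default=None)
```

Placing the copy on the requester turns subsequent reads of that file into local reads instead of transfers across the network, which is the source of the savings the abstract describes.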

Description

Technical Field

[0001] The present invention relates to the creation of data file duplicates, and in particular to a method and system for dynamically creating duplicates of hot data files in the Hadoop distributed file system (HDFS).

Background

[0002] With the development of modern networks, the amount of data has grown dramatically. To process massive data volumes efficiently and reliably, Hadoop clusters emerged, built from many inexpensive machines acting as cluster nodes. A Hadoop cluster can divide an application into many small units of work, each of which can be executed on any cluster node. Hadoop also provides the distributed file system HDFS, which stores data across the cluster nodes and offers high throughput for reading and writing data.

[0003] At present, HDFS does not distinguish between hot and cold data files when configuring data file copies for nodes; it statically configures copies for all data files to ensure fau...

Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F17/30
CPC: G06F16/128, G06F16/182
Inventors: 郭美思 (Guo Meisi), 吴楠 (Wu Nan)
Owner: INSPUR BEIJING ELECTRONICS INFORMATION IND