Life cycle management method and equipment for data file of Hadoop distributed file system

A distributed file system and life cycle technology, applied in the fields of file systems, file system functions, and electronic digital data processing. It addresses low data processing efficiency by solving the attribute identification and life cycle management of data files, thereby improving efficiency.

Active Publication Date: 2014-05-07
TAOBAO CHINA SOFTWARE

AI Technical Summary

Problems solved by technology

[0008] The embodiments of the present application provide a life cycle management method and device for Hadoop distributed file system data files. They address the low data processing efficiency of existing technical solutions, which cannot manage the data life cycle according to the characteristics of the data itself.

Embodiment Construction

[0027] Figure 1 is a schematic flow diagram of a life cycle management method for Hadoop distributed file system data files in Embodiment 1 of the present application. The method includes the following steps:

[0028] Step S101: Analyze the currently stored metadata to determine the leaf directories it contains and the files belonging to those leaf directories.
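As an illustration of step S101, the following is a minimal Python sketch, not the implementation described in the application. It assumes the metadata has already been exported as a flat list of path records. The `Entry` type, its fields, and the `leaf_directories` function are hypothetical names introduced here, and a leaf directory is taken to mean a directory that contains no subdirectories.

```python
from collections import defaultdict
from dataclasses import dataclass
import posixpath


@dataclass(frozen=True)
class Entry:
    """One metadata record (hypothetical shape, not the fsimage format)."""
    path: str        # absolute path in the file system
    is_dir: bool     # True for directories, False for files
    size: int = 0    # file size in bytes; 0 for directories


def leaf_directories(entries):
    """Map each leaf directory to the file entries stored directly in it.

    Assumption: a leaf directory is a directory that is not the parent
    of any other directory in the metadata.
    """
    dirs = {e.path for e in entries if e.is_dir}
    # Any path that is the parent of some directory cannot be a leaf.
    parents_of_dirs = {posixpath.dirname(d) for d in dirs}
    leaves = dirs - parents_of_dirs

    files = defaultdict(list)
    for e in entries:
        if not e.is_dir:
            parent = posixpath.dirname(e.path)
            if parent in leaves:
                files[parent].append(e)

    # Include empty leaf directories so that none are silently dropped.
    return {d: files.get(d, []) for d in sorted(leaves)}
```

With such a mapping, files that sit in a non-leaf directory are ignored, which matches the step's focus on leaf directories and the files belonging to them.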

[0029] It should be noted that the technical solutions proposed in the embodiments of the present application are particularly suitable for the metadata of large-scale data stored in large file storage systems such as the Hadoop distributed file system. Such data is large in scale, comes from many sources, and has complex data levels. With the life cycle management method proposed in the embodiments of the present application, life cycle management can therefore be carried out in detail according to the data characteristics, and the data management efficiency can be im...

Abstract

The invention discloses a life cycle management method and device for data files of a Hadoop distributed file system. The method separates the leaf directories from the current metadata, assigns the file size attributes of the corresponding files to the leaf directories, and normalizes the leaf directories according to their service time to obtain normalized directories. It then determines the type of each normalized directory from the service time data and file size data in that directory, and processes the data of each type of normalized directory according to preset life cycle management strategies, thereby introducing the concept of data service time into data storage and management. This avoids the heavy processing load of existing life cycle management, which must mark large volumes of data across different types and levels. It also makes effective use of the time attributes of data files to improve data processing efficiency, and achieves attribute marking and life cycle management for the data files of the distributed file system.
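The pipeline in the abstract can be sketched roughly as follows. This is one possible reading, not the method claimed in the patent: the abstract does not say how normalization or typing is performed. The sketch assumes that normalization merges leaf directories whose paths differ only in a date component, and that types are assigned from idle time and total size. The date pattern, the thresholds, the type names, and the `POLICIES` table are all hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Assumption: date components look like 20130101 or 2013-01-01.
DATE_PART = re.compile(r"\d{8}|\d{4}-\d{2}-\d{2}")


@dataclass
class LeafDir:
    path: str
    total_size: int  # sum of the sizes of the files in the directory, bytes
    last_used: int   # day number of the most recent use (hypothetical unit)


def normalize(path):
    """Replace date-like path components with a placeholder."""
    return DATE_PART.sub("{date}", path)


def group_by_normalized(leaves):
    """Group leaf directories under their normalized directory name."""
    groups = defaultdict(list)
    for leaf in leaves:
        groups[normalize(leaf.path)].append(leaf)
    return dict(groups)


def classify(group, today, cold_after_days=90, large_bytes=1 << 30):
    """Assign a hypothetical type from idle time and total size."""
    idle_days = today - max(leaf.last_used for leaf in group)
    total = sum(leaf.total_size for leaf in group)
    if idle_days >= cold_after_days:
        return "cold_large" if total >= large_bytes else "cold_small"
    return "hot"


# Hypothetical preset strategies, one action per directory type.
POLICIES = {"hot": "keep", "cold_small": "archive", "cold_large": "compress"}


def plan(leaves, today):
    """Return the action chosen for each normalized directory."""
    groups = group_by_normalized(leaves)
    return {name: POLICIES[classify(g, today)] for name, g in groups.items()}
```

For example, two directories `/logs/20130101` and `/logs/20130102` would be handled as the single normalized directory `/logs/{date}`, so a policy is decided once per group rather than once per directory.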

Description

Technical field

[0001] The embodiments of the present application relate to the technical field of data storage, and in particular to a method and device for life cycle management of data files in a Hadoop distributed file system.

Background

[0002] Because the Hadoop Distributed File System (HDFS) is highly fault tolerant, it is often deployed on low-cost hardware. The file system provides high-throughput data access, which suits applications with very large data sets. The Hadoop distributed file system relaxes some POSIX (Portable Operating System Interface) requirements so that data in the file system can be accessed as streams.

[0003] The name node (namenode) in the Hadoop distributed file system will store the metadata of the Hadoop distributed file system files and directories in the binary file of the downloaded image (fsima...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/17, G06F16/182
Inventor: 熊佳树
Owner: TAOBAO CHINA SOFTWARE