A method to replace the Hadoop storage module with PVFS

A storage-module and connection-module technique in the computer field. It addresses the single point of failure and the loss or absence of parallelism, and achieves convenient reading, writing and maintenance, avoids wasting computing resources, and reduces time overhead.

Active Publication Date: 2018-04-24
NANJING UNIV

AI Technical Summary

Problems solved by technology

1. Early HDFS had only one NameNode, which was a serious single point of failure: when the NameNode crashed, all unsaved data was lost. Later versions of HDFS added multiple secondary NameNodes, so that when the primary NameNode crashes a secondary NameNode keeps running. However, only one NameNode is active at any time, which is likely to cause network congestion when files are accessed frequently.
[0006] 2. HDFS uses multiple replicas to access files in parallel, and nodes can only perform parallel read and write operations in units of replicas. Although this can improve the overall throughput of the file system, HDFS must create enough replicas to meet the access requirements of different nodes for the same data file; otherwise it loses its parallelism.
[0007] 3. HDFS stores data on the corresponding computing nodes, which saves network overhead. However, with the development of computer network technology, the time spent on network communication is now one to two orders of magnitude lower than that of I/O operations and is almost negligible. The price HDFS pays for saving network overhead is the complete absence of intra-file parallelism.
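To make "intra-file parallelism" concrete, the following is a minimal sketch of round-robin striping, the kind of layout a parallel file system can use to spread one file over several I/O servers. The stripe size, the server count and the name `serverFor` are assumptions chosen for illustration; they are not taken from the patent text.

```java
public class StripeSketch {
    /**
     * Returns the index of the I/O server holding the byte at the given offset,
     * assuming fixed-size stripes assigned to servers in round-robin order.
     */
    static int serverFor(long offset, int stripeSize, int servers) {
        return (int) ((offset / stripeSize) % servers);
    }

    public static void main(String[] args) {
        int stripeSize = 64 * 1024; // assumed stripe size in bytes
        int servers = 4;            // assumed number of I/O servers

        // Consecutive stripes of one file land on different servers,
        // so several nodes can read different parts of the file at once.
        for (long offset = 0; offset < 5L * stripeSize; offset += stripeSize) {
            System.out.println("offset " + offset + " -> server "
                    + serverFor(offset, stripeSize, servers));
        }
    }
}
```

With replica-level parallelism, by contrast, two readers of the same file can only proceed in parallel if each has its own replica to read from.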
[0010] With current technology, however, Hadoop is very tightly integrated with HDFS in order to achieve higher performance. The virtual methods provided by the file-system-related virtual classes in Hadoop basically correspond to the upper-layer interface of HDFS. This makes it inconvenient and difficult to use another file system in place of HDFS.

Examples

Embodiment Construction

[0045] The invention provides a method for using PVFS to replace the Hadoop storage module: the parallel virtual file system PVFS replaces the Hadoop distributed file system (HDFS) module. The invention implements the connection from Hadoop to PVFS and mainly comprises three modules: a PVFS program interface module, a Hadoop-PVFS module and a JNI connection module.
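The layering of the three modules might be pictured roughly as follows. This is only a sketch under stated assumptions: `PvfsBridge`, `InMemoryBridge` and `HadoopPvfsStore` are hypothetical names, and the in-memory class merely stands in for the JNI connection and the C interface module so that the example runs without a PVFS installation. It does not reproduce the patent's actual classes or Hadoop's real file system API.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical Java-side view of the JNI connection module. */
interface PvfsBridge {
    void write(String path, byte[] data);
    byte[] read(String path);
}

/** Stand-in for the native side; a real bridge would call into the C ".so" library. */
final class InMemoryBridge implements PvfsBridge {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override
    public void write(String path, byte[] data) {
        files.put(path, data.clone());
    }

    @Override
    public byte[] read(String path) {
        byte[] data = files.get(path);
        if (data == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        return data.clone();
    }
}

/** Hypothetical Hadoop-PVFS module: the layer Hadoop would call in place of HDFS. */
final class HadoopPvfsStore {
    private final PvfsBridge bridge;

    HadoopPvfsStore(PvfsBridge bridge) {
        this.bridge = bridge;
    }

    void put(String path, String text) {
        bridge.write(path, text.getBytes(StandardCharsets.UTF_8));
    }

    String get(String path) {
        return new String(bridge.read(path), StandardCharsets.UTF_8);
    }
}

public class LayeringSketch {
    public static void main(String[] args) {
        HadoopPvfsStore store = new HadoopPvfsStore(new InMemoryBridge());
        store.put("/pvfs/demo.txt", "hello");
        System.out.println(store.get("/pvfs/demo.txt"));
    }
}
```

Keeping the bridge behind an interface means the upper layer can be exercised without the native library, which is one plausible reason to separate the Hadoop-facing module from the JNI connection.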

[0046] The PVFS program interface module is written in C and compiled into a dynamic link library (".so"). It encapsulates the PVFS program interface and hides the internal parameters of PVFS from users.
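Because the interface module is a shared library, the Java side would reach it through JNI. The sketch below shows what such a binding could look like. The class `PvfsNative`, its three native methods and the library name `pvfsjni` are all hypothetical; the source does not list the wrapper's actual functions.

```java
/** Hypothetical Java side of the JNI connection module; all names are illustrative. */
final class PvfsNative {
    private PvfsNative() {}

    // Assumed native entry points that a C wrapper library would implement.
    static native long open(String path);
    static native int read(long handle, byte[] buffer, long offset);
    static native void close(long handle);

    /** Tries to load the shared library and returns false instead of failing if it is absent. */
    static boolean tryLoad(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }
}

public class JniSketch {
    public static void main(String[] args) {
        // "pvfsjni" is an assumed name; on Linux it would resolve to libpvfsjni.so.
        boolean loaded = PvfsNative.tryLoad("pvfsjni");
        System.out.println(loaded
                ? "native PVFS wrapper loaded"
                : "native PVFS wrapper not found");
    }
}
```

The native methods are only declared here; calling one without the library loaded would raise `UnsatisfiedLinkError`, which is why the sketch checks the load result first.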

[0047] A PVFS program interface module is necessary because PVFS does not provide a complete API. The PVFS program interface depends heavily on the PVFS kernel module and on class libraries such as ROMIO. The PVFS kernel module has strict requirements on the Linux kernel version, and the use of ROMIO and other class libraries is not w...

Abstract

A method for replacing the Hadoop storage module with PVFS: the parallel virtual file system PVFS replaces Hadoop's distributed file system module, HDFS. The invention implements the connection from Hadoop to PVFS and mainly comprises three modules: a PVFS program interface module, a Hadoop-PVFS module and a JNI connection module. The invention aims to select a more suitable distributed file system than HDFS as the Hadoop storage module, so as to reduce Hadoop's overhead in file operations and improve its performance in MapReduce computation, especially in data-intensive computation.

Description

Technical field

[0001] The invention belongs to the field of computer technology and relates to distributed computing and distributed file systems, in particular to the connection between a distributed computing architecture and a distributed file system. Specifically, it is an implementation method for connecting the PVFS distributed file system to Hadoop to replace Hadoop's original storage module, HDFS.

Background

[0002] Hadoop is a distributed system infrastructure developed by the Apache Software Foundation. Users can develop distributed programs without knowing the underlying details of the distribution, and can make full use of the power of the cluster for high-speed computing and storage. Hadoop is currently the most widely used distributed computing platform. It adopts the MapReduce distributed computing model and provides a series of interfaces and frameworks to help users efficiently utilize the computing resources of distributed clusters and improve the parallelism ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F9/44, G06F9/455, G06F17/30
Inventor: 唐杰, 包念原, 武港山
Owner: NANJING UNIV