
A distributed file system and data processing method

A distributed file system and data processing technology, applied in the field of distributed systems, that addresses problems such as data consistency and prolonged startup time.

Active Publication Date: 2016-12-28
LENOVO (BEIJING) CO LTD

AI Technical Summary

Problems solved by technology

[0018] The embodiments of the present application provide a distributed file system and a data processing method. They address the data consistency and startup delay problems that arise in an HDFS-based read-write separation architecture when a Namenode server fails or a Namenode needs to be added.



Examples


Embodiment 1

[0091] As shown in Figure 1, the information interaction diagram of the distributed file system architecture provided in the embodiment of the application introduces a distributed cluster composed of Zookeeper (consistency nodes), and Zookeeper runs on the Namenodes. In this embodiment, the Namenodes and the Zookeeper instances are set in a one-to-one correspondence. This correspondence is neither required nor fixed; it is used here to make the best use of resources, and the ratio can be set according to the actual situation. In theory, the larger the Zookeeper cluster, the better the performance, but the improvement is insignificant relative to the additional server hardware it requires, so the cost-effectiveness is very low. A one-to-one ratio of Namenode to Zookeeper is the more cost-effective configuration.

[0092] The Master Namen...

Embodiment 2

[0112] As shown in Figure 3, a structural diagram of the distributed file system provided by the embodiment of the present application, the system includes:

[0113] A Master Namenode (master control node) 201, the node responsible for data writing;

[0114] At least one Slave Namenode (slave control node) 202, the node responsible for data reading;

[0115] An Observer (neutral control node) 203, the node that merges the data mirror and the data log;

[0116] A first detection and acquisition unit 204, configured to, upon detecting at time T that a control node has been newly added to the distributed system as a Slave Namenode, obtain from the neutral control node Observer the data in the checkpoint corresponding to time T, and obtain from the Master Namenode the data before time T; the checkpoint is formed by periodically merging the data mirror FsImage and the data log Editlog; the merge interval can be set according to the specific application, ...
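
The periodic merge of FsImage and Editlog into a checkpoint can be illustrated with a minimal sketch. It is not the patented implementation or Hadoop's actual checkpoint code: the `Checkpoint` class, the `merge_checkpoint` function, the record layout `(txid, op, path, value)`, and the operation names are all hypothetical, and the namespace is reduced to a mapping from path to replication factor.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical checkpoint: a namespace snapshot plus the last merged txid."""
    last_txid: int = 0
    namespace: dict = field(default_factory=dict)  # path -> replication factor


def merge_checkpoint(image: Checkpoint, editlog: list) -> Checkpoint:
    """Fold edit-log records newer than the image into a new checkpoint."""
    namespace = dict(image.namespace)  # copy, so the old image stays intact
    last_txid = image.last_txid
    # Replay records in transaction order (assumed layout: txid, op, path, value).
    for txid, op, path, value in sorted(editlog, key=lambda record: record[0]):
        if txid <= image.last_txid:
            continue  # already reflected in the existing image
        if op in ("create", "set_replication"):
            namespace[path] = value
        elif op == "delete":
            namespace.pop(path, None)
        last_txid = txid
    return Checkpoint(last_txid, namespace)


# Example: the image already covers transactions 1 and 2.
image = Checkpoint(last_txid=2, namespace={"/a": 3})
editlog = [
    (1, "create", "/a", 3),
    (3, "create", "/b", 2),
    (4, "set_replication", "/a", 1),
    (5, "delete", "/b", None),
]
checkpoint = merge_checkpoint(image, editlog)
```

In this sketch the merge interval is left to the caller, matching the statement that it can be set according to the specific application.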



Abstract

The invention discloses a distributed file system and a data processing method. The method is applied in the distributed file system and comprises the following steps: at time T, when a newly added control node serving as a Slave Namenode is detected in the distributed system, the data in the checkpoint corresponding to time T are obtained from a neutral control node, and the data before time T are obtained from the Slave Namenode, wherein the checkpoint is formed by combining a data mirror image with a data log, both of which are obtained from a Master Namenode; complete mirror image data are then obtained on the neutral control node by combining the checkpoint with the data before time T, and the complete mirror image data are sent to the newly added control node.
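
The steps in the abstract can be sketched as follows. This is an illustrative reading under stated assumptions, not the claimed method: it assumes the checkpoint corresponding to time T is the most recent periodic merge, and that the "data before time T" are edit records written after that checkpoint but before T. All names (`build_complete_image`, `bootstrap_new_slave`, `SlaveNamenode`) and the record layout `(timestamp, path, replication)` are hypothetical.

```python
class SlaveNamenode:
    """Hypothetical newly added control node that serves reads."""

    def __init__(self):
        self.namespace = {}

    def load_image(self, image):
        self.namespace = dict(image)


def build_complete_image(checkpoint_ns, checkpoint_ts, edits, t):
    """On the neutral node: combine the checkpoint with edits made before time t."""
    image = dict(checkpoint_ns)
    # Assumed record layout: (timestamp, path, replication); None means delete.
    for ts, path, replication in sorted(edits, key=lambda record: record[0]):
        if not (checkpoint_ts < ts < t):
            continue  # already in the checkpoint, or not before time t
        if replication is None:
            image.pop(path, None)
        else:
            image[path] = replication
    return image


def bootstrap_new_slave(checkpoint, edits_before_t, t):
    """Build the complete image and send it to the newly added node."""
    checkpoint_ns, checkpoint_ts = checkpoint
    complete = build_complete_image(checkpoint_ns, checkpoint_ts, edits_before_t, t)
    slave = SlaveNamenode()
    slave.load_image(complete)
    return slave


# Example: checkpoint taken at logical time 10, new node detected at T = 18.
checkpoint = ({"/a": 3}, 10)
edits = [(8, "/a", 3), (12, "/b", 2), (15, "/a", 1), (20, "/c", 3)]
new_slave = bootstrap_new_slave(checkpoint, edits, t=18)
```

Time is modelled here as a logical counter purely to keep the sketch short; the source does not say how time T is represented.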

Description

Technical field

[0001] The invention relates to the field of distributed systems, in particular to a distributed file system and a data processing method.

Background art

[0002] Hadoop Distributed File System, or HDFS for short, is a distributed file system. There are three necessary roles in the HDFS architecture: Namenode (control node), Datanode (data node) and Client, where the Namenode is a single point in the cluster. In the overall architecture, the Namenode acts as the server for both the Client and the Datanode.

[0003] In a distributed file system, FSImage is a data mirror, Editlog is a data log, and the combination of the two is the complete data. Any modification to the file system metadata is recorded by the Namenode using a transaction log called EditLog. For example, when a file is created in HDFS, the Namenode inserts a record in the Editlog to represent it; similarly, modifying the replication factor of the file will also insert a record in ...
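
The logging behaviour described in the background can be sketched with a small in-memory model. This is a hypothetical stand-in, not the HDFS Namenode API: the class name, method names, and the record layout `(txid, op, path, value)` are assumptions made for illustration.

```python
class Namenode:
    """Hypothetical in-memory stand-in for a Namenode's metadata handling."""

    def __init__(self):
        self.namespace = {}   # path -> replication factor
        self.editlog = []     # append-only list of (txid, op, path, value)
        self._next_txid = 1

    def _log(self, op, path, value):
        # Every metadata modification becomes one transaction record.
        self.editlog.append((self._next_txid, op, path, value))
        self._next_txid += 1

    def create_file(self, path, replication=3):
        if path in self.namespace:
            raise FileExistsError(path)
        self._log("create", path, replication)  # log first, then apply
        self.namespace[path] = replication

    def set_replication(self, path, replication):
        if path not in self.namespace:
            raise FileNotFoundError(path)
        self._log("set_replication", path, replication)
        self.namespace[path] = replication


# Creating a file and changing its replication factor each add one record.
nn = Namenode()
nn.create_file("/data/report.txt")
nn.set_replication("/data/report.txt", 2)
```

Writing the record before applying the change is a design choice of this sketch: the log then always contains at least as much as the in-memory state, so the state can be rebuilt from it.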


Application Information

IPC(8): G06F17/30, H04L29/08
Inventor 张云龙
Owner LENOVO (BEIJING) CO LTD