A distributed data access method, device and system

A distributed data access technology, applied in the field of data access. It addresses the problem that storing many small files occupies the master node's memory and reduces the efficiency of HDFS data access, and it achieves the effects of improving efficiency, reducing the number of cache mappings that must be established, and saving memory.

Active Publication Date: 2018-02-13
ZICT TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] Therefore, HDFS is only suitable for storing large files (for example, files with a data volume greater than 64 MB). When HDFS stores a large number of small files (for example, files with a data volume of less than 64 MB, such as pictures and documents), a large number of cache mappings must be established, which occupies a large share of the master node's memory and greatly reduces the efficiency of HDFS data access.
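
To make the scale of the problem concrete, the following minimal Python sketch counts the file-to-block cache mappings the master node would hold for a batch of small files, with and without merging them first. It assumes one mapping per stored file, as the description states; the function names, the 1 MB file size, and the 10,000-file batch are illustrative assumptions, not figures from the patent.

```python
MB = 1024 * 1024


def mappings_without_merging(file_sizes):
    """One cache mapping per file stored directly in HDFS (simplifying assumption)."""
    return len(file_sizes)


def mappings_with_merging(file_sizes, merge_threshold):
    """Count merged files produced when small files are packed in arrival order.

    A merged file is emitted each time the accumulated size reaches
    merge_threshold. Files still waiting below the threshold are not yet
    in HDFS, so they contribute no mapping.
    """
    merged_files = 0
    pending_bytes = 0
    for size in file_sizes:
        pending_bytes += size
        if pending_bytes >= merge_threshold:
            merged_files += 1
            pending_bytes = 0
    return merged_files


# Hypothetical workload: 10,000 small files of 1 MB each.
sizes = [1 * MB] * 10_000
print(mappings_without_merging(sizes))        # 10000
print(mappings_with_merging(sizes, 64 * MB))  # 156
```

Under these assumptions the master node tracks 156 merged files instead of 10,000 individual ones; the last 16 files stay in local storage until more data arrives.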



Detailed Description of the Embodiments

[0051] As shown in Figure 1, the distributed data storage method provided by the present invention includes:

[0052] Step 101, the small file processing module receives the file uploaded by the client;

[0053] Here, the small file processing module is a device added between the client and HDFS, and it can merge small files asynchronously.

[0054] Step 102, when it is determined that the data volume of the file is less than a preset data volume threshold, save the file in the local system;

[0055] Here, the data volume threshold may be 64 MB, or it may be set according to actual conditions.

[0056] The local system refers to a storage space other than HDFS.

[0057] Step 103, when the sum of the data volumes of the files stored in the local system reaches the preset merging threshold, the files are merged, and the merged large file is sent to HDFS.

[0058] Here, the merging threshold may be 64M, or it may be set acco...
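
Steps 101 to 103 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the class name `SmallFileProcessor`, the `send_to_hdfs` callback (a placeholder for a real HDFS write), the in-memory list standing in for the local system, and the offset index are all hypothetical. The excerpt does not say how files at or above the threshold are handled or how a file is located inside a merged file, so the direct forwarding and the index below are assumptions. The sketch also merges synchronously, whereas the module described above may merge asynchronously.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

MB = 1024 * 1024


@dataclass
class SmallFileProcessor:
    """Hypothetical stand-in for the small file processing module."""

    send_to_hdfs: Callable[[str, bytes], None]  # placeholder for an HDFS write
    size_threshold: int = 64 * MB               # Step 102 data volume threshold
    merge_threshold: int = 64 * MB              # Step 103 merging threshold
    local: List[Tuple[str, bytes]] = field(default_factory=list)
    # Assumed index: file name -> (merged file name, offset, length)
    index: Dict[str, Tuple[str, int, int]] = field(default_factory=dict)
    merged_count: int = 0

    def receive(self, name: str, data: bytes) -> None:
        """Step 101: receive a file uploaded by the client."""
        if len(data) >= self.size_threshold:
            # Assumption: files that are not small go straight to HDFS.
            self.send_to_hdfs(name, data)
            return
        # Step 102: a small file is saved in the local system.
        self.local.append((name, data))
        # Step 103: merge once the locally stored volume reaches the threshold.
        if sum(len(d) for _, d in self.local) >= self.merge_threshold:
            self._merge_and_send()

    def _merge_and_send(self) -> None:
        """Concatenate the local files and send one large file to HDFS."""
        self.merged_count += 1
        merged_name = f"merged-{self.merged_count}"
        offset = 0
        for name, data in self.local:
            self.index[name] = (merged_name, offset, len(data))
            offset += len(data)
        merged = b"".join(data for _, data in self.local)
        self.send_to_hdfs(merged_name, merged)
        self.local.clear()
```

With tiny thresholds for demonstration, for example `SmallFileProcessor(send_to_hdfs=print, size_threshold=10, merge_threshold=10)`, three calls to `receive` with 4, 4, and 2 bytes produce a single 10-byte merged file on the third call.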


Abstract

The invention discloses a method, a device and a system for distributed data access. The method comprises the following steps: a small file processing module receives a file uploaded by a client; when it is determined that the data volume of the file is smaller than a preset data volume threshold, the file is saved in a local system; when the sum of the data volumes of the files saved in the local system reaches a preset merging threshold, the files are merged, and the merged large file is sent to the Hadoop Distributed File System (HDFS). When a large number of small files are stored, the method reduces the memory occupied on the master node and improves the data access efficiency of HDFS.

Description

Technical field

[0001] The present invention relates to data access technology, in particular to a distributed data access method, device and system.

Background

[0002] In current distributed storage systems, Hadoop, an open-source distributed system infrastructure, is generally used as the storage technology. Each file stored in the Hadoop Distributed File System (HDFS) needs to correspond to a block (Block), and the master node (NameNode) in HDFS needs to establish a cache mapping between each file and its corresponding block. The more files stored in HDFS, the more cache mappings need to be established, and the more of the master node's memory is occupied.

[0003] Therefore, HDFS is only suitable for storing large files (for example, files with a data volume greater than 64 MB). When HDFS stores a large number of small files (for example, files with a data volume of less than 64 MB, such as pictures and documents), a large number of cache maps need t...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/1824
Inventor: 陈昌
Owner: ZICT TECH CO LTD