Parallel access to data in a distributed file system

A distributed file system technology, applied in the fields of electronic digital data processing, special data processing applications, multi-programming arrangements, etc., that can solve problems such as limited processing

Active Publication Date: 2016-07-13
INITIO TECH

AI Technical Summary

Problems solved by technology

However, such approaches may have drawbacks, such as requiring prior decisions about the number of these parts, and a potentially sub-optimal choice of the nodes where these parts are extracted, since these named parts are themselves distributed.
However, this approach may be limited to processing with a specific application, and may not necessarily benefit from an implementation of that application that is not ported to the file system.

Embodiment Construction

[0048] Referring to Figure 2, computing system 100 includes distributed file system 110 and distributed processing system 120, and also includes or has access to computing system 130. An example of this type of file system 110 is the Hadoop Distributed File System (HDFS), with the Hadoop framework serving as distributed processing system 120, although it should be understood that the methods described herein are not limited to use with HDFS. Distributed file system 110 stores a number of named units, hereinafter referred to as "files," without intending to assign specific attributes to the word "file." Often, the name of a file includes a path referencing a containing unit such as a folder. In general, each file may have portions stored on different data stores 112 (e.g., disk subsystems) of the file system.
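
As a rough illustration of a file whose portions sit on different data stores, the following minimal Python sketch reads each store concurrently and reassembles the file. Everything in it is an assumption made for illustration: the in-memory `DATA_STORES` dictionary, the store names, the file name, and the functions `read_store` and `read_file_parallel` are hypothetical and are not taken from the patent or from any HDFS API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical stand-in for data stores 112:
# store id -> {(file name, portion index): portion bytes}
DATA_STORES: Dict[str, Dict[Tuple[str, int], bytes]] = {
    "store-a": {("/logs/day1", 0): b"alpha-", ("/logs/day1", 2): b"gamma"},
    "store-b": {("/logs/day1", 1): b"beta-"},
}


def read_store(store_id: str, name: str) -> List[Tuple[int, bytes]]:
    """Return (portion index, data) for every portion of `name` held by one store."""
    return [
        (index, data)
        for (file_name, index), data in DATA_STORES[store_id].items()
        if file_name == name
    ]


def read_file_parallel(name: str) -> bytes:
    """Query every store concurrently, then reassemble the portions by index."""
    with ThreadPoolExecutor(max_workers=len(DATA_STORES)) as pool:
        per_store = pool.map(lambda store_id: read_store(store_id, name), DATA_STORES)
        portions = [portion for chunk in per_store for portion in chunk]
    # Portion indices are unique, so sorting the tuples orders by index.
    return b"".join(data for _, data in sorted(portions))


print(read_file_parallel("/logs/day1"))
```

The sketch uses one worker thread per store only to keep the example small; a real system would read over a network and would not necessarily reassemble the portions in a single place.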

[0049] In some implementations, the methods described above utilize the Hadoop framework to cause parallel execution of copies of the mapping process ...

Abstract

An approach to parallel access of data from a distributed filesystem provides parallel access to one or more named units (e.g., files) in the filesystem by creating multiple parallel data streams such that all the data of the desired units is partitioned over the multiple streams. In some examples, the multiple streams form multiple inputs to a parallel implementation of a computation system, such as a graph-based computation system, a dataflow-based system, and/or a (e.g., relational) database system.
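
The partitioning property stated in the abstract, namely that every piece of the requested units ends up in exactly one stream, can be sketched as follows. This is a minimal illustration under stated assumptions: the round-robin assignment, the chunked representation of a unit, and the name `partition_over_streams` are hypothetical choices made here, not the method the patent claims.

```python
from typing import Dict, List, Tuple

Chunk = Tuple[str, int, bytes]  # (unit name, chunk index, chunk data)


def partition_over_streams(
    units: Dict[str, List[bytes]], num_streams: int
) -> List[List[Chunk]]:
    """Assign every chunk of every named unit to exactly one stream.

    Round-robin assignment is an assumption for illustration; any rule that
    places each chunk in exactly one stream satisfies the same property.
    """
    if num_streams < 1:
        raise ValueError("num_streams must be at least 1")
    streams: List[List[Chunk]] = [[] for _ in range(num_streams)]
    position = 0
    for name in sorted(units):
        for index, data in enumerate(units[name]):
            streams[position % num_streams].append((name, index, data))
            position += 1
    return streams


# Hypothetical example data: two named units split into chunks.
units = {"/data/a": [b"a0", b"a1", b"a2"], "/data/b": [b"b0", b"b1"]}
streams = partition_over_streams(units, 2)
for number, stream in enumerate(streams):
    print(number, stream)
```

Each stream could then feed one input of a parallel computation. Tagging each chunk with its unit name and index is a design choice of this sketch that lets a consumer tell where a chunk came from.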

Description

[0001] Cross-Reference to Related Applications

[0002] This application claims priority to US Application Serial No. 14/090,434, filed November 26, 2013.

Technical Field

[0003] The invention relates to parallel access to data in a distributed file system.

Background

[0004] An example of a distributed file system is the Hadoop Distributed File System (HDFS). HDFS is a scalable, portable distributed file system written in Java. HDFS has a set of nodes ("data nodes") that hold data for multiple files in the file system and are capable of serving file blocks over a data network. Each file is usually distributed over multiple nodes. The directory of the file system is maintained by a set of nodes ("name nodes"). This directory can be used to identify the locations of the distributed blocks of each named file in the file system.

[0005] Referring to Figures 1A-1B, using the MapReduce programming model is one way to process data in a distributed file system ...
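
The directory described in paragraph [0004], which maps each named file to the locations of its blocks, can be pictured with a small Python sketch. All names and data here are hypothetical (`DIRECTORY`, `BlockLocation`, `locate_blocks`, `blocks_by_node`, the node names), the assumption that each block is held by more than one data node is made only for illustration, and none of it corresponds to an actual HDFS API.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class BlockLocation(NamedTuple):
    block_id: int
    data_nodes: List[str]  # nodes assumed to hold a copy of this block


# Hypothetical directory contents: file name -> ordered block locations.
DIRECTORY: Dict[str, List[BlockLocation]] = {
    "/data/events": [
        BlockLocation(0, ["dn1", "dn2"]),
        BlockLocation(1, ["dn2", "dn3"]),
        BlockLocation(2, ["dn1", "dn3"]),
    ],
}


def locate_blocks(name: str) -> List[BlockLocation]:
    """Look up where the blocks of a named file are stored."""
    return DIRECTORY[name]


def blocks_by_node(name: str) -> Dict[str, List[int]]:
    """Invert the lookup: which blocks of the file can each data node serve?"""
    by_node: Dict[str, List[int]] = defaultdict(list)
    for location in locate_blocks(name):
        for node in location.data_nodes:
            by_node[node].append(location.block_id)
    return dict(by_node)


print(blocks_by_node("/data/events"))
```

The inverted view is what a client would need in order to decide which node to ask for which block.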

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F9/54
CPC: G06F9/5011, G06F16/182, G06F16/2471, G06F16/1858
Inventor: A·M·沃尔蕾斯, B·P·杜罗斯, M·A·伊斯曼, T·韦克林
Owner: INITIO TECH