Method and system for reading and writing data for Hadoop computing

A technology for reading and writing data, applied in the field of distributed computing. It addresses the waste of storage space, transmission bandwidth, and processing time, and reduces both the number of times data is read and the number of times it is copied.

Active Publication Date: 2018-03-02
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
AI Technical Summary

Problems solved by technology

This incurs considerable overhead: it wastes storage space (for example, more than two copies of the data must be stored), transmission bandwidth (large data transfers occupy bandwidth), and processing time (the processing has many single points, which makes it time-consuming overall).

Embodiment Construction

[0017] The method and system for reading and writing data used for HADOOP calculations according to the embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0018] Figure 1 is a schematic diagram comparing how the prior art and the method according to the present invention process data on a non-HDFS storage system. In Figure 1, a network file system (NFS) is used as an example of a non-HDFS system, but those of ordinary skill in the art will understand, from the description of the embodiments of the present invention with reference to Figures 1 to 5, that the general concept of the present invention is applicable to any data storage system other than HDFS.

[0019] The upper part of Figure 1 shows the data flow for processing data on a non-HDFS storage system according to the prior art. As shown in the figure, when using the HADOOP computing model to process data stored on non-HDFS...

Abstract

The invention provides a method and system for reading and writing data used for HADOOP computation. The method reads input data for HADOOP computation from a non-HDFS storage system and comprises the following steps. A data reading class for reading data from the non-HDFS system is defined; this class inherits the RecordReader class. The getRecordReader method and the getSplits method of the InputFormat class in the HADOOP distributed computation model are then implemented. The implemented getRecordReader method creates and returns an instance (object) of the defined data reading class, and the implemented getSplits method determines that the RecordReader instance returned by the called getRecordReader method is an instance of the defined data reading class. This removes HADOOP's strong dependence on HDFS, reduces the number of times the data is read and copied, saves storage space, and shortens processing time.
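The structure described in the abstract can be illustrated with a minimal sketch. It does not use the real Hadoop library: `SketchRecordReader` and `SketchInputFormat` are simplified, hypothetical stand-ins that only mirror the roles of RecordReader and InputFormat, and `LocalFileRecordReader`, `LocalFileInputFormat`, and `NonHdfsReadSketch` are illustrative names that do not come from the patent. The sketch also assumes the non-HDFS storage is an ordinary directory (for example an NFS mount) containing UTF-8 text files, with one split per file and one record per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the role of RecordReader (NOT the Hadoop type).
interface SketchRecordReader extends AutoCloseable {
    /** Returns the next record, or null when the split is exhausted. */
    String next() throws IOException;

    @Override
    void close() throws IOException;
}

// Hypothetical stand-in for the role of InputFormat (NOT the Hadoop type).
interface SketchInputFormat {
    /** Divides the input into independently readable pieces. */
    List<Path> getSplits(Path inputDir) throws IOException;

    /** Creates the reader that turns one split into records. */
    SketchRecordReader getRecordReader(Path split) throws IOException;
}

// The "data reading class": reads lines straight from a non-HDFS file.
final class LocalFileRecordReader implements SketchRecordReader {
    private final BufferedReader reader;

    LocalFileRecordReader(Path file) throws IOException {
        this.reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
    }

    @Override
    public String next() throws IOException {
        return reader.readLine();
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}

// The input format whose getRecordReader returns the data reading class.
final class LocalFileInputFormat implements SketchInputFormat {
    @Override
    public List<Path> getSplits(Path inputDir) throws IOException {
        List<Path> splits = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inputDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    splits.add(file); // assumption: one split per file
                }
            }
        }
        Collections.sort(splits); // deterministic order for the sketch
        return splits;
    }

    @Override
    public SketchRecordReader getRecordReader(Path split) throws IOException {
        return new LocalFileRecordReader(split);
    }
}

final class NonHdfsReadSketch {
    /** Reads every record of every split in place, without copying to HDFS. */
    static List<String> readAll(SketchInputFormat format, Path dir) throws IOException {
        List<String> records = new ArrayList<>();
        for (Path split : format.getSplits(dir)) {
            try (SketchRecordReader reader = format.getRecordReader(split)) {
                String record;
                while ((record = reader.next()) != null) {
                    records.add(record);
                }
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: NonHdfsReadSketch <input-directory>");
            return;
        }
        for (String record : readAll(new LocalFileInputFormat(), Paths.get(args[0]))) {
            System.out.println(record);
        }
    }
}
```

In this sketch the framework-side loop (`readAll`) only talks to the two interfaces, so swapping the storage backend means supplying a different reader class and input format rather than first copying the data into HDFS. In real Hadoop the corresponding types take additional arguments such as the job configuration, which are left out here.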

Description

Technical field

[0001] The present invention relates to distributed computing technology, and in particular to a method and system for processing data used for distributed computing.

Background art

[0002] A distributed computing platform is built on a distributed storage platform and is used to process the data stored there. The MapReduce computing model, popular in recent years, can use distributed computing power to process big data, and this convenience has in turn stimulated demand for data computation.

[0003] However, the MapReduce computing model also has constraints, such as the strong binding between the distributed computing platform and the storage platform. Take HADOOP, a specific implementation of the MapReduce computing model, as an example. The HADOOP computing platform requires that the data it processes be stored on the storage platform (HDFS) to which it is strongly bound. This brings inconvenience to the specific application of H...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/182
Inventor: 杨斐
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD