Data parallel processing method, device and system

A data parallel processing technology, applied in the field of data processing, that addresses the problems of increased storage space, the extra overhead of starting and stopping tasks multiple times, and the increased difficulty of data consistency checks.

Inactive Publication Date: 2012-05-09
TELEFON AB LM ERICSSON (PUBL)

AI Technical Summary

Problems solved by technology

[0008] 2. Two copies of the same data must be kept, one in the distributed file system and one in the local file system, which wastes storage space;
[0009] 3. For systems with high data consistency requirements, it is necessary to ensure that no data is lost or duplicated during the copy process, which makes data consistency checks more difficult;
[0010] 4. When processing a large number of small files, a task must be started for each small file, which adds the overhead of starting and stopping tasks many times and makes processing very inefficient.


Embodiment Construction

[0037] In order to enable those skilled in the art to better understand the solution of the present invention, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

[0038] The data parallel processing method, device and system of the embodiments of the present invention are applied to Hadoop's parallel computing framework MapReduce. In order to better understand the scheme of the present invention, first, a brief description of the processing flow of MapReduce in the prior art is given.

[0039] In the description of the following embodiments, the file stored on the collection server is referred to as a local file.

[0040] Figure 1 shows a typical processing flow of MapReduce in the prior art, where:

[0041] The Map task reads the data to be processed through the corresponding input source class, and after the data is converged / aggregated, the Reduce task outputs the data through the correspond...
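
The prior-art flow described above (Map reads from an input source, the data is aggregated, Reduce writes to an output) can be illustrated with a minimal in-memory sketch. This is not Hadoop code: `run_mapreduce`, `word_map`, and `word_reduce` are hypothetical names chosen for illustration, and a Python list and dict stand in for the input source and output source.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def run_mapreduce(
    records: Iterable[str],
    map_fn: Callable[[str], Iterator[Tuple[str, int]]],
    reduce_fn: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Toy single-process model of the Map -> aggregate -> Reduce flow."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    # Map phase: each record read from the input source yields (key, value) pairs.
    for record in records:
        for key, value in map_fn(record):
            # Aggregation step: gather all values that share a key.
            grouped[key].append(value)
    # Reduce phase: one call per key; the returned dict stands in for the output source.
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def word_map(line: str) -> Iterator[Tuple[str, int]]:
    # Hypothetical Map function: emit (word, 1) for every word in the line.
    for word in line.split():
        yield word, 1


def word_reduce(key: str, values: List[int]) -> int:
    # Hypothetical Reduce function: sum the counts for one word.
    return sum(values)


counts = run_mapreduce(["a b a", "b c"], word_map, word_reduce)
```

In a real Hadoop job the input and output are handled by input source and output source classes, and the Map and Reduce tasks run on separate nodes; the sketch only shows the order of the phases.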

Abstract

The invention relates to a data parallel processing method, a data parallel processing device and a data parallel processing system. The method comprises the following steps: detecting data to be processed on a data acquisition server in real time; partitioning and/or aggregating the detected data to form data partitions of a preset size; establishing description information corresponding to each data partition and storing the description information in a data partition queue in turn; after a processing request from a Map task of a node in a Hadoop system is received, reading a piece of description information from the data partition queue and acquiring the data from the corresponding data partition according to that description information; and sending the acquired data to the input source of the Map task. With the method, device and system of the invention, data can be transmitted directly from the acquisition server to a MapReduce node for processing, so that storage space is saved, the processing flow is simplified, and the efficiency and reliability of data processing are improved.
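
The steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `PartitionDescriptor`, `build_partitions`, and `serve_map_request` are hypothetical names, the description information is assumed to be a list of (file name, offset, length) segments, local files are modelled as in-memory byte strings, and a single scan replaces real-time detection.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Tuple

Segment = Tuple[str, int, int]  # (local file name, byte offset, byte length)


@dataclass(frozen=True)
class PartitionDescriptor:
    """Assumed shape of the description information for one data partition."""
    segments: Tuple[Segment, ...]


def build_partitions(file_sizes: Dict[str, int],
                     partition_size: int) -> Deque[PartitionDescriptor]:
    """Split large files and aggregate small ones into fixed-size partitions."""
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    queue: Deque[PartitionDescriptor] = deque()
    current: List[Segment] = []
    used = 0  # bytes already assigned to the partition being built
    for name, size in file_sizes.items():
        offset = 0
        while offset < size:
            take = min(size - offset, partition_size - used)
            current.append((name, offset, take))
            offset += take
            used += take
            if used == partition_size:
                # Partition is full: enqueue its description information.
                queue.append(PartitionDescriptor(tuple(current)))
                current, used = [], 0
    if current:
        # The last partition may be smaller than the preset size.
        queue.append(PartitionDescriptor(tuple(current)))
    return queue


def serve_map_request(queue: Deque[PartitionDescriptor],
                      files: Dict[str, bytes]) -> Optional[bytes]:
    """Answer one Map task request with the data of the next partition."""
    if not queue:
        return None  # nothing left to process
    descriptor = queue.popleft()
    return b"".join(files[n][o:o + length] for n, o, length in descriptor.segments)


# Hypothetical local files on the acquisition server.
files = {"a.log": b"0123456789", "b.log": b"ab", "c.log": b"cde"}
queue = build_partitions({n: len(d) for n, d in files.items()}, partition_size=6)
chunks = []
while (chunk := serve_map_request(queue, files)) is not None:
    chunks.append(chunk)
```

Here the 10-byte file is split across two partitions and the two small files are packed in alongside its tail, so three Map requests cover three files. Only descriptors are queued; the data itself is read from the local files when a Map task asks for it, which is what lets the flow avoid a second copy in the distributed file system.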

Description

Technical field

[0001] The present invention relates to data processing technology, in particular to a data parallel processing method, device and system.

Background art

[0002] Hadoop is a distributed system for massive data storage and computing based on a shared-nothing architecture. It consists of several members, including HDFS (Hadoop Distributed File System), MapReduce (a parallel computing framework), HBase (an open source implementation of Google BigTable), and so on. Among them, as an open parallel computing framework, MapReduce can be combined with various currently popular distributed products to achieve flexible parallel and distributed computing. It can use platforms such as HDFS, HBase, and Cassandra (a hybrid non-relational database) as the input source of MapReduce for parallel processing, and output the processed data to output sources such as HDFS, HBase, and Cassandra.

[0003] In short, the calculation process of Ma...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
Inventor: 温文全, 喻先兵
Owner: TELEFON AB LM ERICSSON (PUBL)