
Data processing method and device

A data processing and data node technology, applied in the field of data processing, that addresses problems such as uneven data distribution.

Active Publication Date: 2017-09-12
HUAWEI CLOUD COMPUTING TECH CO LTD
Cites: 7; Cited by: 3

AI Technical Summary

Problems solved by technology

[0003] The purpose of the embodiments of the present invention is to provide a data processing method and device that alleviate the problem of uneven data distribution in SQL on Hadoop systems and thereby improve system performance.




Embodiment Construction

[0030] The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0031] In the prior art, viewed from the perspective of layered optimization of the system architecture, two main categories of solutions have been proposed to address data skew:

[0032] The first category optimizes the execution layer, for example with the consistent hashing algorithm;

[0033] Consistent hashing was proposed by Karger et al. at MIT in 1997 for distributed caching. Its design goal is to solve the hot spot problem in the In...
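Paragraph [0033] names consistent hashing but is cut off before describing it. The sketch below shows the textbook idea under stated assumptions: nodes and keys are hashed onto the same circular space, and each key belongs to the first node position clockwise from it. The class name `ConsistentHashRing`, the use of MD5, the 100 virtual positions per node, and the node names `dn1`…`dn4` are illustrative choices, not details from the patent.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Textbook consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual positions per physical node (assumed value)
        self._positions = []      # sorted hash positions on the ring
        self._owner = {}          # hash position -> physical node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # MD5 only spreads positions uniformly over the ring; no security role.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node):
        # Drop every virtual position owned by this node; other owners are untouched.
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key):
        if not self._positions:
            raise LookupError("ring is empty")
        pos = self._hash(key)
        # First virtual position clockwise from the key, wrapping past the end.
        idx = bisect.bisect_right(self._positions, pos) % len(self._positions)
        return self._owner[self._positions[idx]]


ring = ConsistentHashRing(["dn1", "dn2", "dn3"])
keys = [f"block-{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("dn4")
moved = [k for k in keys if ring.get_node(k) != before[k]]
# Only keys now owned by dn4 change owner; every other key keeps its old node.
print(len(moved), "of", len(keys), "keys moved, all to", {ring.get_node(k) for k in moved})
```

The point of the design is the usage at the end: adding a fourth node only reassigns the keys that fall on its new arcs (roughly a quarter here), whereas plain `hash % N` placement would remap about three quarters of the keys when going from 3 to 4 nodes.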



Abstract

The invention relates to the field of data processing, and in particular to a data processing method and device for alleviating uneven data distribution in an SQL on Hadoop system. The method is mainly applied to SQL on Hadoop systems and comprises the following steps: an SQL server obtains the file allocation result of files to be processed, the files having been allocated among one or more data nodes in the Hadoop system; according to the file allocation result, one or more data nodes to be dispatched are determined from among the data nodes; a target data node set corresponding to each data node to be dispatched is determined; and a first dispatch instruction is sent to the target data nodes in at least one target data node set. The processing load of the data nodes to be dispatched is thereby shared with the target data nodes, which alleviates uneven data distribution in the SQL on Hadoop system and improves system performance.
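The abstract lists the steps but not the criteria for choosing "data nodes to be dispatched" or their target sets. The following is a minimal sketch under loudly stated assumptions: the file allocation result is modeled as `{node: {file: size}}`, a node counts as overloaded when its load exceeds the mean by more than a `threshold`, its target set is every below-mean node, and files are reassigned greedily (largest first, to the least-loaded target). `plan_dispatch` and `DispatchInstruction` are hypothetical names; this is not the patented selection rule.

```python
from dataclasses import dataclass, field


@dataclass
class DispatchInstruction:
    """Hypothetical stand-in for the abstract's 'first dispatch instruction'."""
    source: str                                 # data node to be dispatched
    target: str                                 # node from the source's target set
    files: list = field(default_factory=list)   # files whose processing moves


def plan_dispatch(allocation, threshold=0.2):
    """Plan load sharing from a file allocation result.

    allocation: {data_node: {file_name: size}}, an assumed shape of the result.
    Assumed rule, not taken from the patent: a node is 'to be dispatched'
    when its load exceeds mean * (1 + threshold); its target data node set
    is every node whose load is below the mean.
    """
    if not allocation:
        return [], {}
    load = {n: sum(files.values()) for n, files in allocation.items()}
    mean = sum(load.values()) / len(load)
    upper = mean * (1 + threshold)
    to_dispatch = [n for n in load if load[n] > upper]
    target_sets = {n: [t for t in load if load[t] < mean] for n in to_dispatch}

    grouped = {}  # (source, target) -> files reassigned between them
    for src in sorted(to_dispatch, key=load.get, reverse=True):
        # Greedy: largest files first, each to the currently least-loaded target.
        for name, size in sorted(allocation[src].items(), key=lambda kv: -kv[1]):
            if load[src] <= upper or not target_sets[src]:
                break
            dst = min(target_sets[src], key=load.get)
            if load[dst] + size > upper:
                continue  # would only turn the target into a new hot spot
            load[src] -= size
            load[dst] += size
            grouped.setdefault((src, dst), []).append(name)
    plan = [DispatchInstruction(s, d, f) for (s, d), f in grouped.items()]
    return plan, load


allocation = {
    "dn1": {"f1": 40, "f2": 30, "f3": 20},  # hot node
    "dn2": {"f4": 10},
    "dn3": {"f5": 10, "f6": 10},
}
plan, new_load = plan_dispatch(allocation)
print([(i.source, i.target, i.files) for i in plan])
# → [('dn1', 'dn2', ['f2']), ('dn1', 'dn3', ['f3'])]
print(new_load)  # → {'dn1': 40, 'dn2': 40, 'dn3': 40}
```

Grouping files per (source, target) pair yields one instruction per target node, loosely mirroring "a first dispatch instruction is sent to target data nodes". The sketch ignores data locality and network cost, which a real SQL on Hadoop scheduler would have to weigh.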

Description

Technical field

[0001] The invention relates to the field of data processing, and in particular to a data processing method and device.

Background

[0002] Hadoop-based structured query language (SQL on Hadoop) is an important research direction in the field of big data. Current SQL on Hadoop systems mainly follow one of two architectures. The first builds a query engine on a runtime framework, such as the MapReduce (MR) computing model, on top of the Hadoop Distributed File System (HDFS) or HBase; Hive, a data warehouse tool for Hadoop, is an example. The second combines the massively parallel processing (MPP) architecture of relational database systems and a Structured Query Language (SQL) engine with HDFS or HBase; examples include Impala and HAWQ. Whichever architecture is used, data distribution is involved, and if the data distribution is uneven, data skew will occur...


Application Information

IPC(8): G06F17/30; G06F9/50
CPC: G06F9/5027; G06F16/182
Inventors: Dong Yahui (董亚辉), Liu Hailong (刘海龙)
Owner: HUAWEI CLOUD COMPUTING TECH CO LTD