Data processing method and system

A data processing technology, applied in the fields of electrical digital data processing, special data processing applications, and instruments, that addresses problems such as high query overhead.

Active Publication Date: 2012-12-19
TENCENT TECH (SHENZHEN) CO LTD +1
3 Cites, 62 Cited by

AI Technical Summary

Problems solved by technology

[0007] In the existing technology, Join queries and Groupby queries incur relatively high overhead, as do queries against data tables stored in columns.


Examples

Embodiment Construction

[0025] In the course of the invention, the inventor observed the following:

[0026] Because of the basic working principle of the Map/Reduce framework, join (Join) queries and grouping (Groupby) queries, which account for a large proportion of all queries, must pull a large amount of data over the network and perform their calculations on the Reduce side. In a cluster system, network bandwidth is a relatively scarce resource, and the execution overhead of a Reduce task is relatively large. Therefore, if the Join and Groupby queries that applications use frequently can be converted into calculations performed on the Map side, bandwidth, computing resources, and disk I/O will be greatly saved.
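The cost described in [0026] can be illustrated with a small simulation. This is a minimal sketch, not the patented implementation: the function name `reduce_side_groupby` and the `shuffled` counter are hypothetical, and a "split" is simply a Python list standing in for one Map task's input.

```python
from collections import defaultdict


def reduce_side_groupby(splits, key_fn, value_fn, num_reducers):
    """Simulate a reduce-side GROUP BY ... SUM over several input splits.

    Returns (result, shuffled), where `shuffled` counts the records that
    would have to travel over the network from Map tasks to Reduce tasks.
    """
    # One input buffer per (simulated) Reduce task.
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    shuffled = 0

    # Map phase: every row is emitted as a (key, value) pair and routed
    # to a reducer chosen by the key, regardless of which split held it.
    for split in splits:
        for row in split:
            key = key_fn(row)
            reducer_inputs[hash(key) % num_reducers][key].append(value_fn(row))
            shuffled += 1

    # Reduce phase: each reducer aggregates the values it received.
    result = {}
    for bucket in reducer_inputs:
        for key, values in bucket.items():
            result[key] = sum(values)
    return result, shuffled
```

In this sketch every input row is shuffled once, so the network traffic grows with the size of the table rather than with the size of the result, which is the overhead the paragraph above describes.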

[0027] Therefore, the inventor believes that dividing and conquering the data through Hash partitioning is a necessary means of completing the calculations on the Map side.
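A rough sketch of the divide-and-conquer idea follows. It assumes rows are Python dictionaries and uses `zlib.crc32` as a stand-in hash function; the source does not say which hash function is used, and the names `partition_id`, `hash_partition`, and `map_side_groupby_sum` are hypothetical.

```python
import zlib


def partition_id(key, num_partitions):
    # crc32 is used here only because it is stable across processes;
    # the actual hash function is an assumption of this sketch.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def hash_partition(rows, column, num_partitions):
    """Place each row in the partition selected by hashing `column`."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[partition_id(row[column], num_partitions)].append(row)
    return partitions


def map_side_groupby_sum(partitions, key_col, value_col):
    """GROUP BY key_col, SUM(value_col), computed one partition at a time."""
    result = {}
    for part in partitions:  # each partition would be one Map task
        local = {}
        for row in part:
            local[row[key_col]] = local.get(row[key_col], 0) + row[value_col]
        # All rows sharing a key live in the same partition, so each local
        # aggregate is already final and no Reduce step is needed.
        result.update(local)
    return result
```

Because equal keys always hash to the same partition, each Map task can finish its groups locally, which is why partitioning on the grouping column removes the need for a shuffle in this sketch.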

[0028] As far as the Join calculation is concerned, by implementing the Hash partition, the Join cal...

Abstract

The invention discloses a data processing method and a data processing system. In data processing in which SQL (Structured Query Language) queries are processed and computed on the Map/Reduce (mapping/simplification) framework and the data storage task is completed by a distributed file system, the method comprises the steps of: defining a data table as hash-partitioned and storing its data in the hash partitions; and, when the query to be executed is determined to be a join query or a grouping query, the source data table is a hash-partitioned data table, and the join key or grouping key is the partitioning column, modifying the query into one in which the join or grouping is executed on the Map side with the hash partition as the unit. According to the data processing method and the data processing system, the overhead of join queries and grouping queries can be reduced, network bandwidth, disk bandwidth, and network resources are saved, and computational efficiency is improved.
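The rewrite condition stated in the abstract can be sketched as a simple check followed by a partition-wise join. This is an illustration under stated assumptions, not the claimed method: `TableMeta`, `can_run_on_map_side`, and `map_side_join` are hypothetical names, rows are Python dictionaries, and the requirement that joined tables have the same number of partitions is an assumption added here so that partitions can be paired up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableMeta:
    name: str
    partition_column: Optional[str] = None  # None: not hash-partitioned
    num_partitions: int = 0


def can_run_on_map_side(kind, tables, key):
    """Check the abstract's conditions for executing on the Map side."""
    if kind not in ("join", "groupby"):
        return False
    # Every source table must be hash-partitioned on the join/grouping key.
    if any(t.partition_column != key for t in tables):
        return False
    # Assumption of this sketch: equal partition counts, so that
    # partition i of one table pairs with partition i of the other.
    return len({t.num_partitions for t in tables}) == 1


def map_side_join(left_parts, right_parts, key):
    """Join two co-partitioned tables one partition pair at a time."""
    out = []
    for left, right in zip(left_parts, right_parts):  # one Map task per pair
        index = {}
        for r in right:
            index.setdefault(r[key], []).append(r)
        for l in left:
            for r in index.get(l[key], []):
                out.append({**r, **l})
    return out
```

When the check fails, a system following this sketch would fall back to the ordinary reduce-side plan; the source excerpt does not describe that fallback, so it is left out here.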

Description

Technical field

[0001] The invention relates to a data processing method and system.

Background technique

[0002] In a data processing system that processes and calculates SQL (Structured Query Language) queries based on the Map/Reduce (mapping/simplification) framework, the data storage task is completed by the distributed file system. In the Map/Reduce framework, a data processing task submitted by the client is first split into several Map tasks, which are assigned to different machines for execution. Each Map task takes a part of the query input file as its own input and generates intermediate files through calculation. At the same time, the system generates several Reduce tasks and assigns them to different machines for execution; each Reduce task pulls the intermediate files generated by the Map tasks to its own machine for local calculation, and the processed results are aggregated into the final output file.

[0003] Among the SQL q...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 张书彬, 赵彦荣, 郭玮, 李均, 赵伟, 洪坤乾, 徐钊
Owner TENCENT TECH (SHENZHEN) CO LTD