A pipeline-based distributed multi-table connection method and system

A connection (join) method based on pipeline technology, applied in database distribution/replication, special data processing applications, instruments, etc. It addresses the low efficiency of existing approaches and achieves efficient scheduling and fast, efficient multi-table connection.

Active Publication Date: 2018-05-01
工创集团有限公司

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by the present invention is the low efficiency of existing data segmentation methods when they are applied to multi-table connections (joins). To address it, the invention provides a pipeline-based distributed multi-table connection method and system.



Embodiment Construction

[0029] To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0030] The invention provides a pipeline-based distributed multi-table connection method. By executing two pipelines in parallel, it performs adaptive segmentation while the query is running, so that the multi-table connection completes quickly and efficiently.

[0031] See Figure 1, which is a structural diagram of a pipeline-based distributed multi-table c...


Abstract

The invention relates to a pipeline-based distributed multi-table connection method and system. The method comprises the following steps, which are executed in parallel. A map processing unit reads the tables to be connected, applies map processing to them to obtain the corresponding data blocks, and outputs the tables in groups of two. A second reduce processing unit reads, in order, the data blocks of the second group of tables through the last group, and performs a hash join on the two data blocks of each group to obtain that group's two-table connection result. A first reduce processing unit reads the initial multi-table connection result, which is produced by hash-joining the two data blocks of the first group of tables; then, each time the second reduce processing unit finishes the hash join of a group, the first reduce processing unit connects the current multi-table connection result with that group's two-table connection result, continuing until all groups of tables are connected. Because the pipelines run in parallel, the query operation and adaptive segmentation proceed at the same time, and the multi-table connection is completed quickly and efficiently.
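The two-stage flow described in the abstract can be illustrated with a small single-machine sketch. This is not the patented implementation: it assumes in-memory tables represented as lists of dicts, a single join column shared by every table, an even number of tables, and a thread plus a queue standing in for the two reduce processing units. The names `hash_join` and `pipelined_multi_join` are hypothetical, and the abstract does not state which unit joins the first group, so here the first stage does it.

```python
import queue
import threading
from collections import defaultdict


def hash_join(left, right, key):
    """Hash-join two lists of dict rows on the shared column `key`."""
    buckets = defaultdict(list)
    for row in left:                      # build phase
        buckets[row[key]].append(row)
    joined = []
    for row in right:                     # probe phase
        for match in buckets.get(row[key], []):
            joined.append({**match, **row})
    return joined


def pipelined_multi_join(tables, key):
    """Join tables grouped in pairs, using two concurrently running stages.

    Assumption: every table carries the same join column `key`.
    """
    if len(tables) < 2 or len(tables) % 2:
        raise ValueError("this sketch expects an even number of tables (>= 2)")
    # Stand-in for the map stage: emit the tables in groups of two.
    groups = [(tables[i], tables[i + 1]) for i in range(0, len(tables), 2)]
    pair_results = queue.Queue()

    def second_reduce():
        # Joins groups 2..n in order and hands each result downstream.
        for left, right in groups[1:]:
            pair_results.put(hash_join(left, right, key))
        pair_results.put(None)            # sentinel: no more groups

    worker = threading.Thread(target=second_reduce)
    worker.start()

    # First reduce stage: start from the join of the first group, then fold
    # in each two-table result as soon as the second stage produces it.
    result = hash_join(groups[0][0], groups[0][1], key)
    while True:
        pair = pair_results.get()
        if pair is None:
            break
        result = hash_join(result, pair, key)
    worker.join()
    return result
```

For example, calling `pipelined_multi_join([a, b, c, d], "id")` joins `a` with `b` in the first stage while the second stage joins `c` with `d`, then combines the two results. A queue is used so that the first stage can consume each pairwise result as soon as it is ready instead of waiting for all groups to finish.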

Description

Technical field

[0001] The invention relates to distributed data computing technology, in particular to a pipeline-based distributed multi-table connection method and system.

Background

[0002] The advent of the big data era has driven rapid growth in data volume, creating an urgent need for technology that can store and process such huge amounts of data. Google's distributed file system (GFS) and its distributed computing model MapReduce (map and reduce) emerged in response, and distributed computing has since become the mainstream technology for massive data storage and analysis. In massive data analysis, the join query is an important operation. In practice, the required data is often not limited to a single table but spans multiple tables, which makes the join operation more difficult.

[0003] Before executing the join query, the corresponding data must be segmented f...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/137, G06F16/182, G06F16/2282, G06F16/27
Inventor: 王宏志, 孙旭冉, 赵志强
Owner: 工创集团有限公司