Data table connection method and device

A data table connection (join) method and device, applied in the field of big data processing. They address the problem that large differences between the Keys of the tables being joined reduce overall computing performance, and achieve the effects of improving computing performance, reducing Shuffle operations, and saving network resource consumption.

Pending Publication Date: 2021-04-02
LENOVO (BEIJING) LTD

AI Technical Summary

Problems solved by technology

[0004] If the Keys in the two data tables differ substantially (for example, 50% of the keys across all records in table A do not exist in table B), the join operation yields an empty result for those records. However, the Join execution logic still has to perform the Shuffle operation on all records of both data tables first, so the Shuffle operations for records whose keys have no counterpart in the other table can be considered invalid. When there are many invalid Shuffle operations, the overall performance of the calculation is greatly reduced.
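To make the notion of an invalid Shuffle concrete, the following is a minimal sketch in plain Python (not Spark code, and not taken from the patent). It assumes an inner join and models each record as a `(key, value)` tuple; the function name `count_invalid_shuffles` and the sample tables are hypothetical.

```python
def count_invalid_shuffles(table_a, table_b):
    """Count records per table whose key has no match in the other table.

    Assumption: inner-join semantics, so such records contribute nothing
    to the join result, yet a conventional join would still shuffle them.
    """
    keys_a = {key for key, _ in table_a}
    keys_b = {key for key, _ in table_b}
    shared = keys_a & keys_b  # keys present in both tables
    invalid_a = sum(1 for key, _ in table_a if key not in shared)
    invalid_b = sum(1 for key, _ in table_b if key not in shared)
    return invalid_a, invalid_b


# Hypothetical tables: 50% of the keys in table A do not exist in table B.
table_a = [(1, "a1"), (2, "a2"), (3, "a3"), (4, "a4")]
table_b = [(1, "b1"), (2, "b2")]
invalid_a, invalid_b = count_invalid_shuffles(table_a, table_b)
```

In this sample, two of the four records in table A (keys 3 and 4) would be shuffled without ever producing a join result.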

Examples

Detailed Description of Embodiments

[0044] In order to make the purpose, features and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0045] An embodiment of the present disclosure provides a data table connection method which, as shown in Figure 1, includes:

[0046] Step 101, determining a first data table and a second data table to be connected; the first data table includes a plurality of first records, and the second data table includes a plurality of second records;

...

Abstract

The invention discloses a data table connection method. The method comprises the steps of: determining a first data table and a second data table to be connected, wherein the first data table comprises a plurality of first records and the second data table comprises a plurality of second records; determining a first key of each first record and a second key of each second record; pulling the first records and second records to which matched first keys and second keys belong from their original partitions to one or more new partitions; and connecting the first records and the second records in the one or more new partitions.
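The steps in the abstract can be illustrated with a minimal sketch in plain Python (not Spark code). The abstract does not say how matched keys are found, so this sketch simply assumes that the key sets of both tables can be intersected up front. The function name `filtered_partition_join`, the `hash(key) % n` placement rule, and the sample data are all hypothetical.

```python
from collections import defaultdict


def filtered_partition_join(parts_a, parts_b, num_new_partitions):
    """Inner-join two partitioned tables, moving only records with matched keys.

    Each table is a list of original partitions; each partition is a list of
    (key, value) records. Returns the joined rows and the number of records
    that were pulled into new partitions.
    """
    # Determine the key of every first record and every second record.
    keys_a = {key for part in parts_a for key, _ in part}
    keys_b = {key for part in parts_b for key, _ in part}
    matched = keys_a & keys_b  # assumption: matching = set intersection

    # Pull only the records with matched keys into the new partitions.
    new_a, new_b = defaultdict(list), defaultdict(list)
    moved = 0
    for parts, target in ((parts_a, new_a), (parts_b, new_b)):
        for part in parts:
            for key, value in part:
                if key in matched:
                    target[hash(key) % num_new_partitions].append((key, value))
                    moved += 1

    # Connect the first and second records inside each new partition.
    joined = []
    for p in range(num_new_partitions):
        values_by_key = defaultdict(list)
        for key, value in new_b[p]:
            values_by_key[key].append(value)
        for key, value in new_a[p]:
            for other in values_by_key[key]:
                joined.append((key, value, other))
    return joined, moved


# Hypothetical input: keys 3 and 4 exist only in the first table.
first_table = [[(1, "a1"), (3, "a3")], [(2, "a2"), (4, "a4")]]
second_table = [[(1, "b1")], [(2, "b2")]]
rows, moved_count = filtered_partition_join(first_table, second_table, 2)
```

With this input only four of the six records are moved, because the records with keys 3 and 4 stay in their original partitions. In a real cluster, working out the matched keys has a cost of its own, which this sketch ignores.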

Description

Technical Field

[0001] The invention relates to big data processing technology, in particular to a data table connection method and device.

Background

[0002] Apache Spark is a fast and general computing engine designed for large-scale data processing. The Shuffle process in a Spark calculation needs to pull data from one partition (Partition) to another partition. This process consumes network resources, memory, and disk IO (Input/Output).

[0003] When two data tables are joined, the records in both tables must first be assigned to multiple partitions; the join operation is then performed on the records of the two tables within each partition. Assigning records to multiple partitions involves the Shuffle operation of migrating data from one partition to another.

[0004] If the difference between the Keys ...
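The conventional flow described in [0003] can be sketched in plain Python as follows. This is a simplified model, not Spark's actual implementation: it assumes hash partitioning on the key and an inner join, and the names `shuffle_by_key` and `join_partition` as well as the sample data are hypothetical.

```python
def shuffle_by_key(partitions, num_partitions):
    """Reassign every (key, value) record to partition hash(key) % num_partitions."""
    shuffled = [[] for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            shuffled[hash(key) % num_partitions].append((key, value))
    return shuffled


def join_partition(part_a, part_b):
    """Inner-join the records of two co-located partitions on their keys."""
    return [(ka, va, vb) for ka, va in part_a for kb, vb in part_b if ka == kb]


# Hypothetical input: two original partitions per table.
orig_a = [[(1, "a1"), (3, "a3")], [(2, "a2"), (4, "a4")]]
orig_b = [[(1, "b1")], [(2, "b2")]]

# Every record of both tables goes through the shuffle, matched or not.
shuf_a = shuffle_by_key(orig_a, 2)
shuf_b = shuffle_by_key(orig_b, 2)
shuffled_count = sum(len(p) for p in shuf_a) + sum(len(p) for p in shuf_b)

# After the shuffle, equal keys share a partition index, so each pair of
# partitions can be joined independently.
baseline = [row for pa, pb in zip(shuf_a, shuf_b) for row in join_partition(pa, pb)]
```

Here all six records are reassigned, although the records with keys 3 and 4 can never appear in the join result.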

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/22, G06F16/2455
CPC: G06F16/2282, G06F16/2456
Inventor: 李栋
Owner: LENOVO (BEIJING) LTD