Method and device for connecting tables

A filter table and join key (association key) technique, applied in the computer field, that addresses problems such as increased cluster disk IO and network IO, wasted computing resources, and reduced task timeliness, and achieves the effect of reducing cluster load and saving computing resources.

Pending Publication Date: 2022-05-06
BEIJING WODONG TIANJUN INFORMATION TECH CO LTD +1

AI Technical Summary

Problems solved by technology

For the scenario mentioned above, the join is driven by the small table on the left. That is, when the two tables are joined, only the rows of the large table whose join key equals a key in the small table are retained, and the large-table rows that do not satisfy the join condition are regarded as invalid data. Because this invalid data still takes part in shuffling, sorting, and the merge computation, it hurts task timeliness on the one hand and, on the other, increases the cluster's disk IO and network IO load, wasting a large amount of computing resources.



Detailed Description of Embodiments

[0030] The present disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the related invention and do not limit it. It should also be noted that, for convenience of description, the drawings show only the parts relevant to the related invention.

[0031] It should be noted that, in the case of no conflict, the embodiments in the present disclosure and the features in the embodiments can be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings and embodiments.

[0032] Figure 1 shows an exemplary system architecture to which an embodiment of the method for joining tables or the apparatus for joining tables of the present disclosure can be applied.

[0033] First, an overview of the overall architecture of Spark SQL is given, as shown in Figure 1 ...


Abstract

An embodiment of the invention discloses a method and device for joining tables. In one specific implementation, the method comprises: obtaining a first table and a second table to be joined, where the data size of the first table is smaller than that of the second table; if the first table cannot be broadcast, calculating the ratio of the number of rows in the first table to the number of rows in the second table; if the ratio is less than or equal to a preset threshold, constructing a real-time Bloom filter from the join keys (association keys) of the first table and inserting it, as a filter operator, into the filter condition of the second table; filtering out of the second table the data that does not satisfy the filter condition to obtain a filtered table; and shuffling and sorting the first table and the filtered table separately, then joining them. This embodiment improves the timeliness of the table join task, saves computing resources, and reduces cluster load.
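The pre-filtering steps in the abstract can be illustrated with a small, engine-independent Python sketch. It is not the patented implementation and does not use any Spark API: the `BloomFilter` class, the function `prefilter_large_table`, the `can_broadcast` flag, and the default threshold of 0.1 are all hypothetical names and values chosen for illustration (the source only says "a preset threshold"). The final shuffle, sort, and join steps are omitted.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        n = max(1, expected_items)
        # Standard sizing formulas for the bit count m and hash count k.
        self.size = max(8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / n * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, key):
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        # Double hashing derives k bit positions from two base hashes.
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def prefilter_large_table(small, large, key, can_broadcast=False, ratio_threshold=0.1):
    """Return the rows of `large` that should take part in the shuffle.

    `small` and `large` are lists of dict rows; `key` is the join column.
    `ratio_threshold` is an assumed value standing in for the preset threshold.
    """
    if can_broadcast or not large:
        return large  # broadcast path (or empty table): nothing to pre-filter
    if len(small) / len(large) > ratio_threshold:
        return large  # row-count ratio above the threshold: skip the filter
    bloom = BloomFilter(expected_items=len(small))
    for row in small:
        bloom.add(row[key])
    # Rows whose key is definitely absent from the small table are dropped
    # before shuffling; occasional false positives are removed later by the join.
    return [row for row in large if bloom.might_contain(row[key])]
```

Because a Bloom filter never reports a false negative, every large-table row that actually matches a small-table key survives the filter, so the join result is unchanged; only the volume of data entering the shuffle shrinks.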

Description

Technical field

[0001] The embodiments of the present disclosure relate to the field of computer technology, and in particular to a method and device for joining tables.

Background

[0002] As a general-purpose computing engine for large-scale data processing, Spark offers high throughput, low latency, generality, easy scalability, and high fault tolerance. It has formed a rapidly developing and widely applied ecosystem. Its main modules include Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX, among which Spark SQL is the most widely used in big data and is continuously applied to large-scale data processing.

[0003] In production, many Spark SQL business scenarios require a small (relatively small) table to be joined with a large table. "Small" here is relative: BroadcastHashJoin (broadcast hash join) is not triggered for the table, so SortMergeJoin (sort merge join) is used by default. The implementation logic of the sort...
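As a rough illustration of the general sort-merge join technique named in the background (not Spark's actual SortMergeJoin code), the following Python sketch sorts both inputs on the join key and then merges them in a single pass. The function name `sort_merge_join` and the list-of-dicts row format are assumptions made for this example; the shuffle that a distributed engine performs before sorting is omitted.

```python
def sort_merge_join(left, right, key):
    """Inner equi-join of two lists of dict rows: sort both sides, then merge."""
    left_sorted = sorted(left, key=lambda r: r[key])
    right_sorted = sorted(right, key=lambda r: r[key])
    result = []
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        lk = left_sorted[i][key]
        rk = right_sorted[j][key]
        if lk < rk:
            i += 1  # left row has no partner on the right
        elif lk > rk:
            j += 1  # right row has no partner: it was sorted for nothing
        else:
            # Locate the run of right rows that share this key.
            j_end = j
            while j_end < len(right_sorted) and right_sorted[j_end][key] == lk:
                j_end += 1
            # Pair every left row carrying this key with the whole right run.
            while i < len(left_sorted) and left_sorted[i][key] == lk:
                for r in right_sorted[j:j_end]:
                    result.append({**left_sorted[i], **r})
                i += 1
            j = j_end
    return result
```

Note that every row of both inputs is sorted even if it is later skipped in the merge, which is why unmatched rows on the large side are pure overhead.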


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/22, G06F16/27, G06F16/28
CPC: G06F16/2282, G06F16/27, G06F16/284
Inventors: 魏秀利, 郑瑞峰, 王文生
Owner: BEIJING WODONG TIANJUN INFORMATION TECH CO LTD