Hash join method and device

A hash join technique for data processing that addresses low join efficiency and the inability to use existing free memory, improving join efficiency and memory utilization.

Inactive Publication Date: 2015-12-23
INSPUR GROUP CO LTD
2 Cites, 11 Cited by

AI Technical Summary

Problems solved by technology

In existing hash join operations, the entire data table to be joined is stored in memory for the join. Existing free memory cannot be used, only a large memory can handle the processing, and join efficiency is low.



Embodiment Construction

[0059] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments, without creative work, fall within the scope of protection of the present invention.

[0060] As shown in Figure 1, an embodiment of the present invention provides a hash join method, which may include the following steps:

[0061] S1: Obtain the first data to be operated on from the first table to be operated on, and the second data to be operated on from the second table to be operated on;

[0062] S2: Calculate the hash value...


Abstract

The invention provides a hash join method and device. The method comprises: S1, acquiring first data to be operated on from a first table to be operated on and second data to be operated on from a second table to be operated on; S2, calculating the hash value of each first datum and determining its partition number from that hash value, then calculating the hash value of each second datum and determining its partition number from that hash value; S3, reading data with the same partition number into the same zone in memory; S4, performing a hash join on the first data and the second data in the same zone. The method and device can improve join efficiency.
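
Steps S1 to S4 can be illustrated with a minimal sketch. This is not the patented implementation: the function names, the tuple row format, the modulo rule for deriving a partition number, and the default of four partitions are all assumptions made for illustration. In-memory lists stand in for partitions that a real system would presumably hold outside memory until their zone is loaded.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Tuple[Any, ...]
KeyFn = Callable[[Row], Any]


def partition_number(key: Any, num_partitions: int) -> int:
    """S2: derive a partition number from the hash value of a join key.

    The modulo rule is an assumption; the source only says the partition
    number is determined according to the hash value.
    """
    return hash(key) % num_partitions


def partition_rows(rows: Iterable[Row], key_of: KeyFn,
                   num_partitions: int) -> Dict[int, List[Row]]:
    """Group rows by the partition number of their join key."""
    parts: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        parts[partition_number(key_of(row), num_partitions)].append(row)
    return parts


def partitioned_hash_join(first: Iterable[Row], second: Iterable[Row],
                          first_key: KeyFn, second_key: KeyFn,
                          num_partitions: int = 4) -> List[Row]:
    """Join two tables one partition (zone) at a time."""
    # S1 + S2: take the data from both tables and assign partition numbers.
    first_parts = partition_rows(first, first_key, num_partitions)
    second_parts = partition_rows(second, second_key, num_partitions)

    result: List[Row] = []
    for p in range(num_partitions):
        # S3: only rows sharing partition number p are handled together.
        build: Dict[Any, List[Row]] = defaultdict(list)
        for row in first_parts.get(p, []):
            build[first_key(row)].append(row)
        # S4: probe the per-partition hash table with the second table's rows.
        for row in second_parts.get(p, []):
            for match in build.get(second_key(row), []):
                result.append(match + row)
    return result


# Hypothetical sample data: (user_id, name) and (user_id, item).
users = [(1, "ann"), (3, "bob")]
orders = [(1, "pen"), (2, "ink"), (1, "pad")]
joined = partitioned_hash_join(users, orders,
                               lambda r: r[0], lambda r: r[0])
```

Because rows with equal keys always receive the same partition number, each partition can be joined independently, so only one partition pair needs to fit in memory at a time rather than the whole table.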

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a hash join method and device.

Background

[0002] Spark is an open-source cluster computing system based on in-memory computing, which aims to make data analysis faster. Spark is compact and was developed by a small team led by Matei at the AMP Lab at the University of California, Berkeley. It is written in Scala. While Spark has similarities to Hadoop, it provides a new framework for cluster computing with useful differences. First, Spark is designed for a specific type of cluster computing workload, namely those that reuse a working dataset across parallel operations (such as machine learning algorithms). To optimize these workloads, Spark introduces the concept of in-memory cluster computing, where datasets can be cached in memory to reduce access latency.

[0003] During the development of Hadoop, in order to provide quick-sta...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F3/0608, G06F16/24557, G06F16/2456
Inventor: 卢军佐, 曹连超, 亓开元, 房体盈, 赵仁明
Owner: INSPUR GROUP CO LTD