An optimization method for large and small table association in Hive

The invention applies optimization and association analysis techniques in the field of big data processing. It addresses the low efficiency of large and small table association and improves efficiency by reducing the amount of data transmitted and analyzed.

Active Publication Date: 2021-10-19
南京烽火天地通信科技有限公司
6 cites, 0 cited by

AI Technical Summary

Problems solved by technology

[0008] The purpose of the present invention is to provide an optimization method for large and small table association in Hive that solves the inefficiency of associating a large table with a small table in Hive when the large table has an index.

Examples

Embodiment Construction

[0030] As shown in Figure 1 and Figure 2, an optimization method for large and small table association in Hive includes the following steps:

[0031] Step 1: Establish a server cluster composed of multiple servers, and build a Hadoop framework on top of the server cluster;

[0032] Step 2: Build the Hive data warehouse tool on the Hadoop framework. The Hive data warehouse tool exposes an HQL interface externally and maps large-scale data sets stored on HDFS or other storage media into data tables. According to data volume, the data tables are divided into large data tables and small data tables;

[0033] Step 3: The Hive client completes the analysis of the data tables by means of MapReduce at the bottom layer of the Hive data warehouse tool;

[0034] Step 4: Using the MapReduce computing framework as the execution engine of Hive, the Hive client executes multi-table association tas...
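
The split into large and small data tables in Step 2 can be illustrated with a minimal Python sketch. The source gives no size threshold, so `SMALL_TABLE_MAX_BYTES`, `TableInfo`, `classify`, and `association_kind` are hypothetical names chosen for illustration; Hive's own `hive.mapjoin.smalltable.filesize` setting plays a comparable role when Hive decides whether a table is small enough for a map join. The "small table association" label is likewise an assumption, since the text only names the other two cases.

```python
from dataclasses import dataclass

# Assumed cutoff for illustration only; the patent text does not state one.
SMALL_TABLE_MAX_BYTES = 25_000_000


@dataclass(frozen=True)
class TableInfo:
    """Hypothetical descriptor for a data table mapped by Hive."""
    name: str
    size_bytes: int


def classify(table: TableInfo, threshold: int = SMALL_TABLE_MAX_BYTES) -> str:
    """Label a table 'small' or 'large' by its data volume."""
    return "small" if table.size_bytes <= threshold else "large"


def association_kind(left: TableInfo, right: TableInfo) -> str:
    """Name the kind of two-table association based on the size classes."""
    kinds = sorted([classify(left), classify(right)])
    if kinds == ["large", "large"]:
        return "large table association"
    if kinds == ["large", "small"]:
        return "large and small table association"
    # Not named in the source; added so every input has a label.
    return "small table association"
```

A call such as `association_kind(TableInfo("events", 10**12), TableInfo("dim_city", 10**5))` falls into the large and small table case that the method targets.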

Abstract

The invention discloses an optimization method for large and small table association in Hive. It belongs to the technical field of big data processing and solves the inefficiency of associating large and small tables in Hive when the large table has an index. The invention uses the index of the large table to reduce the amount of data transmitted and analyzed, which in turn improves the efficiency of large and small table association analysis.
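
The general idea of using a large table's index to cut down the data read during association can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented procedure, whose detailed steps are truncated in this text. It assumes the index maps each join-key value to the positions of matching rows; `build_index` and `indexed_join` are hypothetical helper names, and in-memory lists stand in for Hive tables.

```python
from collections import defaultdict


def build_index(large_rows, key):
    """Map each join-key value to the positions of the large-table rows holding it."""
    index = defaultdict(list)
    for pos, row in enumerate(large_rows):
        index[row[key]].append(pos)
    return index


def indexed_join(small_rows, large_rows, index, key):
    """Inner-join by probing the index with each small-table key.

    Only large-table rows that the index points to are read, so rows with
    no match in the small table are never touched.
    """
    joined = []
    rows_read = 0
    for small_row in small_rows:
        for pos in index.get(small_row[key], []):
            rows_read += 1
            joined.append({**large_rows[pos], **small_row})
    return joined, rows_read


# Toy data: the large table has four rows, the small table has two.
large = [
    {"id": 1, "v": "a"},
    {"id": 2, "v": "b"},
    {"id": 3, "v": "c"},
    {"id": 2, "v": "d"},
]
small = [{"id": 2, "tag": "x"}, {"id": 9, "tag": "y"}]

index = build_index(large, "id")
result, rows_read = indexed_join(small, large, index, "id")
```

In this toy run only the two large-table rows with `id` 2 are read, while a join without the index would scan all four. The saving grows with the share of large-table rows that have no partner in the small table.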

Description

Technical field

[0001] The invention belongs to the technical field of big data processing.

Background technique

[0002] With the growth of data volume and the development of big data technology, quickly and effectively finding the information hidden in massive data has become a difficult problem in the era of big data. Multi-table association analysis based on distributed technology (referred to as multi-table association) is a method commonly used in the industry to discover data value from massive data. In practical applications, multi-table association can be divided into association analysis between two large data tables (referred to as large table association) and association analysis between a large data table and a small data table (referred to as large and small table association). The efficiency of multi-table association has become an important indicator for measuring a distributed processing framework.

[0003] Hadoop is a commonly used distribute...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/22, G06F16/27
CPC: G06F16/2282, G06F16/27
Inventor: 马东周帅锋郑伟鲁光明马全辉卞璐璐穆宁王栋平
Owner: 南京烽火天地通信科技有限公司