Hive data warehouse-based fast association realization method and device

A fast association method and device for Hive data warehouses. It addresses the reduced efficiency of data associations that are used frequently, and it improves association efficiency by reducing overhead and the amount of associated data.

Inactive Publication Date: 2017-01-11
BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD +1

AI Technical Summary

Problems solved by technology

[0011] However, the above operations are used frequently in practice, sometimes several times within a single data processing job. There is still essentially no existing technique for performing data association in memory at any time in order to optimize association and handle data skew. In addition, MapJoin is part of the execution plan when it performs data association and must wait for cluster scheduling to allocate resources, which reduces the efficiency of data association.




Embodiment Construction

[0033] Exemplary embodiments of the present invention are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present invention to facilitate understanding, and they should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

[0034] Figure 1 is a schematic diagram of the main steps of a method for implementing fast association based on a Hive data warehouse according to an embodiment of the present invention. As shown in Figure 1, the method mainly includes the following steps S11...



Abstract

The invention provides a method and device for fast association based on a Hive data warehouse. The aim is to perform data association in memory at any time, without generating an execution plan or waiting for cluster scheduling and resource allocation, and so improve data processing efficiency. The method comprises the following steps: preprocessing the required data of the association table into a designated data cache area; loading the required data from the data cache area into memory according to interface parameters and storing it there; and, after loading is complete, querying the corresponding result according to an association key supplied by the user.
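The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the tab-separated cache file, the names `preprocess_to_cache` and `FastAssociation`, and the sample columns `sku`, `brand`, and `category` are all assumptions made for illustration, and a local file stands in for the designated data cache area.

```python
import csv
import os
import tempfile


def preprocess_to_cache(rows, key_field, value_fields, cache_path):
    """Step 1 (assumed form): write only the required columns of the
    association table to a cache file, with the association key first."""
    with open(cache_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        for row in rows:
            writer.writerow([row[key_field]] + [row[c] for c in value_fields])


class FastAssociation:
    """Hypothetical in-memory lookup over a preprocessed cache file."""

    def __init__(self):
        self._table = {}
        self._loaded = False

    def load(self, cache_path):
        """Step 2: load the cached data into memory, keyed by the first column."""
        with open(cache_path, newline="", encoding="utf-8") as f:
            for record in csv.reader(f, delimiter="\t"):
                self._table[record[0]] = record[1:]
        self._loaded = True

    def query(self, key):
        """Step 3: return the values for an association key, or None if absent."""
        if not self._loaded:
            raise RuntimeError("call load() before query()")
        return self._table.get(key)


def demo():
    """Run the three steps on a tiny made-up dimension table."""
    rows = [
        {"sku": "1001", "brand": "A", "category": "x", "unused": "z"},
        {"sku": "1002", "brand": "B", "category": "y", "unused": "z"},
    ]
    cache_path = os.path.join(tempfile.mkdtemp(), "dim_cache.tsv")
    preprocess_to_cache(rows, "sku", ["brand", "category"], cache_path)
    assoc = FastAssociation()
    assoc.load(cache_path)
    return assoc.query("1001"), assoc.query("9999")


if __name__ == "__main__":
    print(demo())
```

Because the lookup is an ordinary dictionary access inside the process, no join stage has to be planned or scheduled for it, which is the property the abstract emphasizes. How the cache area is populated from Hive and how the interface parameters select the data are not specified in the text shown here, so the sketch leaves them out.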

Description

Technical field

[0001] The invention relates to Hive data warehouses, and in particular to a method and device for implementing fast association based on a Hive data warehouse.

Background

[0002] Since the explosion of the Internet, the traditional data warehouse systems that supported mainstream search engine companies, e-commerce, and social networking sites have long been overwhelmed by the growing mass of data. Hive, built on Hadoop clusters, emerged at just the right time and became a welcome way to build distributed data warehouses in the era of big data. However, with the explosive growth of data scale and the continuous expansion of business, data processing logic has become more and more complex, and the timeliness of data processing has gradually declined. Most existing Hive optimization schemes can only optimize tasks at the HQL level and the parameter level. Of course, there is no problem with HQL itself, but there is...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/283
Inventor: 张军, 刘志祖, 牟一超
Owner: BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD