Cross-cluster data processing system and method based on HQL

A cross-cluster data processing system and method, applied in the field of big data processing. It addresses the problems that manually synchronizing data between clusters is laborious, hinders rapid analysis by data warehouse analysts, and increases data maintenance costs. The approach is easy to adopt, adds no learning cost, and reduces maintenance costs.

Active Publication Date: 2021-11-02
SICHUAN XW BANK CO LTD

AI Technical Summary

Problems solved by technology

[0003] However, as business grows, data is often distributed across different clusters. Since HQL can only analyze data within a single cluster, it cannot play to its strengths in this cross-cluster scenario. In the existing technical solution, cluster operation and maintenance personnel synchronize the data to a single cluster and map it into a Hive table, which is laborious, hinders rapid analysis by data warehouse analysts, and increases data maintenance costs.



Examples


Embodiment 1

[0048] As shown in Figure 1, the HQL-based cross-cluster data processing system of the present invention includes a client, a compute engine management module, a cluster management module, and a cross-cluster table management module;

[0049] The client is used to send the HQL statement to be queried to the compute engine management module and to receive the query result data from the compute engine management module;

[0050] The compute engine management module uses the Hive engine to parse the HQL statement sent by the client, analyzes the tables used in the HQL and the cluster each table belongs to (which may be the local cluster or another cluster), and thereby carries out single-cluster or cross-cluster computation; the module also supports syntax checking of cross-cluster HQL;
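The table-to-cluster analysis described in [0050] can be illustrated with a minimal sketch. It is not the patented implementation: the patent parses HQL with the Hive engine, whereas this example substitutes a simplified regular expression, and the catalog `TABLE_TO_CLUSTER`, the table names, and the cluster names are all hypothetical.

```python
import re

# Hypothetical catalog mapping qualified table names to cluster names.
# In the described system this knowledge would come from table metadata.
TABLE_TO_CLUSTER = {
    "dw.orders": "cluster_a",
    "dw.customers": "cluster_a",
    "risk.scores": "cluster_b",
}

# Simplified stand-in for a real parser: capture the identifier that
# follows FROM or JOIN. Subqueries, CTEs and comments are not handled.
_TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def tables_used(hql):
    """Return the sorted, lower-cased table names referenced in the statement."""
    return sorted({name.lower() for name in _TABLE_REF.findall(hql)})


def clusters_for(hql, catalog=TABLE_TO_CLUSTER):
    """Return the set of clusters that own the tables referenced in the statement."""
    clusters = set()
    for table in tables_used(hql):
        if table not in catalog:
            raise KeyError(f"unknown table: {table}")
        clusters.add(catalog[table])
    return clusters


def is_cross_cluster(hql, catalog=TABLE_TO_CLUSTER):
    """True when the referenced tables live on more than one cluster."""
    return len(clusters_for(hql, catalog)) > 1
```

A statement that joins `dw.orders` with `risk.scores` would be flagged as cross-cluster under this catalog, while a join between `dw.orders` and `dw.customers` would stay on a single cluster.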

[0051] The cluster management module acquires, in real time, the computing resources (CPU cores and memory size) and storage resources (HDFS storage space usage) of all clusters, and applies certain rules to calculate the...
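As a rough illustration of selecting the most idle cluster from the resource figures named in [0051], the sketch below scores each cluster by its free capacity. The scoring rule is an assumption: the source does not state the rules it applies, so an unweighted average of the free CPU, memory, and HDFS fractions is used here purely as a placeholder. The `ClusterStats` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterStats:
    """Hypothetical snapshot of one cluster's resources."""
    name: str
    cpu_cores_total: int
    cpu_cores_used: int
    memory_gb_total: float
    memory_gb_used: float
    hdfs_tb_total: float
    hdfs_tb_used: float


def idle_score(stats):
    """Assumed rule: mean of the free fractions of CPU, memory and HDFS space."""
    cpu_free = 1 - stats.cpu_cores_used / stats.cpu_cores_total
    mem_free = 1 - stats.memory_gb_used / stats.memory_gb_total
    hdfs_free = 1 - stats.hdfs_tb_used / stats.hdfs_tb_total
    return (cpu_free + mem_free + hdfs_free) / 3


def most_idle(clusters):
    """Return the cluster snapshot with the highest idle score."""
    return max(clusters, key=idle_score)
```

A weighted score, or one that ignores storage for compute-bound statements, would fit the same shape; only `idle_score` would need to change.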

Embodiment 2

[0078] As shown in Figures 2 and 3, this embodiment differs from Embodiment 1 in that it provides an HQL-based cross-cluster data processing method, which is applied to the HQL-based cross-cluster data processing system of Embodiment 1. The method includes:

[0079] S0: Use the ANTLR4 framework to check whether the syntax of the HQL statement sent by the client is correct. If the syntax is correct, proceed to determine the type of the HQL statement; if there is a syntax error, return the error to the client;

[0080] S1: Use the Hive engine to parse the type of the HQL statement to be queried by the client; the types of HQL statement include the DML type and the DDL type;

[0081] S2: If the parsed HQL statement is of the DDL type, continue to resolve the cluster corresponding to the HQL statement, and send the HQL statement to that cluster;

[0082] S3: If the parsed HQL statement is of the DML type, continue to resolve the HQL statement for ...
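The type check in step S1 can be sketched as follows. This is a simplification under stated assumptions: the patent determines the statement type with the Hive engine, while this example only inspects the leading keyword. The keyword sets and the function name `classify_statement` are hypothetical, and queries (`SELECT`, `WITH`) are grouped under DML because the source names only the DML and DDL types.

```python
import re

# Assumed keyword sets, chosen for illustration only.
DDL_KEYWORDS = {"CREATE", "DROP", "ALTER", "TRUNCATE"}
DML_KEYWORDS = {"SELECT", "WITH", "INSERT", "LOAD", "UPDATE", "DELETE", "MERGE"}

# Leading alphabetic keyword of the statement, ignoring leading whitespace.
_LEADING_KEYWORD = re.compile(r"\s*([A-Za-z]+)")


def classify_statement(hql):
    """Return "DDL" or "DML" based on the statement's first keyword."""
    match = _LEADING_KEYWORD.match(hql)
    if match is None:
        raise ValueError("statement does not start with a keyword")
    keyword = match.group(1).upper()
    if keyword in DDL_KEYWORDS:
        return "DDL"
    if keyword in DML_KEYWORDS:
        return "DML"
    raise ValueError(f"unsupported statement type: {keyword}")
```

A caller would branch on the result: a `"DDL"` statement is sent to the cluster it targets (step S2), while a `"DML"` statement continues with the further analysis of step S3.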



Abstract

The invention discloses a cross-cluster data processing system and method based on HQL. The system comprises a client, a compute engine management module, a cluster management module and a cross-cluster table management module. The client is used for sending a to-be-queried HQL statement to the compute engine management module and for receiving the query result data. The compute engine management module is used for parsing the HQL statement sent by the client with the Hive engine, analyzing the tables used in the HQL and the cluster to which each table belongs, and carrying out single-cluster or cross-cluster computation. The cluster management module is used for acquiring the computing resources and storage resources of all clusters in real time and calculating the currently most idle cluster, so that the compute engine management module can obtain the most idle cluster to execute the HQL statement. The cross-cluster table management module is used for managing and maintaining tables synchronized across clusters. The invention enables cross-cluster HQL data computation and improves operation speed and cluster resource utilization.

Description

Technical field

[0001] The present invention relates to big data processing technology, and in particular to an HQL-based cross-cluster data processing system and method.

Background technique

[0002] Hive is a data warehouse tool built on Hadoop (a distributed system architecture developed by the Apache Foundation). It can map structured data to tables in a database and defines a simple SQL-like (Structured Query Language) query language called HQL. Hive's execution engine converts HQL statements into MapReduce (a distributed computing system) tasks for analyzing and mining large-scale distributed data. MapReduce comprises a Map (mapping) unit and a Reduce (reduction) unit: the Map unit maps and sorts the data and partitions it into groups, and the Reduce unit merges the data. The emergence of HQL has greatly reduced the learning cost for data warehouse analysts and plays an important role in data analysis.

[0003] However, with the developmen...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/242, G06F16/27, G06F16/22
CPC: G06F16/2433, G06F16/27, G06F16/2282, Y02D10/00
Inventor: 王守明
Owner: SICHUAN XW BANK CO LTD