
A data processing method, device, equipment, and computer-readable storage medium

A data processing technology applied in the field of data warehouses. It addresses SQL statements that are not tuned to the underlying layer, tasks that occupy too many resources, and low operating efficiency. Its effects are improved operating efficiency, faster data acquisition, and shorter processing time.

Active Publication Date: 2021-10-15
HANGZHOU ANHENG INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] At present, when building an enterprise-level data warehouse, business personnel write SQL (Structured Query Language) statements and send them to Hive, which converts them into tasks and processes those tasks to obtain the corresponding results. However, some business personnel are familiar only with SQL syntax and not with the underlying architecture of Hive and HDFS (Hadoop Distributed File System). The SQL statements they write therefore do not take the actual underlying optimizations and parameter settings into account, so the corresponding tasks occupy too many resources and run inefficiently.



Examples


Embodiment Construction

[0041] Hive is a data warehouse tool built on Hadoop. It provides rich SQL query capabilities for analyzing data stored in the Hadoop Distributed File System: it can map structured data files to database tables, provide complete SQL query functionality, and convert SQL statements into MapReduce tasks to run, so that users obtain the content they need through their own SQL queries. This SQL dialect is referred to as Hive SQL; it lets users who are not familiar with MapReduce query, summarize, and analyze data using SQL.
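To make the SQL-to-MapReduce conversion described above concrete, the following is a conceptual sketch only, not Hive's actual execution plan. It shows how a query such as SELECT dept, COUNT(*) FROM employees GROUP BY dept could be expressed as map, shuffle, and reduce phases. The table, column names, and function names are hypothetical.

```python
from collections import defaultdict


def map_phase(rows):
    """Emit one (key, 1) pair per row; the key is the GROUP BY column."""
    for row in rows:
        yield row["dept"], 1


def shuffle(pairs):
    """Group all values that share a key, as the shuffle step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate each group; summing the 1s implements COUNT(*)."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input rows standing in for an "employees" table.
employees = [{"dept": "eng"}, {"dept": "ops"}, {"dept": "eng"}]
counts = reduce_phase(shuffle(map_phase(employees)))
```

The user only writes the SQL statement; a tool in Hive's position is responsible for generating phases like these, which is why the cost of the resulting task is easy to overlook.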

[0042] Although Hive can convert user-written SQL statements into corresponding tasks, so that analysts unfamiliar with MapReduce can also process data in the data warehouse, some business personnel are familiar only with SQL syntax and not with the underlying architecture of Hive and HDFS. As a result, the SQL statements they write are not combined with the actual underlying optimiz...


Abstract

The application discloses a data processing method, device, equipment, and computer-readable storage medium. The method includes: determining the data volume of each request table in an SQL request according to the data volume of each data table; adjusting the association (join) order of the request tables and splitting any request table whose data volume exceeds a preset value into sub-tables; sending the reordered request tables, the sub-tables, and the SQL request to Hive, which processes them to obtain intermediate data; determining the size of the intermediate data, the number of reduce tasks, and the average amount of intermediate data allocated to each reduce task; and generating data allocation information and sending it to Hive, which distributes the intermediate data evenly across the reduce tasks so that they write it out. By optimizing the SQL request and evenly distributing the intermediate data obtained by Hive, the disclosed technical solution reduces resource occupation and improves task processing efficiency.
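The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not say how the join order is chosen, so ordering tables from smallest to largest is an assumption here, as are the sub-table naming scheme and every function and parameter name.

```python
from dataclasses import dataclass, field
from math import ceil
from typing import Dict, List


@dataclass
class TablePlan:
    name: str
    size: int
    sub_tables: List[str] = field(default_factory=list)


def plan_request(table_sizes: Dict[str, int],
                 request_tables: List[str],
                 split_threshold: int) -> List[TablePlan]:
    """Look up each request table's data volume, reorder, and split."""
    # Assumption: join smaller tables first (ascending data volume).
    ordered = sorted(request_tables, key=lambda t: table_sizes[t])
    plans = []
    for name in ordered:
        size = table_sizes[name]
        # Only tables above the preset value are split into sub-tables.
        parts = ceil(size / split_threshold) if size > split_threshold else 0
        subs = [f"{name}_part{i}" for i in range(parts)]  # hypothetical names
        plans.append(TablePlan(name, size, subs))
    return plans


def allocation_info(intermediate_size: int, num_reducers: int) -> List[int]:
    """Spread the intermediate data as evenly as possible over reducers."""
    base, extra = divmod(intermediate_size, num_reducers)
    # The first `extra` reducers take one more unit so the shares sum exactly.
    return [base + (1 if i < extra else 0) for i in range(num_reducers)]


# Hypothetical data volumes (arbitrary units) and a preset value of 400.
sizes = {"orders": 500, "users": 50, "logs": 1200}
plans = plan_request(sizes, ["orders", "users", "logs"], split_threshold=400)
shares = allocation_info(intermediate_size=10, num_reducers=3)
```

In this sketch the planner runs before anything is sent to Hive, and the allocation step runs afterward on the intermediate data; how the allocation information is actually communicated to Hive is not covered by the abstract and is left out.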

Description

Technical field

[0001] The present application relates to the technical field of data warehouses, and more specifically, to a data processing method, device, equipment, and computer-readable storage medium.

Background

[0002] A data warehouse is a strategic collection that provides all types of data support for decision-making at all levels of an enterprise.

[0003] At present, when building an enterprise-level data warehouse, business personnel write SQL (Structured Query Language) statements and send them to Hive, which converts them into tasks and processes those tasks to obtain the corresponding results. However, some business personnel are familiar only with SQL syntax and not with the underlying architecture of Hive and HDFS (Hadoop Distributed File System). The SQL statements they write are therefore not combined with the actual underlying optimization and...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/21, G06F16/28, G06F16/25, G06F9/54
CPC: G06F16/21, G06F16/284, G06F16/252, G06F9/54
Inventor: 郑钱男, 范渊, 黄进
Owner: HANGZHOU ANHENG INFORMATION TECH CO LTD