
Transparent partition method and device based on SparkSQL

A transparent partition table technology, applied in the computer field, that addresses slow query response, inconsistent partition formats, and the inability to automatically map non-partition filter conditions to partition filter conditions, with the effect of narrowing the range of data a query has to read.

Inactive Publication Date: 2021-04-02
NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT
Cites: 1; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] 1. Non-partition filter conditions cannot be automatically mapped to partition filter conditions
Business personnel who use the data may not understand Spark partitioning, or may understand it but not know which partition directory holds the data they want to query. In either case they can only run a full table scan, reading all the data in the table and then filtering it to find the data they need. Each SQL query they submit therefore fails to use the data partitions to narrow the range of data read, which makes query responses slow and, when the table is very large, can easily exhaust the system's computing resources.
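As a minimal illustration of the mapping that is missing, assume a hypothetical table partitioned by day (format yyyyMMdd) on a timestamp column. Neither the day-based strategy nor the function name below comes from the patent text; they are assumptions chosen only to show how a range filter on a non-partition column could be translated into a list of partitions to read.

```python
from datetime import datetime, timedelta
from typing import List


def day_partitions_for_range(start: datetime, end: datetime) -> List[str]:
    """Return the day partitions (yyyyMMdd) that can hold rows whose
    timestamp lies in [start, end]. Hypothetical day-based strategy."""
    if end < start:
        return []  # empty range, so no partition needs to be read
    day, last = start.date(), end.date()
    partitions = []
    while day <= last:
        partitions.append(day.strftime("%Y%m%d"))
        day += timedelta(days=1)
    return partitions
```

With such a mapping, a filter on the timestamp column alone is enough to limit the scan to the listed partition directories instead of the whole table.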
[0005] 2. Inconsistent partition formats easily lead to irregular data storage and management
Existing partitioning technology defines only the type of the partition field and does not further standardize the format of the partition values. For example, if the partition field type is String, any String value can exist as a partition subdirectory on HDFS, and such chaotic partition formats can seriously increase the cost of managing big data.
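One way to picture a standardized format is a validator that accepts only partition names in a single agreed shape. The sketch below assumes a hypothetical yyyyMMdd day format; the excerpt does not state which format the patent prescribes, so the format and the function name are illustrative only.

```python
from datetime import datetime


def is_valid_day_partition(name: str) -> bool:
    """Accept only 8 ASCII digits that form a real calendar date (yyyyMMdd).
    The format itself is an assumption made for this example."""
    if len(name) != 8 or not (name.isascii() and name.isdigit()):
        return False
    try:
        datetime.strptime(name, "%Y%m%d")
    except ValueError:
        return False  # digits that do not form a valid date, e.g. month 13
    return True
```

Rejecting non-conforming names when a partition is created keeps every subdirectory of the table in the same shape.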
[0006] 3. Deleting expired data is inconvenient
As data keeps growing, historical data must eventually be cleared to free up storage space. Because traditional SparkSQL partition formats are irregular, deleting expired data often requires manual intervention, and it is hard to define a set of rules that deletes expired data periodically. Frequent manual deletion puts data security at risk and may even cause irreparable losses.
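Once partition names follow one format, an expiry rule can be stated mechanically. The following sketch assumes hypothetical yyyyMMdd day partitions and a retention period in days; it only selects the expired partitions and leaves the actual deletion to the caller. None of these names are taken from the patent.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List


def expired_partitions(names: Iterable[str], today: date,
                       retention_days: int) -> List[str]:
    """Return the partitions whose day is older than the retention window.
    Names that are not 8-character yyyyMMdd dates are skipped, not deleted."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in names:
        if len(name) != 8:
            continue
        try:
            day = datetime.strptime(name, "%Y%m%d").date()
        except ValueError:
            continue  # unparseable name: leave it for manual review
        if day < cutoff:
            expired.append(name)
    return expired
```

A scheduled job could call this function and drop only the returned partitions, which avoids ad hoc manual deletion.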

Embodiment Construction

[0032] As mentioned above, when SparkSQL is used to analyze big data stored on HDFS, an application submitted by a user often takes a long time to execute because of the large amount of data, even though the data analyzed each time may fall in only one or a few partitions. Existing partition technology cannot locate the partition that holds the data from the user's non-partition filter conditions, and so cannot quickly reduce the scope of the data search. Business personnel who use and analyze the data may also be unfamiliar with SparkSQL partition technology, or may not know how the table they are using is partitioned, so they cannot use partitions to narrow the search range. The result is wasted cluster computing resources and a very poor data query and analysis experience. In order to overcome the deficiencies in the existing partitioning technology that cannot automatically convert non-partitioning filter conditions, non-uniform...

Abstract

The invention discloses a SparkSQL-based transparent partition method and device. The method comprises the following steps: after a user submits a table creation statement to create a partition table, obtaining the specified partition field and the partition strategy adopted, determining the transparent partition through an SQL parser, and storing the transparent partition information; when a user queries data through SQL, generating a logical execution plan through SQL parsing; calculating the data partitions involved in the query from the query conditions in the logical execution plan in combination with the stored transparent partition information, then rewriting the logical execution plan and generating an optimized physical execution plan; and dividing the work into specific tasks according to the execution steps of the generated physical execution plan, and reading data from the data partitions through those tasks.
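The store-then-rewrite flow in the abstract can be pictured with a toy model in plain Python. This is not the patent's implementation and does not use Spark's planner: the catalog, the class, the column names, and the string-based "rewrite" are all hypothetical stand-ins for the stored transparent partition information and the rewritten logical plan.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict


@dataclass(frozen=True)
class TransparentPartition:
    source_column: str                  # non-partition column users filter on
    partition_column: str               # partition column derived from it
    strategy: Callable[[datetime], str] # maps a value to its partition name


# Hypothetical store for the partition information saved at table creation.
CATALOG: Dict[str, TransparentPartition] = {}


def register_table(table: str, info: TransparentPartition) -> None:
    CATALOG[table] = info


def rewrite_equality_filter(table: str, column: str, value: datetime) -> str:
    """Add a partition predicate when the filtered column has a stored mapping."""
    base = f"{column} = '{value.isoformat(sep=' ')}'"
    info = CATALOG.get(table)
    if info is None or info.source_column != column:
        return base  # nothing known about this column: leave the filter as is
    return f"{info.partition_column} = '{info.strategy(value)}' AND {base}"
```

For example, after `register_table("events", TransparentPartition("event_time", "dt", lambda ts: ts.strftime("%Y%m%d")))`, a filter on `event_time` gains a matching `dt` predicate, so only that partition needs to be read.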

Description

Technical field

[0001] The invention relates to the field of computers, in particular to a SparkSQL-based transparent partitioning method and device.

Background technique

[0002] At present, the well-known SparkSQL partitioning technology specifies a special partition field when a data table is created. The data table is mapped to a directory on HDFS, and the table's data is stored in this directory. The partition field data is not stored together with the non-partition fields in the data files on HDFS; instead it exists on HDFS as subdirectories of the data table's directory. SparkSQL is an open source technology that uses standard SQL to process and analyze big data. It creates tables to store structured data and uses partitions to refine data storage, dividing data into different partitions that are stored in different subdirectories on HDFS, where the partition name is the subdirectory name. This patent designs a transparent partition technology based on ex...
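The directory layout described in the background can be sketched as follows. Spark and Hive conventionally name partition subdirectories `column=value`; the table directory and column names used here are hypothetical.

```python
import posixpath
from typing import Iterable, Tuple


def partition_dir(table_dir: str,
                  partition_values: Iterable[Tuple[str, str]]) -> str:
    """Build the subdirectory that holds the data files of one partition,
    using the conventional column=value naming of partition directories."""
    parts = [f"{column}={value}" for column, value in partition_values]
    return posixpath.join(table_dir, *parts)
```

For instance, `partition_dir("/warehouse/events", [("dt", "20210402")])` yields `/warehouse/events/dt=20210402`: the partition value lives in the path, not inside the data files.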

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/2453, G06F16/242
Inventor: 刘欣然张鸿吕雁飞马秉楠惠榛徐庆兰钢临
Owner: NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT