
A real-time metadata update method for Spark SQL retrieval

A metadata and file-metadata update technology for big data processing. It addresses the Spark SQL performance bottleneck and the large time and resource overhead caused by the native metadata update method.

Active Publication Date: 2020-08-04
INST OF INFORMATION ENG CHINESE ACAD OF SCI
Cites: 4 | Cited by: 0

AI Technical Summary

Problems solved by technology

When the metadata used for retrieval changes, the Spark SQL framework discards the entire existing cache and reloads the retrieval metadata in full, which incurs large time and resource overhead on massive data sets.
In business scenarios where metadata changes frequently, repeated metadata updates and retrievals become a performance bottleneck in Spark SQL; this is a key issue when using Spark SQL for retrieval.
As data volumes keep growing and big data technology keeps developing, the performance bottleneck of Spark SQL's native metadata update method directly affects business applications.



Embodiment Construction

[0018] To make the above objects, features, and advantages of the present invention clearer and easier to understand, the invention is further described below through specific embodiments and the accompanying drawings.

[0019] The real-time metadata update method for Spark SQL retrieval provided by the present invention improves the timeliness of Spark SQL retrieval results and effectively reduces the time and resource overhead of metadata updates. It specifically includes the following.

[0020] The first aspect of the present invention provides an incremental update method for Spark SQL retrieval metadata, which avoids regular full updates of the retrieval metadata and reduces the time and resource overhead of updating it. When Spark SQL performs a retrieval, the metadata it uses is cached, and the file metadata of the retrieved data blocks is cached in memory as a set. When the incremental informatio...


Abstract

The invention relates to a real-time metadata update method for Spark SQL retrieval. When Spark SQL performs a retrieval, the metadata it uses is cached, and the file metadata of the retrieved data blocks is cached in memory as a set. When incremental file-metadata information arrives, the method first checks whether the retrieval metadata of the table that the increment belongs to is present in the cache. If it is, the incremental file metadata is added to that table's cached file-metadata set, which completes the incremental update of the Spark SQL retrieval metadata. The invention further provides a method for handling additions to Spark SQL retrieval metadata, so that newly added retrieval metadata is processed in real time and retrieval results are more up to date. The update method avoids frequent full updates of the retrieval metadata, which shortens metadata update time and reduces the resource overhead of each update.
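The incremental update described in the abstract can be illustrated with a minimal, framework-independent sketch. It is not the patented implementation and does not use Spark's own classes: `FileMeta`, `RetrievalMetadataCache`, and all method and field names are hypothetical, and the behaviour when the table is not cached (ignore the increment and return `False`) is an assumption, since the source text does not state it.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class FileMeta:
    """Metadata of one data block file (hypothetical fields)."""
    path: str
    size_bytes: int


class RetrievalMetadataCache:
    """Per-table cache that keeps file metadata in memory as a set."""

    def __init__(self) -> None:
        self._files_by_table: Dict[str, Set[FileMeta]] = {}

    def load_table(self, table: str, files: Iterable[FileMeta]) -> None:
        # Full load: stands in for the caching done when a table is first retrieved.
        self._files_by_table[table] = set(files)

    def apply_increment(self, table: str, new_files: Iterable[FileMeta]) -> bool:
        # Step 1: check whether this table's retrieval metadata is cached.
        cached = self._files_by_table.get(table)
        if cached is None:
            # Assumption: an uncached table is left alone.
            return False
        # Step 2: add only the incremental file metadata to the cached set,
        # leaving the rest of the cache untouched.
        cached.update(new_files)
        return True

    def files(self, table: str) -> Set[FileMeta]:
        # Return a copy so callers cannot mutate the cache directly.
        return set(self._files_by_table.get(table, ()))


cache = RetrievalMetadataCache()
cache.load_table("events", [FileMeta("/data/events/part-0", 128)])
cache.apply_increment("events", [FileMeta("/data/events/part-1", 64)])
```

Because the per-table collection is a set, applying the same increment twice leaves the cache unchanged, and the cost of an update depends on the size of the increment rather than on the total number of files in the table.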

Description

Technical field

[0001] The invention relates to a method for updating metadata in real time for Spark SQL retrieval. It improves the timeliness and runtime performance of Spark SQL when retrieving massive data, and belongs to the field of big data processing.

Background

[0002] With the continuous development of computer technology and the spread of informatization, data volumes have grown rapidly, and the storage and application of massive data have flourished. For example, in network security, big data technology is used to analyze network attack behavior; in e-commerce, it is used to analyze users' shopping preferences or the most popular products. Big data technology has played a positive role in building a conservation-oriented society and improving production efficiency.

[0003] In massive data retrieval applications, Apache Foundation's distributed retrieval framework Spark Sql provides a Hiv...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/27, G06F16/2455, G06F16/23
CPC: G06F16/2379, G06F16/24552, G06F16/27
Inventors: 李斌斌, 王树鹏, 王振宇, 张磊
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI