Spark-based large-scale distributed DataFrame query method

A query method applied in the field of large-scale distributed DataFrame querying. It addresses the lack of flexible, easy-to-use query functions for DataFrames and achieves good scalability, good ease of use, and improved query performance.

Active Publication Date: 2019-07-23

NANJING UNIV

Cites: 3; Cited by: 7

AI Technical Summary

Problems solved by technology

[0007] Purpose of the invention: Pandas DataFrame cannot handle large-scale data, and Spark's existing distributed DataFrame programming model lacks flexible, easy-to-use query functions. To solve these problems, the present invention provides a Spark-based query method for large-scale distributed DataFrames. The method can efficiently query large-scale distributed DataFrames, supporting both location-based (positional) and label-based queries, and provides a Pandas-like DataFrame interface. It thereby addresses the lack of flexible and easy-to-use query functions for distributed DataFrames on existing big data processing platforms and makes Spark DataFrame richer and more powerful.
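A location-based query in the style of pandas `iloc` has to map a global row position to the partition that holds that row. This excerpt does not describe how the invention does it. The following is a minimal pure-Python sketch under our own assumption that a "lightweight global index" (a term from the abstract) records only each partition's row count. The class `GlobalPositionIndex` and its methods are hypothetical, and partitions are modeled as plain row counts rather than real Spark partitions.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence, Tuple


class GlobalPositionIndex:
    """Hypothetical lightweight global index: keeps only per-partition row counts."""

    def __init__(self, partition_sizes: Sequence[int]) -> None:
        self.sizes: List[int] = list(partition_sizes)
        # offsets[i] = global position of the first row of partition i
        self.offsets: List[int] = [0] + list(accumulate(self.sizes))[:-1]
        self.total: int = sum(self.sizes)

    def locate(self, pos: int) -> Tuple[int, int]:
        """Map one global position to (partition_id, local_offset), iloc-style."""
        if pos < 0:
            pos += self.total  # negative positions count from the end, as in pandas
        if not 0 <= pos < self.total:
            raise IndexError("position out of bounds")
        # bisect_right picks the last partition whose offset <= pos,
        # which skips empty partitions that share an offset with the next one
        part = bisect_right(self.offsets, pos) - 1
        return part, pos - self.offsets[part]

    def slice_plan(self, start: int, stop: int) -> List[Tuple[int, int, int]]:
        """Split the global range [start, stop) into per-partition local ranges."""
        start, stop = max(start, 0), min(stop, self.total)
        plan = []
        for part, (base, size) in enumerate(zip(self.offsets, self.sizes)):
            lo, hi = max(start - base, 0), min(stop - base, size)
            if lo < hi:  # only partitions that overlap the range are scheduled
                plan.append((part, lo, hi))
        return plan


if __name__ == "__main__":
    index = GlobalPositionIndex([4, 3, 5])
    print(index.locate(5))         # (1, 1)
    print(index.locate(-1))        # (2, 4)
    print(index.slice_plan(2, 9))  # [(0, 2, 4), (1, 0, 3), (2, 0, 2)]
```

Because such an index stores one integer per partition, it stays small enough to keep on the driver. In PySpark the per-partition counts could plausibly be gathered with `df.rdd.mapPartitions(lambda rows: [sum(1 for _ in rows)]).collect()`, but that is our illustration, not a step stated in the patent.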

Embodiment Construction

[0024] The present invention is further illustrated below with reference to the accompanying drawings and specific embodiments. It should be understood that these embodiments only illustrate the present invention and are not intended to limit its scope. After reading the present invention, modifications of its various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims of this application.

[0025] The technical scheme of the present invention relies mainly on the distributed big data processing system Spark for distributed computing, and on the distributed in-memory database Redis and the shared-memory object store Plasma Store for storage. Spark is an open-source system of the Apache Software Foundation (project homepage http://spark.apache.org), and this software does not belong to the content of the present inventi...

Abstract

The invention discloses a Spark-based large-scale distributed DataFrame query method comprising the following steps: adopting a system framework built on the distributed computing execution engine Spark, with DataFrame as the programming model and Python as the programming language; in the distributed system, encapsulating the existing query interface of Spark's native DataFrame to eliminate its incompatibility with the API of Pandas, the mainstream single-machine DataFrame computing library; constructing a lightweight global index and providing a plurality of distributed DataFrame query functions for different conditions; and establishing local indexes and auxiliary indexes to improve query performance. The invention solves two problems: DataFrames on existing single-machine platforms scale poorly and cannot process large-scale data, and the distributed DataFrame query interfaces of existing big data processing platforms are limited, hard to use, and slow.
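The abstract names local indexes and auxiliary indexes without defining them. One plausible reading, which is purely our assumption, is a per-partition hash index from row label to local row offsets (the local index) plus a small map from each label to the set of partitions that contain it (the auxiliary index), so that a label-based, `loc`-style lookup only visits relevant partitions. The sketch below models partitions as in-memory lists of `(label, row)` pairs; `LabelIndex`, `partitions_for`, and `loc` are hypothetical names, not the invention's API.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Sequence, Set, Tuple

Partition = Sequence[Tuple[Hashable, Any]]  # hypothetical: list of (label, row) pairs


class LabelIndex:
    """Hypothetical label index over a partitioned table."""

    def __init__(self, partitions: Sequence[Partition]) -> None:
        self.partitions = partitions
        # Local index per partition: label -> local row offsets (labels may repeat)
        self.local: List[Dict[Hashable, List[int]]] = []
        aux: Dict[Hashable, Set[int]] = defaultdict(set)
        for pid, part in enumerate(partitions):
            local: Dict[Hashable, List[int]] = defaultdict(list)
            for offset, (label, _row) in enumerate(part):
                local[label].append(offset)
                aux[label].add(pid)
            self.local.append(dict(local))
        # Auxiliary index: label -> ids of the partitions holding that label
        self.aux: Dict[Hashable, Set[int]] = dict(aux)

    def partitions_for(self, label: Hashable) -> List[int]:
        """Partitions a lookup must visit; all others are pruned."""
        return sorted(self.aux.get(label, ()))

    def loc(self, label: Hashable) -> List[Any]:
        """Return every row carrying the label, in partition order."""
        pids = self.partitions_for(label)
        if not pids:
            raise KeyError(label)  # pandas .loc also raises KeyError for a missing label
        rows: List[Any] = []
        for pid in pids:
            rows.extend(self.partitions[pid][i][1] for i in self.local[pid][label])
        return rows


if __name__ == "__main__":
    parts = [[("a", 1), ("b", 2)], [("c", 3), ("a", 4)], [("d", 5)]]
    index = LabelIndex(parts)
    print(index.partitions_for("a"))  # [0, 1]
    print(index.loc("a"))             # [1, 4]
```

In this design the auxiliary index is the piece that lets a query skip whole partitions, while each local index turns the scan inside a visited partition into a hash lookup. Whether the patent's indexes are split this way is not stated in this excerpt.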

Description

Technical field

[0001] The invention relates to the technical field of distributed computing, and in particular to a Spark-based large-scale distributed DataFrame query method.

Background art

[0002] In big data analysis applications, structured big data analysis and processing based on the table model is still the most basic requirement in many industries. DataFrame is an easy-to-use tabular data programming model within a programming-language environment. It provides a good abstraction of the statistical process of data analysis and has therefore received extensive attention.

[0003] The traditional relational database provides a table data model oriented to SQL queries, but SQL queries require the backing of a heavyweight database system and SQL query engine. Combined with the complexity of the SQL query language, this makes the SQL-based table data model still not convenient enough to operate and use in the common data analysis programming language enviro...

Application Information

Patent Type & Authority: Applications (China)

IPC (8): G06F16/2455, G06F16/27, G06F16/22

CPC: G06F16/2455, G06F16/278, G06F16/22, Y02D10/00

Inventors: 顾荣 (Gu Rong), 黄宜华 (Huang Yihua), 施军 (Shi Jun)

Owner: NANJING UNIV
