Method and device for retrieving data

A data retrieval and data request technology, applied in the field of big data processing. It addresses the long result-return response time and the inability of SparkSQL's native architecture to retrieve large result sets, thereby reducing the result-return response time and the total retrieval time, solving memory problems, and improving efficiency and availability.

Inactive Publication Date: 2018-09-14
NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by the present invention is to provide a data retrieval method and device that address two problems: the native architecture of SparkSQL cannot retrieve large result sets, and returning results takes a long response time.

Examples

Embodiment 1

[0027] This embodiment provides a data retrieval method. Figure 1 is a flowchart of a data retrieval method according to an embodiment of the present invention.

[0028] Step S110, generating an execution plan according to the SQL statement requesting data retrieval.

[0029] Step S120, judging whether the execution plan satisfies the streaming result return condition; if yes, execute step S130; if not, execute step S140.

[0030] The streaming result return condition includes: the statement type of the execution plan is a preset type, and/or the number of files to be scanned for the data retrieval is greater than a preset threshold. Here, a preset statement type is one that imposes no ordering requirement on the retrieval results.
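The condition in [0030] can be illustrated with a minimal Python sketch. The source does not list the preset statement types or give the threshold value, so `ORDER_FREE_TYPES`, `FILE_COUNT_THRESHOLD`, the `ExecutionPlan` class, and the `require_both` flag (which models the "and/or" wording) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Assumptions: the source names neither the preset types nor the threshold.
ORDER_FREE_TYPES = {"SELECT_FILTER", "SELECT_LIMIT"}  # hypothetical order-free statement types
FILE_COUNT_THRESHOLD = 1000                           # hypothetical preset threshold


@dataclass
class ExecutionPlan:
    statement_type: str   # kind of statement the plan was generated from
    files_to_scan: int    # number of files this retrieval must scan


def meets_streaming_condition(plan: ExecutionPlan, require_both: bool = False) -> bool:
    """Return True if the plan qualifies for streaming result return.

    require_both=False models the "or" reading of "and/or";
    require_both=True models the "and" reading.
    """
    type_ok = plan.statement_type in ORDER_FREE_TYPES
    count_ok = plan.files_to_scan > FILE_COUNT_THRESHOLD
    return (type_ok and count_ok) if require_both else (type_ok or count_ok)
```

Under this sketch, a plan that qualifies would proceed to step S130 and any other plan to step S140.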

[0031] Step S130, in the process of executing the execution plan, obtain the retrieval results one by one and provide them to the user one by one, until all the retrieval results are obtained and provided to the user.
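One way to picture step S130 is a Python generator that hands each result to the caller as soon as it is available, rather than collecting the full result set first. This is only a sketch of the idea, not the patented implementation: the notion of a "task" as a zero-argument callable returning a list of rows, and the names `stream_results`, `make_task`, and `executed`, are assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator, List

executed: List[int] = []  # records which tasks have actually run (illustration only)


def make_task(task_id: int, rows: List[dict]) -> Callable[[], List[dict]]:
    """Build a hypothetical partial retrieval task that returns a fixed set of rows."""
    def run() -> List[dict]:
        executed.append(task_id)
        return rows
    return run


def stream_results(tasks: Iterable[Callable[[], List[dict]]]) -> Iterator[dict]:
    """Yield rows one by one; a task runs only when its rows are needed."""
    for run_task in tasks:
        for row in run_task():
            yield row
```

Because the generator is lazy, the caller can start consuming the first rows before later tasks have run, and the full result set is never held in memory at once.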

[0032] In this e...

Embodiment 2

[0048] This embodiment describes the steps of generating a Job. This embodiment is executed on the Spark driver. Figure 3 is a flowchart of the steps for generating a Job according to an embodiment of the present invention.

[0049] Step S310, receiving the SQL statement submitted by the user.

[0050] This SQL statement is used to request data retrieval.

[0051] The SQL statement includes the execution logic and the metadata information of the files to be retrieved.

[0052] From the metadata information of the files to be retrieved, the number of files to be scanned for this retrieval can be determined.
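As a rough illustration of [0052], the file count could be derived as follows. The source does not describe the layout of the metadata, so the mapping from table name to a list of file paths and the function name `count_files_to_scan` are hypothetical.

```python
from typing import Dict, List


def count_files_to_scan(file_metadata: Dict[str, List[str]]) -> int:
    """Count the distinct files across all tables referenced by the query.

    file_metadata is an assumed structure: table name -> list of file paths.
    """
    return len({path for paths in file_metadata.values() for path in paths})
```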

[0053] In step S320, the SQL statement is parsed by a parser in SparkSQL to generate an execution plan.

[0054] The SQL statement is parsed into a logical execution tree, and an execution plan is generated from the logical execution tree and the file metadata information of the database tables referenced in the SQL statement.

[0055] Step S330, according to the statement ...

Embodiment 3

[0066] This embodiment describes the steps of concurrently submitting Jobs. Figure 4 is a flowchart of the steps of concurrently submitting Jobs according to an embodiment of the present invention.

[0067] In step S410, the Spark Driver starts a preset thread pool.

[0068] In this embodiment, multiple threads are maintained in the preset thread pool, and these threads concurrently submit multiple jobs to the Spark scheduling layer in a blocking manner, which ensures that an appropriate number of jobs run at the same time.

[0069] In this embodiment, the smaller execution granularity effectively reduces the response time of retrieval results, reduces the number of jobs queued for execution at the Spark scheduling layer, eases the pressure of job distribution, and saves machine resources on the node where the Spark Driver is located.
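The thread-pool behaviour described in [0068] can be approximated in plain Python with `concurrent.futures`. This is a sketch under stated assumptions, not Spark code: a "job" is modelled as a zero-argument callable that blocks until it finishes, and `submit_jobs_concurrently` and `pool_size` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Iterable, Iterator


def submit_jobs_concurrently(
    jobs: Iterable[Callable[[], Any]], pool_size: int = 4
) -> Iterator[Any]:
    """Run jobs on a fixed-size pool and yield each result as it completes.

    Each worker thread blocks on its current job, so at most pool_size
    jobs are in flight at any moment; the rest wait in the pool's queue.
    """
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(job) for job in jobs]
        for future in as_completed(futures):
            yield future.result()
```

Capping the pool size is what keeps the number of simultaneously running jobs bounded, and yielding results in completion order lets early results reach the caller without waiting for the slowest job.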

[0070] In step S420, the Spark Driver judges whether there is an unsubmitted job; if yes, execute step S430; if not, finis...

Abstract

The invention discloses a method and a device for retrieving data. The method includes: generating an execution plan according to a structured query language (SQL) statement requesting data retrieval; judging whether the execution plan meets a streaming result return condition; and, if so, acquiring the retrieval results one by one while executing the execution plan and providing them to the user one by one until all retrieval results have been acquired and provided to the user. Because the retrieval results are acquired one by one during data retrieval and provided to the user as they are acquired, the result-return response time and the total retrieval time of data retrieval with SparkSQL can be effectively shortened, and the efficiency and availability of data retrieval with the SparkSQL framework can be improved. The streaming return mode also resolves the memory problems that can arise when SparkSQL retrieves massive result sets, making it possible to retrieve massive result sets with the SparkSQL framework.

Description

Technical field

[0001] The invention relates to the field of big data processing, in particular to a data retrieval method and device.

Background technique

[0002] With the continuous development of computer technology and the continuous improvement of informatization, the amount of data has increased rapidly, and the storage and application of massive data have also flourished. For example, in terms of network security, big data technology is used to analyze network attack behavior; in e-commerce, big data technology is used to analyze user shopping preferences or most popular products. Moreover, big data technology has played a positive role in building a conservation-oriented society and improving production efficiency.

[0003] In massive data retrieval applications, Apache Foundation's distributed retrieval framework SparkSQL provides a HiveQL interface with Hive, which has high efficiency and availability and is widely used in the field of big data. However, with th...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 张鸿, 吕雁飞, 刘欣然, 马秉楠, 惠榛, 白堃
Owner: NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT