Realization method of full-text retrieval based on Spark framework

A method for implementing full-text retrieval on the Spark framework. It addresses the lack of full-text retrieval support in native Spark and offers high efficiency, strong practicality, and a wide range of applications.

Active Publication Date: 2018-04-20
BEIJING SCISTOR TECH +1
4 cites, 14 cited by

AI Technical Summary

Problems solved by technology

[0003] Full-text retrieval is a text retrieval method that matches the text of a document against the search terms, which facilitates statistics and analysis of data. The Apache Foundation's Spark framework is an efficient, easy-to-use general-purpose parallel framework that provides the same HiveQL interface as Hive. However, the native Spark framework does not support full-text retrieval.



Examples


Embodiment Construction

[0031] In order to make the purpose, technical solution and advantages of the present invention clearer, the hierarchical and segmented backup data organization and management method according to an embodiment of the present invention will be further described in detail below in conjunction with the accompanying drawings.

[0032] First, after receiving the SQL query statement submitted by the user, the present invention parses it to generate the syntax tree of the SQL statement, then analyzes the syntax tree to generate a logical execution plan for the retrieval. While the logical execution plan is being generated, the statements related to full-text retrieval are parsed into custom functions in Spark. Next, the metadata of the table on which the SQL statement is executed is obtained from Hive, and it is determined whether the field to be retrieved supports full-text retrieval. If so, the data is indexed through the field hash index in the file m...
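The planning step described in [0032] can be illustrated with a minimal Python sketch. It is not the patented implementation: the `MATCH` operator, the `TABLE_METADATA` layout, the `PlanNode` kinds, and the fallback to a plain filter when a field lacks full-text support are all assumptions made for illustration, since the text does not specify them.

```python
from dataclasses import dataclass

# Hypothetical stand-in for table metadata obtained from Hive:
# table -> field -> whether the field supports full-text retrieval.
TABLE_METADATA = {
    "logs": {"message": True, "host": False},
}


@dataclass(frozen=True)
class Predicate:
    """One WHERE-clause condition taken from the syntax tree."""
    field: str
    op: str      # "MATCH" is an assumed full-text operator, not from the source
    value: str


@dataclass(frozen=True)
class PlanNode:
    """One filter step in the logical execution plan."""
    kind: str    # "fulltext_udf" or "plain_filter" (illustrative names)
    table: str
    predicate: Predicate


def supports_fulltext(table: str, field: str) -> bool:
    """Look up whether the field is marked as full-text searchable."""
    return TABLE_METADATA.get(table, {}).get(field, False)


def to_plan_node(table: str, pred: Predicate) -> PlanNode:
    """Map a full-text predicate to a custom-function node when supported.

    Falling back to a plain filter otherwise is an assumption of this sketch.
    """
    if pred.op == "MATCH" and supports_fulltext(table, pred.field):
        return PlanNode("fulltext_udf", table, pred)
    return PlanNode("plain_filter", table, pred)
```

For example, `to_plan_node("logs", Predicate("message", "MATCH", "error"))` yields a node of kind `fulltext_udf`, while the same predicate on `host` yields `plain_filter`.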


Abstract

The invention discloses a method for implementing full-text retrieval based on the Spark framework and belongs to the field of big data processing. According to the method, a to-be-executed SQL statement is received, and a syntax tree is generated and converted into the corresponding logical plans; the metadata of all tables involved in the retrieval is obtained from Hive, the fields supporting full-text retrieval are identified, and data blocks are initially pruned through field hash indexes; the disk locations where the data blocks are stored are obtained from the file metadata according to the query condition; the logical execution plans are converted into a set of tasks that can be executed in a distributed manner, and the target nodes and target processes on which the tasks run are determined from the data block locations; the tasks are distributed and executed, the execution results are aggregated, and the final result is obtained by iteration. The method is efficient, can quickly complete full-text retrieval over massive data, and is highly practical and widely applicable in the field of big data processing.
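The pruning, placement, and aggregation steps in the abstract can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patent's actual index format: the per-block set of hash buckets, the use of `zlib.crc32`, the bucket count, and all function names are hypothetical. The point is that a hash index can rule out blocks that cannot contain the term, after which the remaining blocks are grouped by the node that stores them.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 1024  # assumed bucket count for the illustrative hash index


def bucket_of(term: str) -> int:
    """Deterministic hash bucket of a term (crc32 is an arbitrary choice)."""
    return zlib.crc32(term.lower().encode("utf-8")) % NUM_BUCKETS


def build_block_index(blocks):
    """Map each block id to the set of hash buckets of the words it contains."""
    return {
        bid: {bucket_of(word) for row in rows for word in row.split()}
        for bid, rows in blocks.items()
    }


def prune_blocks(index, term):
    """Keep only blocks whose bucket set contains the term's bucket.

    Hash collisions can keep a block that lacks the term, but a block that
    contains the term is never dropped.
    """
    wanted = bucket_of(term)
    return [bid for bid, buckets in index.items() if wanted in buckets]


def assign_tasks(block_ids, block_locations):
    """Group the surviving blocks by the node that stores them."""
    tasks = defaultdict(list)
    for bid in block_ids:
        tasks[block_locations[bid]].append(bid)
    return dict(tasks)


def run_tasks(tasks, blocks, term):
    """Scan the assigned blocks per node and merge the matching rows."""
    needle = term.lower()
    results = []
    for _node, bids in tasks.items():
        for bid in bids:
            results.extend(
                row for row in blocks[bid] if needle in row.lower().split()
            )
    return results


# Hypothetical sample data: three blocks stored on two nodes.
BLOCKS = {
    "b0": ["disk error on node1", "login ok"],
    "b1": ["user logged in"],
    "b2": ["network error timeout"],
}
LOCATIONS = {"b0": "nodeA", "b1": "nodeB", "b2": "nodeA"}

index = build_block_index(BLOCKS)
candidates = prune_blocks(index, "error")
tasks = assign_tasks(candidates, LOCATIONS)
matches = run_tasks(tasks, BLOCKS, "error")
```

Because the final scan rechecks every row, a false positive from the hash index costs only extra reading and never produces a wrong result.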

Description

Technical field

[0001] The invention belongs to the field of big data processing, and in particular relates to a method for implementing full-text retrieval based on the Spark framework.

Background

[0002] With the continuous development of computer technology and the continuous improvement of informatization, the amount of data is increasing rapidly, and big data is being applied more and more widely. For example, in network security, big data technology is used to analyze network attack behavior; in e-commerce, it is used to analyze user shopping preferences or the most popular products; in urban construction, it is used to build smart cities that make travel more convenient. Big data technology has thus played a positive role in promoting the construction of a conservation-oriented society and improving production efficiency; but with the continuous increase of data volume and the continuous develo...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/2255, G06F16/24528, G06F16/24532, G06F16/24552
Inventor: 强倩, 孙昊良, 张慧琳, 周渊, 张晨, 李斌斌, 刘庆良
Owner: BEIJING SCISTOR TECH