A Query Optimization Method for Distributed Database

This query optimization technology for distributed databases addresses the problems of heavy per-node task load, unpredictable query time, and time-consuming traversal of HDFS data, so as to optimize retrieval and shorten query time.

Active Publication Date: 2022-02-18
南京中新赛克科技有限责任公司
9 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0006] As the above process shows, when the amount of data is large and the cluster size is limited, the task load of each node is very heavy, and it is very time-consuming for the executor on each node to traverse its local HDFS data. The storage capacity of a machine can reach about 20-30 TB. If a specified word and its context must be fuzzily queried across a large number of files, the time a single process needs to traverse all local files is unpredictable.

Examples

Embodiment Construction

[0038] The present invention will be further described below in conjunction with the accompanying drawings.

[0039] The present invention filters files in a distributed file system by building indexes. In the data storage stage it generates an index file no larger than 15 MB for each file, and before searching and traversing the original file it checks the index file to judge whether the file contains the character string to be fuzzy-searched. The check has three possible results: must contain, may contain, or must not contain. This avoids scanning a large number of original files unnecessarily.
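The three-way pre-check described in [0039] can be illustrated with a minimal sketch. The index structure here is an assumption, not the patent's: each file's index is an exact set of character bigrams, and the names `Verdict`, `build_index`, `check_index`, and `fuzzy_search` are hypothetical. The excerpt does not say how the three results are derived, so this only shows one way such a verdict could be computed without opening the original file.

```python
from enum import Enum


class Verdict(Enum):
    MUST_CONTAIN = "must contain"
    MAY_CONTAIN = "may contain"
    MUST_NOT_CONTAIN = "must not contain"


N = 2  # assumed n-gram length; the excerpt does not specify one


def ngrams(text, n=N):
    """Return the set of all character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(file_text):
    """Build an exact n-gram index for one file (data storage stage)."""
    return ngrams(file_text)


def check_index(index, query):
    """Classify a file from its index alone, without reading the file."""
    if len(query) < N:
        return Verdict.MAY_CONTAIN       # query too short for the index to decide
    if not ngrams(query) <= index:
        return Verdict.MUST_NOT_CONTAIN  # some required n-gram never occurs
    if len(query) == N:
        return Verdict.MUST_CONTAIN      # the query is itself an indexed n-gram
    return Verdict.MAY_CONTAIN           # all n-grams occur, maybe not adjacently


def fuzzy_search(files, indexes, query):
    """Scan only files the index cannot rule out; return matching file names."""
    hits = []
    for name, text in files.items():
        verdict = check_index(indexes[name], query)
        if verdict is Verdict.MUST_NOT_CONTAIN:
            continue                     # skip the expensive full scan
        if verdict is Verdict.MUST_CONTAIN or query in text:
            hits.append(name)
    return hits


files = {"a": "hadoop cluster", "b": "knit fabric"}
indexes = {name: build_index(text) for name, text in files.items()}
print(fuzzy_search(files, indexes, "hadoop"))
```

In this sketch the query "dop" yields "may contain" for file "a": both of its bigrams occur in the file, but not next to each other, so the full scan is still needed to reject it.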

[0040] The index file generation process is shown in Figure 1:

[0041] Step 1: Allocate a block of memory of 9801594 bytes. The index size is chosen according to demand: the larger the index, the more accurate the matching. The present invention takes a 9 MB index as an example;
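As a rough illustration of Step 1, the sketch below allocates a buffer of the stated size and treats it as a bitmap addressed by a hash of each token. Only the buffer size comes from the text. The bitmap layout, the use of CRC32, and the names `new_index`, `add_token`, and `might_contain` are assumptions made for illustration, since the remaining steps are truncated in this excerpt.

```python
import zlib

INDEX_BYTES = 9_801_594         # buffer size given in Step 1
INDEX_BITS = INDEX_BYTES * 8    # one bit per slot (assumed layout)


def new_index():
    """Allocate the zero-filled index buffer."""
    return bytearray(INDEX_BYTES)


def bit_position(token):
    """Map a token to a bit offset; CRC32 is an assumed, illustrative hash."""
    return zlib.crc32(token.encode("utf-8")) % INDEX_BITS


def add_token(index, token):
    """Set the bit for a token while the file is being stored."""
    pos = bit_position(token)
    index[pos // 8] |= 1 << (pos % 8)


def might_contain(index, token):
    """False means the token was never added; True may be a hash collision."""
    pos = bit_position(token)
    return bool(index[pos // 8] & (1 << (pos % 8)))


index = new_index()
add_token(index, "hadoop")
print(might_contain(index, "hadoop"))
```

Under this assumed layout, a larger buffer spreads tokens over more bits, so fewer unrelated tokens collide. That would be consistent with the statement that a larger index matches more accurately.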

[0042] Step 2: Segment the word for the field to be i...

Abstract

The invention discloses a query optimization method for a distributed database. Before the massive files are traversed, the index file corresponding to each file is checked first to judge whether the original file contains the query target. This avoids many unnecessary searches of massive files and greatly improves retrieval performance.

Description

Technical field

[0001] The invention relates to a data processing and application method for computer clusters, and in particular to a query optimization method for a distributed database.

Background art

[0002] The Hadoop ecosystem includes HDFS, the distributed programming model MapReduce, HBase, Hive, and others; it has become almost the standard for big data processing tools.

[0003] HDFS is one of the core projects of the Hadoop ecosystem. It was developed around the streaming data processing mode and the requirements of processing large files. It has low hardware requirements, good fault tolerance, and high reliability. Before Hadoop 2.0, an HDFS cluster usually consisted of one NameNode and multiple DataNodes. The NameNode manages the namespace and maintains the directory tree of the entire file system and the index directory of files. A DataNode performs the concrete tasks, such as storing and querying files; it sends stored file block information to the NameNode regularly throu...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/2453, G06F16/27, G06F16/22
Inventor: 鹿林, 王伟, 王东
Owner: 南京中新赛克科技有限责任公司