Mass data real-time query method based on dynamic index structure

A dynamic indexing and query technology, applied in the fields of electrical digital data processing, special data processing applications, and instruments. It addresses problems such as growing overlap among query nodes and falling query efficiency by adding an access lock mechanism to the query process, achieving efficient concurrent processing and real-time operation while reducing workload.

Active Publication Date: 2014-03-26
朗坤智慧科技股份有限公司

AI Technical Summary

Problems solved by technology

[0004] On the other hand, the dynamic index structure R-Tree proposed by Guttman, and its variants, can perform operations such as insertion and query at the same time and support multi-dimensional models. They have clear advantages among spatial index technologies, but when processing large-scale data, the overlap among query nodes increases as the tree grows taller, and query efficiency declines rapidly.



Examples


Embodiment 1

[0045] As shown in Figure 2 and Figure 3, the present invention proposes a real-time query method for massive data based on a dynamic index structure (DC-Tree). The method includes the following steps:

[0046] Step 1: The multidimensional data record DR is passed through the Z-curve mapping function f_z in the MasterNode to generate a dimensionality-reduced result set S;

[0047] Step 2: The MasterNode selects k hash functions, maps the result set S through a Bloom filter, and generates the node set NN;

[0048] Step 3: The data record DR is updated, and dynamic construction is carried out for each element in the node set NN;

[0049] Step 4: The user queries the MDS result, obtains the node set NN through Steps 1 and 2, and starts the parallel query method;

[0050] Step 5: The user aggregates the result sets of all visited nodes in the node set NN to obtain the final query result Rset.
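Steps 1 and 2 can be pictured with a minimal sketch. It assumes non-negative integer coordinates, a standard bit-interleaving Z-order (Morton) encoding for f_z, and k salted SHA-256 hashes standing in for the k hash functions. The names `z_curve`, `node_set`, `bits`, and `num_nodes` are hypothetical, and the way the patent actually derives NN from the Bloom filter is not spelled out in this excerpt.

```python
import hashlib

def z_curve(point, bits=8):
    """Interleave the low `bits` bits of each coordinate into one Z-order key."""
    key = 0
    dims = len(point)
    for bit in range(bits):
        for d, coord in enumerate(point):
            # Bit `bit` of dimension `d` lands at position bit * dims + d.
            key |= ((coord >> bit) & 1) << (bit * dims + d)
    return key

def node_set(key, k=3, num_nodes=16):
    """Map a Z-order key to at most k node ids using k salted hashes.

    Assumption: each hash position is read as a node id, in the spirit
    of the k hash functions of a Bloom filter.
    """
    nodes = set()
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
        nodes.add(int.from_bytes(digest[:8], "big") % num_nodes)
    return nodes

key = z_curve((3, 5), bits=3)   # one-dimensional key for a 2-D record
nn = node_set(key)              # candidate nodes for that key
```

Because both functions are deterministic, an update and a later query for the same record resolve to the same node set, which is what lets Step 4 reuse Steps 1 and 2.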

[0051] The present invention reduces the dimensionality of massive multidimensional data s...

Specific embodiment

[0053] The specific implementation is as follows:

[0054] (1) The multidimensional data record DR is passed through the Z-curve mapping function f_z in the MasterNode to generate a dimensionality-reduced result set S;

[0055] (2) The MasterNode selects k hash functions, maps the result set S through a Bloom filter, and generates the node set NN;

[0056] (3) The data record DR is updated, and dynamic construction is carried out for each element in the node set NN;

[0057] Dynamic insertion: acquire the lock LOCK on the root node D; update the Measure value of the directory node; if DR is contained in the MDS of exactly one child of D, set D to that child node; if DR is contained in the MDSs of several children of D, find the child that holds the fewest data nodes among them and set D to that child node; if DR is not contained in the MDS of any child of D, first make a copy of D, call it D', add DR to each child node of D, calculate the ...
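The descent described in the dynamic insertion step can be sketched as follows. This is only an illustration under stated assumptions: the `DirNode` class, the representation of an MDS as one set of allowed values per dimension, the additive Measure update, and the single `root_lock` are all hypothetical. The case where no child covers DR is truncated in the source, so the sketch simply stops descending there.

```python
import threading
from dataclasses import dataclass, field

@dataclass
class DirNode:
    mds: tuple                 # hypothetical: one set of allowed values per dimension
    measure: float = 0.0       # aggregated Measure value (assumed additive)
    children: list = field(default_factory=list)
    data_count: int = 0        # number of data nodes stored beneath this node

def contains(mds, record):
    """True if every coordinate of the record is covered by the MDS."""
    return all(value in allowed for value, allowed in zip(record, mds))

def choose_child(node, record):
    """Pick the covering child with the fewest data nodes, or None."""
    candidates = [c for c in node.children if contains(c.mds, record)]
    if not candidates:
        return None            # uncovered case: not specified in this excerpt
    return min(candidates, key=lambda c: c.data_count)

root_lock = threading.Lock()

def descend(root, record, measure):
    """Lock the root, update Measure along the path, return the path taken."""
    with root_lock:
        node, path = root, [root]
        while True:
            node.measure += measure
            nxt = choose_child(node, record)
            if nxt is None:
                return path
            node = nxt
            path.append(node)

a = DirNode(mds=({1, 2}, {"x"}), data_count=5)
b = DirNode(mds=({2, 3}, {"x"}), data_count=2)
root = DirNode(mds=({1, 2, 3}, {"x"}), children=[a, b])
path = descend(root, (2, "x"), measure=10.0)   # both children cover it; b is smaller
```

With exactly one covering child, `min` returns that child, so the one-child and several-children cases share a code path.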

Embodiment 2

[0063] As shown in Figure 1, the present invention provides the architecture of a real-time query system for massive data, consisting of four parts: the data management node (MasterNode), the dynamic index tree (DC-Tree), the data storage node (DataNode), and the user (User). The MasterNode is responsible for locating data for queries and updates, mainly through dimensionality reduction and fast lookup techniques. The DC-Tree dynamically builds a query tree over multi-dimensional attribute data to support real-time queries. The DataNode stores the actual data. The User sends a query request to the MasterNode, which processes the request, determines which DataNodes hold the queried content, and returns those DataNodes to the User. The User then disconnects from the MasterNode and accesses the returned DataNodes directly to run the query.
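The query flow in this architecture can be sketched roughly as below, assuming in-memory stand-ins for the components. The classes `MasterNode` and `DataNode` borrow their names from the text, but their methods (`resolve`, `query`), the `locate` callback, and the use of a thread pool for the parallel step are assumptions made for illustration, not the patent's interfaces.

```python
from concurrent.futures import ThreadPoolExecutor

class DataNode:
    """Stores records and answers a filter query locally."""
    def __init__(self, records):
        self.records = records

    def query(self, predicate):
        return [r for r in self.records if predicate(r)]

class MasterNode:
    """Only locates data: maps a query key to the DataNodes that may hold it."""
    def __init__(self, data_nodes, locate):
        self.data_nodes = data_nodes   # hypothetical: dict of node id -> DataNode
        self.locate = locate           # hypothetical: query key -> iterable of node ids

    def resolve(self, key):
        return [self.data_nodes[i] for i in sorted(self.locate(key))]

def user_query(master, key, predicate):
    """Ask the MasterNode once, then query the DataNodes in parallel and merge."""
    nodes = master.resolve(key)        # after this call the master is no longer involved
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partials = pool.map(lambda n: n.query(predicate), nodes)
        return [r for part in partials for r in part]

data_nodes = {0: DataNode([1, 2, 3]), 1: DataNode([4, 5]), 2: DataNode([6])}
master = MasterNode(data_nodes, locate=lambda key: {0, 2})
rset = user_query(master, "k", lambda r: r % 2 == 0)
```

Keeping the MasterNode out of the data path means it only handles short lookup requests, while the per-node work runs concurrently on the DataNodes.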

[0064]...


Abstract

The invention discloses a real-time query method for massive data based on a dynamic index structure (DC-Tree). The method reduces the dimensionality of a massive multi-dimensional data set, offers high space efficiency and low query time, and supports distributed redundant storage. It thereby improves on the data distribution efficiency of traditional distributed mechanisms and is suitable for massive data processing. The method comprises five steps. First, a multi-dimensional data record (DR) is passed through the Z-curve mapping function f_z in the MasterNode to generate a dimensionality-reduced result set S. Second, the MasterNode selects k hash functions and maps the result set S through a Bloom filter to generate a node set NN. Third, the data record DR is updated, and dynamic construction is carried out for each element in the node set NN. Fourth, a user queries the MDS result, obtains the node set NN through the first and second steps, and starts a parallel query method. Fifth, the user aggregates the results of all visited nodes in the node set NN to obtain the final query result Rset.

Description

Technical field

[0001] The invention relates to the technical field of computer big data query, in particular to a real-time query method for massive data based on a dynamic index structure.

Background

[0002] With the rapid development of the Internet, social networks and mobile applications are becoming more and more popular, and the amount of network data grows day by day. Big data has emerged as a new concept of data. Data, as the carrier of information, plays a pivotal role. The explosive growth of data has brought us into the era of large-scale data analysis, which is computationally intensive and requires large-scale concurrent storage and processing capabilities. How to process massive data quickly and extract valuable information from it in a timely and effective manner is a technical problem that urgently needs to be solved.

[0003] At present, there are two mainstream technologies for l...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/2246
Inventor: 陈丹伟, 庄俊
Owner: 朗坤智慧科技股份有限公司