Data space multi-dimension indexing method based on load balance and query log

A load-balancing and data space technology, applied in electrical digital data processing, special data processing applications, instruments, etc. It addresses problems such as high hard-disk I/O overhead, the inability to efficiently support large-scale data query processing, and the inability to load index graphs into memory, and achieves the effect of minimizing communication overhead.

Inactive Publication Date: 2016-11-09
HARBIN ENG UNIV
Cites: 6; Cited by: 12

AI Technical Summary

Problems solved by technology

However, none of these existing methods can efficiently support large-scale data query processing.
This is because in the process of large-scale ...

Examples

Specific Embodiment 1

[0068] Specific embodiment 1: As shown in Figure 1, this embodiment describes the implementation of the data space multidimensional indexing method based on load balancing and query logs in detail, as follows:

[0069] 1. To extend the inverted index to the data space, attribute labels and attribute values are combined and encoded as tokens:

[0070] Definition (token). For an attribute-value pair $(a, v)$, the corresponding token is defined as $t = v//a$.
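The token definition above can be illustrated with a minimal Python sketch. The helper names `make_token` and `entity_tokens` and the sample entity are hypothetical and do not come from the patent. The sketch assumes that attributes and values are plain strings and that no further normalization is applied.

```python
def make_token(attribute: str, value: str) -> str:
    """Encode an attribute-value pair (a, v) as the token t = v//a."""
    return f"{value}//{attribute}"


def entity_tokens(entity: dict[str, str]) -> list[str]:
    """Turn an entity, given as attribute -> value, into its list of tokens."""
    return [make_token(a, v) for a, v in entity.items()]


# Hypothetical entity used only for illustration.
paper = {"title": "data space indexing", "year": "2016"}
tokens = entity_tokens(paper)
```

Here the pair `("year", "2016")` becomes the single token `2016//year`, so an inverted index can treat attribute and value together as one indexable term.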

[0071] In essence, an entity consists of a set of attribute-value pairs (note that content can also be regarded as an attribute-value pair). In other words, an entity is a vector of tokens $(t_1, t_2, \ldots, t_{|D|})$, where $D$ represents all the distinct tokens in the data space.

[0072] Definition (entity vector). An entity vector is defined as $o = (w_1, w_2, \ldots, w_{|D|})$, where $w_i$ denotes the weight of token $t_i$.
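As a rough illustration of the entity vector and the inverted index it feeds, the following Python sketch builds both from tokenized entities. All function names and the sample data are hypothetical. The visible text does not specify the weighting scheme, so the weight $w_i$ is assumed here to be the raw occurrence count of token $t_i$ in the entity.

```python
from collections import Counter, defaultdict


def build_vocabulary(entities: dict[str, list[str]]) -> list[str]:
    """Collect the distinct tokens D of the data space in a fixed (sorted) order."""
    return sorted({t for tokens in entities.values() for t in tokens})


def entity_vector(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Build o = (w_1, ..., w_|D|); the weight is ASSUMED to be a raw count."""
    counts = Counter(tokens)
    return [counts.get(t, 0) for t in vocabulary]


def build_inverted_index(entities: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each token to the sorted list of entity ids that contain it."""
    index = defaultdict(list)
    for entity_id in sorted(entities):
        for token in sorted(set(entities[entity_id])):
            index[token].append(entity_id)
    return dict(index)


# Hypothetical tokenized entities used only for illustration.
entities = {
    "e1": ["2016//year", "indexing//title"],
    "e2": ["2016//year", "hypergraph//title", "hypergraph//keyword"],
}
vocabulary = build_vocabulary(entities)
vectors = {eid: entity_vector(toks, vocabulary) for eid, toks in entities.items()}
inverted = build_inverted_index(entities)
```

Because data spaces are sparse, a real system would more likely store only the non-zero weights. The dense list is used here only to mirror the definition.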

[0073] The partition-based data...

Abstract

The invention provides a multidimensional data space indexing method based on load balancing and query logs, and relates to the technical field of data space indexing. Its aims are to distribute the inverted index across different index nodes so that the nodes stay load-balanced, to minimize the communication overhead incurred during query processing, and to reduce the search space. In vertical partitioning, the tokens to be indexed are grouped using the query log and the tokens that occur frequently in entities, and the access pattern between user queries and inverted lists is represented by a hypergraph. In horizontal partitioning, the access pattern between user queries and entities is likewise described by a hypergraph, and the horizontal partitioning problem is reduced to a hypergraph partitioning problem, so that the load of the index nodes stays balanced and the communication overhead of queries is reduced. Combining the vertical and horizontal partitioning strategies yields a two-dimensional hybrid index, which is then extended to a three-dimensional index. Experiments on the public DBLP data set show that the method outperforms existing methods in throughput, query response time, and scalability.
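The trade-off the abstract describes, balanced load versus communication overhead, can be made concrete with a small Python sketch. It is not the patent's partitioning algorithm. It only scores a given assignment of inverted lists (tokens) to index nodes, treating each logged query as a hyperedge over the tokens it accesses. The function names, the sample query log, and the cost measure (number of distinct nodes a query touches, minus one) are assumptions made for illustration.

```python
from collections import Counter


def communication_cost(query_log: list[set[str]], assignment: dict[str, int]) -> int:
    """Sum over queries of (distinct index nodes touched - 1).

    Each query is a hyperedge; a query whose tokens all live on one
    node contributes zero cost.
    """
    return sum(len({assignment[t] for t in query}) - 1 for query in query_log if query)


def node_loads(query_log: list[set[str]], assignment: dict[str, int]) -> dict[int, int]:
    """Count how many inverted-list accesses from the log land on each node."""
    loads = Counter()
    for query in query_log:
        for token in query:
            loads[assignment[token]] += 1
    return dict(loads)


# Hypothetical query log: each query is the set of tokens it looks up.
query_log = [
    {"2016//year", "indexing//title"},
    {"2016//year", "indexing//title"},
    {"hypergraph//title", "harbin//affiliation"},
]

# Two candidate placements of the four inverted lists on two index nodes.
co_located = {"2016//year": 0, "indexing//title": 0,
              "hypergraph//title": 1, "harbin//affiliation": 1}
scattered = {"2016//year": 0, "indexing//title": 1,
             "hypergraph//title": 0, "harbin//affiliation": 1}

cost_co_located = communication_cost(query_log, co_located)
cost_scattered = communication_cost(query_log, scattered)
```

In this toy log, `co_located` keeps every query on a single node but puts more accesses on node 0, while `scattered` evens out the load at the price of every query spanning two nodes. A hypergraph partitioner searches for assignments that keep both measures low.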

Description

Technical field

[0001] The invention relates to a data space multidimensional indexing method, and belongs to the technical field of data space indexing.

Background

[0002] With the rapid development of big data and Internet technology, data space scenarios have become increasingly common, especially on the Web and in personal information management systems, for example Wikipedia, Google Base, and Linked Data. Unlike traditional relational databases, which mainly focus on specific domains and a fixed number of attributes, data spaces are heterogeneous, sparse, large in scale, and interrelated. Providing users with efficient data space query services is therefore of great significance. Indexing is one of the main means of improving query processing efficiency, so an efficient data space indexing technique is worth studying.

[0003] At present, the research on data spatial index technolog...

Application Information

IPC(8): G06F17/30
CPC: G06F16/2264
Inventor: 王红滨, 王念滨, 周连科, 祝官文, 王瑛琦, 何鸣, 宋奎勇
Owner HARBIN ENG UNIV