A Multi-keyword Indexing Method Based on Locality Sensitive Hash on Graph

Locality-sensitive hashing and multi-keyword indexing, applied in the field of graph data management, address problems such as complex relationships in graph data and improve query efficiency.

Active Publication Date: 2019-03-05
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

At present, keyword queries on graphs face the following problems: (1) Because the relationships in graph data are complex, query analysis often involves multiple keywords.
[0005] Neither of the above two lines of work addresses how to support multi-keyword queries efficiently and reduce disk I/O, nor the problem of queries being overly sensitive to keyword spelling errors.

Examples

Embodiment Construction

[0022] The preferred embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0023] As shown in Figure 1, the present invention provides a multi-keyword indexing method based on locality-sensitive hashing on graphs. It supports graph keyword queries based on coarse-grained n-grams (i.e., strings composed of n consecutive letters). After the graphs are clustered, each cluster is characterized by a coarse-grained n-gram bit string. At query time, candidate clusters are identified by matching the coarse-grained bit string of the keywords against the cluster bitmap. The method includes the following three steps:

[0024] Step 1: Cluster bitmap representation. According to the keywords contained in the vertices of the graphs, all graphs are mapped to a coarse-grained n-gram space. If there are N different n-grams in the n-gram space, each cluster corresponds to a bit string of length N....
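The bitmap idea in Step 1 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names, the toy clusters, the choice of n = 2 as the "coarse" granularity, and the containment rule used for matching (every n-gram bit of the query must be set in the cluster's bit string) are all assumptions made for illustration. The clustering itself is taken as given.

```python
from typing import Dict, Iterable, List, Set


def ngrams(word: str, n: int) -> Set[str]:
    """Return the set of n-grams (n consecutive characters) of a word."""
    word = word.lower()
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def build_vocab(keywords: Iterable[str], n: int) -> Dict[str, int]:
    """Assign one bit position to each distinct n-gram (the n-gram space)."""
    grams: Set[str] = set()
    for keyword in keywords:
        grams |= ngrams(keyword, n)
    return {gram: pos for pos, gram in enumerate(sorted(grams))}


def to_bits(keywords: Iterable[str], vocab: Dict[str, int], n: int) -> int:
    """Encode keywords as a bit string (stored in an int), one bit per n-gram.

    Assumption: n-grams outside the vocabulary are silently ignored.
    """
    bits = 0
    for keyword in keywords:
        for gram in ngrams(keyword, n):
            if gram in vocab:
                bits |= 1 << vocab[gram]
    return bits


def candidate_clusters(query: Iterable[str], cluster_bits: Dict[str, int],
                       vocab: Dict[str, int], n: int) -> List[str]:
    """Return clusters whose bit string covers every bit of the query.

    Assumption: containment is used as the matching rule.
    """
    q = to_bits(query, vocab, n)
    return [cid for cid, bits in cluster_bits.items() if (q & bits) == q]


# Hypothetical toy data: two clusters, each described by its vertex keywords.
N_COARSE = 2
clusters = {"c0": ["graph", "index"], "c1": ["hash", "table"]}
vocab = build_vocab([k for ks in clusters.values() for k in ks], N_COARSE)
cluster_bits = {cid: to_bits(ks, vocab, N_COARSE) for cid, ks in clusters.items()}

print(candidate_clusters(["graph"], cluster_bits, vocab, N_COARSE))
```

With N distinct n-grams in the vocabulary, each cluster's bit string has N positions, so the upper-layer check is a single bitwise AND per cluster regardless of how many keywords the query contains.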

Abstract

The invention discloses a multi-keyword indexing method based on locality-sensitive hashing on graphs, and belongs to the technical field of graph data management. The method supports multi-keyword queries on graphs through a two-layer index. After multiple graphs are clustered in an n-gram space according to their vertex keywords, an upper-layer bitmap and lower-layer locality-sensitive hash tables are constructed according to the cluster structure. The upper-layer bitmap maps graphs to clusters according to the coarse-grained n-grams (strings formed by n consecutive letters) contained in the keywords. Each cluster in the lower layer corresponds to one locality-sensitive hash table, and the buckets of that table contain the candidate graphs corresponding to fine-grained n-grams. The method has the following advantages: query I/O is independent of the number of keywords, so the number of I/O operations for multi-keyword queries is markedly reduced and query speed is increased; and n-grams of different granularities are combined, so the index is far less sensitive to spelling mistakes and the probability of returning the expected results is increased.
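The two-layer structure described in the abstract can be sketched as follows. This is an illustrative sketch only: the abstract does not say which locality-sensitive hash family is used, so MinHash with banding is swapped in here as one common choice. The class names, the parameters `n_coarse`, `n_fine`, `bands` and `rows`, the containment test in the upper layer, and the per-keyword intersection in the lower layer are all assumptions, and cluster assignments are supplied by the caller rather than computed. The sketch models the lookup logic only, not disk I/O.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def ngrams(word: str, n: int) -> Set[str]:
    """Return the set of n-grams (n consecutive characters) of a word."""
    word = word.lower()
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def _hash(seed: int, gram: str) -> int:
    """Deterministic 64-bit hash of an n-gram under a given seed."""
    digest = hashlib.blake2b(f"{seed}:{gram}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class ClusterLSHTable:
    """Lower layer: one MinHash/banding table per cluster (assumed LSH family)."""

    def __init__(self, n_fine: int = 3, bands: int = 4, rows: int = 2) -> None:
        self.n_fine, self.bands, self.rows = n_fine, bands, rows
        self.buckets: Dict[Tuple, Set[str]] = defaultdict(set)

    def _keys(self, keyword: str) -> List[Tuple]:
        grams = ngrams(keyword, self.n_fine)
        if not grams:
            return []
        sig = [min(_hash(s, g) for g in grams)
               for s in range(self.bands * self.rows)]
        return [(b, tuple(sig[b * self.rows:(b + 1) * self.rows]))
                for b in range(self.bands)]

    def add(self, graph_id: str, keywords: List[str]) -> None:
        for keyword in keywords:
            for key in self._keys(keyword):
                self.buckets[key].add(graph_id)

    def query(self, keyword: str) -> Set[str]:
        hits: Set[str] = set()
        for key in self._keys(keyword):
            hits |= self.buckets.get(key, set())
        return hits


class TwoLayerIndex:
    """Upper layer: coarse n-gram set per cluster (stand-in for a bitmap row)."""

    def __init__(self, n_coarse: int = 2, n_fine: int = 3) -> None:
        self.n_coarse, self.n_fine = n_coarse, n_fine
        self.cluster_grams: Dict[str, Set[str]] = defaultdict(set)
        self.tables: Dict[str, ClusterLSHTable] = {}

    def add_graph(self, cluster_id: str, graph_id: str, keywords: List[str]) -> None:
        for keyword in keywords:
            self.cluster_grams[cluster_id] |= ngrams(keyword, self.n_coarse)
        table = self.tables.setdefault(cluster_id, ClusterLSHTable(self.n_fine))
        table.add(graph_id, keywords)

    def query(self, keywords: List[str]) -> Set[str]:
        if not keywords:
            return set()
        coarse: Set[str] = set()
        for keyword in keywords:
            coarse |= ngrams(keyword, self.n_coarse)
        result: Set[str] = set()
        for cid, grams in self.cluster_grams.items():
            if coarse <= grams:  # assumed rule: cluster must cover the query
                per_keyword = [self.tables[cid].query(k) for k in keywords]
                result |= set.intersection(*per_keyword)
        return result


# Hypothetical toy data; cluster ids are supplied by the caller.
index = TwoLayerIndex()
index.add_graph("c0", "g1", ["protein", "kinase"])
index.add_graph("c0", "g2", ["protein", "binding"])
index.add_graph("c1", "g3", ["social", "network"])
result = index.query(["protein", "kinase"])
print(result)
```

Only clusters that pass the upper-layer check have their hash tables consulted, which is the pruning effect the abstract attributes to the bitmap. The strict containment test used here does not by itself tolerate misspelled keywords; how the patent relaxes the match is not shown in this excerpt.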

Description

Technical field

[0001] The invention relates to a multi-keyword indexing method based on locality-sensitive hashing on graphs, and belongs to the technical field of graph data management.

Background

[0002] In recent years, fields such as the World Wide Web, social networks, biomedicine, and the molecular structures of compounds have accumulated large amounts of interrelated, complex data, whose structure is usually abstracted as graphs. Keyword query on graphs is a fundamental problem for information retrieval and analysis. At present, keyword queries on graphs face the following problems: (1) Because the relationships in graph data are complex, query analysis often involves multiple keywords. Current graph keyword indexes are mainly based on the inverted list and its variants, which must sequentially read the candidate graphs corresponding to each keyword (or keyword pair). I/O increases linearly with th...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F16/901
CPC: G06F16/9024
Inventor: 韩京宇, 陈可佳, 曾建辉
Owner: NANJING UNIV OF POSTS & TELECOMM