Parallel indexing method supporting real-time biased query of high dimensional data

A high-dimensional data indexing technology, applied in the search field, that can solve problems such as unsatisfactory real-time performance and scalability, and achieves good real-time performance.
CN103455531A · Active · Publication Date: 2013-12-18 · SHENZHEN INSTITUTE OF INFORMATION TECHNOLOGY

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
SHENZHEN INSTITUTE OF INFORMATION TECHNOLOGY
Publication Date
2013-12-18


Abstract

The invention is applicable to the field of indexing technologies and provides a parallel indexing method supporting real-time biased queries over high-dimensional data. The method includes the following steps: a query system extracts features of the data attributes, by means of MapReduce and the like, and inputs the features; a plurality of index servers in the query system establish parallel indexes using a hash function that flexibly divides data buckets according to data density; the distance change introduced by a biased query is projected and mapped onto the index servers of the query system by a directed clustering mapping method; if the mapping error exceeds the range acceptable to the user, the query system submits the biased query to the index servers, combined in parallel, for separate processing; each of these index servers returns screened results according to the weight ratios given by the user; and all returned results are calculated and combined so that the query response is returned within a determined time. The method has the advantage that massive data can be handled.
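
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything named in it is an assumption made for illustration: the `IndexServer` class, the helpers `density_boundaries`, `bucket_of` and `biased_query`, the choice of quantile boundaries as a stand-in for the density-dependent hash function, the sum-of-dimensions projection, and the idea that each server covers one group of feature dimensions. The directed clustering mapping step is not modelled, and a thread pool stands in for separate index servers.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


def density_boundaries(values, n_buckets):
    """Place bucket boundaries at quantiles, so dense regions get narrower buckets."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_buckets] for i in range(1, n_buckets)]


def bucket_of(value, boundaries):
    """Map a projected value to the index of the bucket that contains it."""
    return bisect.bisect_right(boundaries, value)


class IndexServer:
    """Hypothetical index server covering one group of feature dimensions."""

    def __init__(self, dims, points, n_buckets=4):
        self.dims = dims
        self.points = points
        projected = [self.project(p) for p in points]
        self.boundaries = density_boundaries(projected, n_buckets)
        self.buckets = {}
        for pid, value in enumerate(projected):
            bucket = bucket_of(value, self.boundaries)
            self.buckets.setdefault(bucket, []).append(pid)

    def project(self, point):
        # Assumed projection: the sum of this server's dimensions.
        return sum(point[d] for d in self.dims)

    def candidates(self, query):
        """Screening step: ids of points hashed to the same bucket as the query."""
        bucket = bucket_of(self.project(query), self.boundaries)
        return self.buckets.get(bucket, [])

    def partial_distance(self, pid, query):
        """Squared distance restricted to this server's dimensions."""
        return sum((self.points[pid][d] - query[d]) ** 2 for d in self.dims)


def biased_query(servers, weights, query, k):
    """Fan the query out in parallel, then merge by the user-given weight ratios."""
    with ThreadPoolExecutor() as pool:
        screened = list(pool.map(lambda s: s.candidates(query), servers))
    candidate_ids = set().union(*screened)
    scored = [
        (sum(w * s.partial_distance(pid, query) for s, w in zip(servers, weights)), pid)
        for pid in candidate_ids
    ]
    # Smallest weighted distance first; ties are broken by point id.
    return [pid for _, pid in sorted(scored)[:k]]
```

In this sketch, changing only the weight tuple passed to `biased_query` changes the ranking without rebuilding any index, which is the property a biased query relies on.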

Description

Technical field

[0001] The invention belongs to the technical field of searching, and in particular relates to a parallel indexing method supporting real-time biased query of high-dimensional data.

Background technique

[0002] High-dimensional data: Refers to data with more than 20 attribute (feature) dimensions. Various types of transaction data, social network information, Web documents and usage data, geographic information, document word-frequency data, user rating data, multimedia data, etc. are multi-source, massive, heterogeneous (unstructured data models) and high-dimensional; that is, their dimensions (attributes) can usually reach hundreds or thousands, or even higher. As a result, the data that needs to be retrieved in various applications is increasingly complex, and the data volume is expanding rapidly. Biased query: Based on their own preferences and experience in environment interaction, users only care about certain feature di...
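
As a rough illustration of what a biased query could look like, the sketch below assumes that the user's preference is expressed as one non-negative weight per feature dimension and that similarity is a weighted Euclidean distance. Both are assumptions made here for illustration, not definitions taken from the patent text, and the function names `weighted_distance` and `biased_knn` are hypothetical. The search is a brute-force scan, with no index involved.

```python
import heapq
import math


def weighted_distance(a, b, weights):
    """Euclidean distance in which each dimension is scaled by a user weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def biased_knn(points, query, weights, k):
    """Return the ids of the k points closest to the query under the given weights."""
    return heapq.nsmallest(
        k,
        range(len(points)),
        key=lambda i: weighted_distance(points[i], query, weights),
    )


if __name__ == "__main__":
    points = [(0, 10), (5, 0), (1, 1)]
    # A user who only cares about the first dimension.
    print(biased_knn(points, (0, 0), (1, 0), 3))
    # A user who only cares about the second dimension.
    print(biased_knn(points, (0, 0), (0, 1), 3))
```

The same data yields a different ranking for each weight vector, which is why an index built for one fixed distance function does not directly serve biased queries.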

Claims
