
Implementing method for multidimensional index structure OBF-Index in Hadoop environment

An index-structure technology, OBF-Index, applied in the field of cloud storage, which addresses problems such as a high false positive rate

Active Publication Date: 2018-06-05
YUNNAN UNIV
12 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

In 2008, Google's daily data volume exceeded 20PB. In 2016, Alibaba needed to process more than 100PB of data per day and ran more than 1 million big data tasks per day. A single machine cannot process data at this scale.
Because the Bloom filter is a probabilistic data structure, its false positive rate rises as more and more data is inserted.
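
The growth of the false positive rate can be illustrated with the standard textbook approximation for a plain Bloom filter, p ≈ (1 - e^(-kn/m))^k, where m is the number of bits, k the number of hash functions, and n the number of inserted items. The sketch below is not taken from the patent; the function name and the parameter values are assumptions chosen only for illustration.

```python
import math


def bloom_false_positive_rate(m_bits, k_hashes, n_items):
    """Approximate false positive rate of a plain Bloom filter.

    Uses the standard approximation p = (1 - e^(-k*n/m))^k.
    """
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


# Hypothetical sizing: 10,000 bits and 7 hash functions, with a growing item count.
item_counts = (100, 500, 1000, 2000, 5000)
rates = [bloom_false_positive_rate(10_000, 7, n) for n in item_counts]

for n, p in zip(item_counts, rates):
    print(f"n={n:5d}  approximate false positive rate={p:.6f}")
```

With the filter size held fixed, each additional batch of insertions pushes the rate upward, which is the problem the excerpt describes.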


Examples


Embodiment

[0029] In order to better illustrate the technical solution of the present invention, the idea of the present invention is first briefly described.

[0030] In Hadoop, data is processed quickly through the parallel operation of multiple Mappers and multiple Reducers. Because the data stored on HDFS is generally on the order of GB, TB or more, it is impossible to allocate all the data to one machine when executing a task. Therefore, before executing Map, Hadoop first divides the input data into fixed-size blocks to obtain data fragments (InputSplits), and each fragment is then assigned to an independent Mapper.
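
The fixed-size division described in [0030] can be sketched as follows. This is a simplified illustration rather than Hadoop's actual split logic; `compute_splits` and the byte sizes are hypothetical names and values, and each returned (offset, length) pair stands for the fragment that one Mapper would receive.

```python
def compute_splits(total_bytes, split_bytes):
    """Divide an input of total_bytes into consecutive fixed-size fragments.

    Returns a list of (offset, length) pairs; the last fragment may be shorter.
    """
    if split_bytes <= 0:
        raise ValueError("split_bytes must be positive")
    splits = []
    offset = 0
    while offset < total_bytes:
        length = min(split_bytes, total_bytes - offset)
        splits.append((offset, length))
        offset += length
    return splits


# Hypothetical example: a 300-byte input cut into 128-byte fragments.
splits = compute_splits(300, 128)
for mapper_id, (offset, length) in enumerate(splits):
    print(f"Mapper {mapper_id}: bytes {offset}..{offset + length - 1}")
```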

[0031] Figure 1 is a schematic diagram of the original MapReduce process. As shown in Figure 1, in the original MapReduce process, the Mapper receives data fragments, and the Reducer often copies and processes data from the relevant Mapper at runtime, so the resources of the Reducer node are less than that of t...



Abstract

The invention discloses a method for implementing a multidimensional index structure, OBF-Index, in a Hadoop environment. The method comprises the following steps: dividing a data set to obtain data fragments, establishing an OBF index object for each data fragment, serializing the OBF index objects into OBF index files and storing those files, thereby constructing the OBF-Index. When the data set is to be used, the data set A to be queried is first specified; the OBF index file of each data fragment is then read and deserialized to obtain its OBF index object, and the OBF index object is used to query whether data in data set A exists in that data fragment. If it does, the data fragment is transmitted to the corresponding Mapper; if it does not, no operation is performed on that fragment. The invention designs a multidimensional index structure, OBF-Index, whose establishment and querying can be carried out effectively and which effectively reduces the false positive probability.
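
The build-and-query workflow in the abstract can be sketched as below. The internal structure of OBF-Index is not given in this excerpt, so a plain Bloom filter is swapped in as a stand-in; `SimpleBloomFilter`, `build_index_files`, `select_fragments`, the byte layout, and the filter sizes are all hypothetical. Serialized bytes held in a list take the place of index files stored on HDFS.

```python
import hashlib


class SimpleBloomFilter:
    """Plain Bloom filter used here only as a stand-in for an OBF index object."""

    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

    def to_bytes(self):
        # Assumed layout: 4 bytes for m, 4 bytes for k, then the bit array.
        return self.m.to_bytes(4, "big") + self.k.to_bytes(4, "big") + bytes(self.bits)

    @classmethod
    def from_bytes(cls, blob):
        bf = cls(int.from_bytes(blob[:4], "big"), int.from_bytes(blob[4:8], "big"))
        bf.bits = bytearray(blob[8:])
        return bf


def build_index_files(fragments):
    """Build one serialized index per data fragment."""
    index_files = []
    for fragment in fragments:
        bf = SimpleBloomFilter()
        for record in fragment:
            bf.add(record)
        index_files.append(bf.to_bytes())
    return index_files


def select_fragments(index_files, data_set_a):
    """Return the ids of fragments that may contain any item of data set A."""
    selected = []
    for fragment_id, blob in enumerate(index_files):
        bf = SimpleBloomFilter.from_bytes(blob)  # deserialize the index
        if any(bf.might_contain(item) for item in data_set_a):
            selected.append(fragment_id)  # this fragment would go to a Mapper
    return selected


fragments = [["a", "b"], ["c", "d"], ["e"]]
index_files = build_index_files(fragments)
print(select_fragments(index_files, ["c"]))
```

Fragments whose index reports no match are skipped, so only candidate fragments reach a Mapper. Because the stand-in is probabilistic, a fragment can occasionally be selected without containing the item, but a fragment that does contain it is never missed.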

Description

Technical field

[0001] The invention belongs to the technical field of cloud storage, and more specifically relates to a method for implementing a multidimensional index structure, OBF-Index, in a Hadoop environment.

Background technique

[0002] We are living in an era of big data. Various types of logs on the Internet (such as click logs), content posted by users (such as tweets posted on Twitter), and graph data (such as social networks) are all sources of massive data. In 2008, Google's daily data volume exceeded 20PB. In 2016, Alibaba had to process more than 100PB of data per day and ran more than 1 million big data tasks per day. A single machine cannot process data of this volume. In recent years, distributed computing, grid computing, and cloud computing technologies have become increasingly mature. As early as 2003 and 2004, Google published two articles presenting its two new technologies, GFS (Google File System) and MapReduce, in order to d...


Application Information

IPC(8): G06F17/30
CPC: G06F16/2228, G06F16/27
Inventor: 李劲, 刘建坤, 窦奇伟, 何臻力, 周维
Owner: YUNNAN UNIV