Implementing method for multidimensional index structure OBF-Index in Hadoop environment

A technology concerning OBF-Index and index structures, applied in the field of cloud storage, that addresses problems such as a high false positive rate.

Active Publication Date: 2018-06-05

YUNNAN UNIV

12 cites, 1 cited by


AI Technical Summary

Problems solved by technology

In 2008, Google's daily data volume exceeded 20 PB. In 2016, Alibaba needed to process more than 100 PB of data per day and ran more than 1 million big data tasks per day. A single machine cannot process data at this volume.

However, because a Bloom filter is a probabilistic data structure, its false positive rate increases as more data is inserted.
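To make this growth concrete, the sketch below evaluates the standard approximation for a Bloom filter with m bits, k hash functions, and n inserted items, p ≈ (1 - e^(-kn/m))^k. The filter size, hash count, and item counts are illustrative assumptions, not values from the patent.

```python
import math

def bloom_false_positive_rate(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Standard approximation p = (1 - exp(-k*n/m))**k for a Bloom filter."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

# Illustrative parameters (assumptions, not taken from the patent).
M_BITS = 1 << 20   # filter size in bits
K_HASHES = 7       # number of hash functions
ITEM_COUNTS = (10_000, 50_000, 100_000, 200_000)

# The rate rises monotonically as more items are inserted into a fixed-size filter.
rates = [bloom_false_positive_rate(M_BITS, K_HASHES, n) for n in ITEM_COUNTS]
for n, p in zip(ITEM_COUNTS, rates):
    print(f"n={n:>7}: p ~ {p:.6f}")
```

With the filter size held fixed, each additional batch of insertions sets more bits, so a lookup for an absent item is more likely to find all k of its positions already set.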


Embodiment

[0029] To better illustrate the technical solution of the present invention, the idea behind the invention is first briefly described.

[0030] Hadoop processes data quickly by running multiple Mappers and multiple Reducers in parallel. Because the data stored on HDFS is generally on the order of gigabytes, terabytes, or more, it is impossible to assign all of the data to one machine when executing a task. Therefore, before executing Map, Hadoop first divides the input data into fixed-size blocks to obtain data fragments (InputSplits), and each fragment is then assigned to an independent Mapper.
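The splitting step described in [0030] can be sketched as follows. This is a simplified illustration, not Hadoop's actual split logic: the function name `compute_splits`, the sizes, and the mapper numbering are hypothetical, and the sketch ignores record boundaries and other details a real input format handles.

```python
from typing import Dict, List, Tuple

def compute_splits(total_bytes: int, split_bytes: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs covering total_bytes in fixed-size pieces."""
    if split_bytes <= 0:
        raise ValueError("split_bytes must be positive")
    splits = []
    offset = 0
    while offset < total_bytes:
        # The last piece may be shorter than split_bytes.
        length = min(split_bytes, total_bytes - offset)
        splits.append((offset, length))
        offset += length
    return splits

# Hypothetical sizes: a 300 MB input divided into 128 MB fragments.
MB = 1024 * 1024
splits = compute_splits(300 * MB, 128 * MB)

# One independent mapper per fragment; mapper ids are just list positions here.
assignment: Dict[int, Tuple[int, int]] = dict(enumerate(splits))
for mapper_id, (offset, length) in assignment.items():
    print(f"mapper {mapper_id}: offset={offset}, length={length}")
```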

[0031] Figure 1 is a schematic diagram of the original MapReduce process. As shown in Figure 1, in the original MapReduce process the Mapper receives data fragments, and the Reducer often copies and processes data from the relevant Mapper at runtime, so the resources of the Reducer node are less than that of t...


Abstract

The invention discloses a method for implementing a multidimensional index structure, OBF-Index, in a Hadoop environment. The method comprises the following steps: dividing a data set to obtain data fragments, establishing an OBF index object for each data fragment, serializing the OBF index objects into OBF index files, and storing the OBF index files, thereby constructing the OBF-Index. When the data set is to be used, a data set A of required data is first specified; the OBF index file of each data fragment is then read and deserialized to obtain its OBF index object, and the OBF index object is used to query whether data in data set A exist in that data fragment. If they do, the data fragment is transmitted to the corresponding Mapper; if they do not, no operation is performed on it. The invention designs a multidimensional index structure, OBF-Index, that can be built and queried efficiently and that effectively reduces the false positive probability.
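The build-and-query flow in the abstract can be sketched as below. The internal structure of the OBF index object is not described in this excerpt, so an ordinary Bloom filter is swapped in as a stand-in; `SimpleBloomFilter`, `build_index_files`, `select_fragments`, the JSON file layout, and the sample data are all hypothetical names and choices made for illustration only.

```python
import hashlib
import json
from typing import Dict, Iterable, Iterator, List

class SimpleBloomFilter:
    """Plain Bloom filter used as a stand-in for the OBF index object."""

    def __init__(self, m_bits: int = 1 << 12, k_hashes: int = 4) -> None:
        self.m_bits = m_bits
        self.k_hashes = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.k_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

    def to_bytes(self) -> bytes:
        """Serialize the index object (assumed JSON layout, for illustration)."""
        payload = {"m_bits": self.m_bits, "k_hashes": self.k_hashes,
                   "bits": self.bits.hex()}
        return json.dumps(payload).encode("utf-8")

    @classmethod
    def from_bytes(cls, blob: bytes) -> "SimpleBloomFilter":
        """Deserialize an index file back into an index object."""
        payload = json.loads(blob.decode("utf-8"))
        bf = cls(payload["m_bits"], payload["k_hashes"])
        bf.bits = bytearray.fromhex(payload["bits"])
        return bf

def build_index_files(fragments: Dict[str, List[str]]) -> Dict[str, bytes]:
    """Build one index per fragment and serialize it (bytes stand in for files)."""
    index_files = {}
    for name, records in fragments.items():
        bf = SimpleBloomFilter()
        for record in records:
            bf.add(record)
        index_files[name] = bf.to_bytes()
    return index_files

def select_fragments(index_files: Dict[str, bytes], wanted: Iterable[str]) -> List[str]:
    """Return fragments whose index reports a possible hit for any wanted item."""
    wanted_items = list(wanted)
    selected = []
    for name, blob in index_files.items():
        bf = SimpleBloomFilter.from_bytes(blob)
        if any(bf.might_contain(item) for item in wanted_items):
            selected.append(name)  # this fragment would be passed to a Mapper
    return selected

# Hypothetical fragments and query set A.
fragments = {
    "split-0": ["alice", "bob"],
    "split-1": ["carol", "dave"],
    "split-2": ["erin", "frank"],
}
index_files = build_index_files(fragments)
selected = select_fragments(index_files, {"carol"})
print(selected)
```

A fragment that truly contains a requested item is always selected, because a Bloom filter has no false negatives; a fragment that does not contain it may occasionally be selected anyway, which is the false positive cost the invention aims to reduce.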

Description

Technical field

[0001] The invention belongs to the technical field of cloud storage, and more specifically relates to a method for implementing a multidimensional index structure, OBF-Index, in a Hadoop environment.

Background technique

[0002] We are living in an era of big data. Various types of logs on the Internet (such as click logs), content posted by users (such as tweets posted on Twitter), and graph data (such as social networks) are sources of massive data. In 2008, Google's daily data volume exceeded 20 PB. In 2016, Alibaba had to process more than 100 PB of data per day and ran more than 1 million big data tasks per day; a single machine cannot process data of this volume. In recent years, distributed computing, grid computing, and cloud computing technologies have become increasingly mature. As early as 2003 and 2004, Google published two papers presenting two new technologies, GFS (Google File System) and MapReduce, in order to d...


Application Information


IPC(8): G06F17/30

CPC: G06F16/2228, G06F16/27

Inventors: 李劲, 刘建坤, 窦奇伟, 何臻力, 周维

Owner: YUNNAN UNIV
