Implementation method of multidimensional index structure OBF-Index in Hadoop environment

An OBF-Index index structure technology, applied in the field of cloud storage, that addresses problems such as a high false positive rate.

Active Publication Date: 2021-06-04

YUNNAN UNIV


AI Technical Summary

Problems solved by technology

In 2008, Google's daily data volume exceeded 20 PB. In 2016, Alibaba needed to process more than 100 PB of data per day and ran more than 1 million big data tasks per day. Processing this amount of data on a single machine is impossible.

However, because a Bloom filter is a probabilistic data structure, its false positive rate increases as more and more data is inserted.
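
The growth of the false positive rate can be illustrated with the textbook estimate for a standard Bloom filter, p = (1 - e^(-kn/m))^k, where m is the number of bits, k the number of hash functions, and n the number of inserted items. The sketch below is only an illustration of that general formula; the function name and the sizes used are assumptions and are not taken from the patent.

```python
import math


def bloom_false_positive_rate(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Approximate false positive rate of a standard Bloom filter.

    Uses the textbook estimate p = (1 - e^(-k*n/m))^k.
    """
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not from the patent).
    m, k = 1_000_000, 7
    for n in (10_000, 50_000, 100_000, 200_000, 400_000):
        rate = bloom_false_positive_rate(m, k, n)
        print(f"n={n:>7}  estimated false positive rate={rate:.6f}")
```

With m and k held fixed, the estimate rises steadily with n, which is the effect the paragraph above describes.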


Embodiment

[0029] In order to better illustrate the technical solution of the present invention, firstly, the idea of the present invention is briefly described.

[0030] In Hadoop, data is processed quickly by running multiple Mappers and multiple Reducers in parallel. Because the data stored on HDFS is generally on the order of gigabytes, terabytes, or more, it is impossible to allocate all of it to one machine when executing a task. Therefore, before executing Map, Hadoop first divides the input data into fixed-size blocks to obtain data fragments (InputSplits), and each fragment is then assigned to an independent Mapper.
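
The splitting step described in [0030] can be sketched in a few lines. This is a simplified illustration, not Hadoop's actual split computation: the function names, the (start, length) tuple representation, and the mapper labels are all hypothetical.

```python
from typing import Dict, List, Tuple


def compute_splits(total_bytes: int, split_bytes: int) -> List[Tuple[int, int]]:
    """Cut an input of total_bytes into (start, length) ranges of at most split_bytes."""
    if split_bytes <= 0:
        raise ValueError("split_bytes must be positive")
    splits: List[Tuple[int, int]] = []
    start = 0
    while start < total_bytes:
        length = min(split_bytes, total_bytes - start)
        splits.append((start, length))
        start += length
    return splits


def assign_to_mappers(splits: List[Tuple[int, int]]) -> Dict[str, Tuple[int, int]]:
    """Give every split its own (hypothetically named) mapper task."""
    return {f"mapper-{i}": split for i, split in enumerate(splits)}


if __name__ == "__main__":
    mib = 1024 * 1024
    # A 300 MiB input with 128 MiB splits (sizes chosen only for illustration).
    for mapper, (start, length) in assign_to_mappers(compute_splits(300 * mib, 128 * mib)).items():
        print(mapper, start, length)
```

Every split except possibly the last has the fixed size, and each one maps to exactly one mapper, mirroring the one-fragment-per-Mapper assignment described above.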

[0031] Figure 1 is a schematic diagram of the original MapReduce process. As shown in Figure 1, in the original MapReduce process, the Mapper receives data fragments, and the Reducer often copies and processes data from the relevant Mapper at runtime, so the resources of the Reducer node are less than that of t...


Abstract

The invention discloses a method for implementing a multi-dimensional index structure, OBF-Index, in a Hadoop environment. The method divides a data set into data fragments, creates an OBF index object for each data fragment, and serializes it into an OBF index file for storage, thereby constructing the OBF-Index. When the data set needs to be used, the data set A to be used is first set; the OBF index file of each data fragment is then read and deserialized to obtain the OBF index object, which is used to query whether data in data set A exists in that data fragment. If it does, the data fragment is passed to the corresponding Mapper; otherwise nothing is done. The OBF-Index designed in the invention supports efficient creation and querying and can effectively reduce the false positive rate.
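
The build-then-filter flow in the abstract can be sketched as follows. The internal structure of the OBF index object is not described in this excerpt, so a plain Bloom filter is swapped in as a stand-in; the class `SimpleBloomFilter`, the functions `build_index` and `select_fragments`, the byte layout, and the filter sizes are all assumptions made for illustration, and in-memory byte strings stand in for index files.

```python
import hashlib
import struct
from typing import Dict, Iterable, Iterator, List


class SimpleBloomFilter:
    """Plain Bloom filter, used here only as a stand-in for the OBF index object."""

    def __init__(self, m_bits: int = 1 << 16, k_hashes: int = 4) -> None:
        self.m_bits = m_bits
        self.k_hashes = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions by salting SHA-256 with the hash number.
        for seed in range(self.k_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

    def to_bytes(self) -> bytes:
        # Serialize: two big-endian 32-bit header fields, then the bit array.
        return struct.pack(">II", self.m_bits, self.k_hashes) + bytes(self.bits)

    @classmethod
    def from_bytes(cls, blob: bytes) -> "SimpleBloomFilter":
        m_bits, k_hashes = struct.unpack(">II", blob[:8])
        bf = cls(m_bits, k_hashes)
        bf.bits = bytearray(blob[8:])
        return bf


def build_index(fragments: Dict[str, List[str]]) -> Dict[str, bytes]:
    """Create one filter per data fragment and serialize it (the 'index file')."""
    index_files: Dict[str, bytes] = {}
    for name, records in fragments.items():
        bf = SimpleBloomFilter()
        for record in records:
            bf.add(record)
        index_files[name] = bf.to_bytes()
    return index_files


def select_fragments(index_files: Dict[str, bytes], data_set_a: Iterable[str]) -> List[str]:
    """Return the fragments whose filter reports at least one item of data set A."""
    wanted = list(data_set_a)
    selected: List[str] = []
    for name, blob in index_files.items():
        bf = SimpleBloomFilter.from_bytes(blob)  # deserialize the index object
        if any(bf.might_contain(item) for item in wanted):
            selected.append(name)  # this fragment would be passed to a Mapper
    return selected


if __name__ == "__main__":
    index = build_index({"f0": ["alice", "bob"], "f1": ["carol"], "f2": ["dave"]})
    print(select_fragments(index, ["carol"]))
```

A fragment that really contains an item of data set A is always selected, because a Bloom filter has no false negatives; a fragment may occasionally be selected unnecessarily, which is the false positive cost the invention aims to reduce.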

Description

Technical field

[0001] The invention belongs to the technical field of cloud storage, and more specifically relates to a method for implementing a multi-dimensional index structure, OBF-Index, in a Hadoop environment.

Background art

[0002] We are living in an era of big data. Various types of logs on the Internet (such as click logs), content posted by users (such as tweets posted on Twitter), and graph data (such as social networks) are all sources of massive data. In 2008, Google's daily data volume exceeded 20 PB. In 2016, Alibaba had to process more than 100 PB of data per day and ran more than 1 million big data tasks per day. Processing data of this volume on a single machine is impossible. In recent years, distributed computing, grid computing, and cloud computing technologies have become increasingly mature. As early as 2003 and 2004, Google published two papers presenting its two new technologies, GFS (Google File System) and MapReduce, in order to d...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G06F16/22, G06F16/27

CPC: G06F16/2228, G06F16/27

Inventor: 李劲, 刘建坤, 窦奇伟, 何臻力, 周维

Owner: YUNNAN UNIV
