HBase-based big data storage and retrieval method and system

A big data storage and retrieval technology, applied in the field of big data storage. It addresses two problems: differing source data structures and data item storage methods make import into HBase difficult to implement, and a purpose-built import program no longer applies once the source file type or the destination table changes.

Inactive Publication Date: 2015-09-16
WUHAN UNIV

AI Technical Summary

Problems solved by technology

Because source data files differ in structure, field names, and field types, and because data items are stored in different ways, it is difficult to convert and import such data with existing HBase import tools such as the bulk loader.
To solve this kind of problem, a programmer usually has to write a dedicated program that extracts and imports the data. Such a program generally works only for importing one specific source data file into one destination table. If the source data file type or the destination table changes, the program no longer applies.



Embodiment Construction

[0070] To help those of ordinary skill in the art understand and implement the present invention, it is described in further detail below with reference to the accompanying drawings and embodiments. The embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.

[0071] The HBase-based big data storage and retrieval method provided by the invention comprises the following steps:

[0072] Step 1: Create a source data file description object, and use a mapping table to store the correspondence between the fields of the source data file and the data columns of the target HBase table. The specific implementation includes the following sub-steps:

[0073] Step 1.1: Define the data file object field mapping table;

[0074] The mapping table is used to define the applicable object for HBase big data conver...
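The field mapping idea in Step 1 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `FieldMapping` class, the `map_record` helper, and all field, column family, and qualifier names are hypothetical, chosen only to show how one table of correspondences can drive the conversion of any row-oriented record into HBase-style `family:qualifier` cells.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    """One row of a (hypothetical) field mapping table."""
    source_field: str  # field name in the source data file
    family: str        # target HBase column family
    qualifier: str     # target HBase column qualifier

    @property
    def column(self) -> str:
        # HBase addresses a cell's column as "family:qualifier".
        return f"{self.family}:{self.qualifier}"


def map_record(record: dict, mappings: list) -> dict:
    """Translate one source record into {"family:qualifier": value} cells.

    Source fields without a mapping entry are skipped; values are stored
    as strings because HBase itself stores uninterpreted bytes.
    """
    cells = {}
    for m in mappings:
        if m.source_field in record:
            cells[m.column] = str(record[m.source_field])
    return cells


# Illustrative mapping table and record (all names are assumptions).
MAPPINGS = [
    FieldMapping("station_id", "info", "station"),
    FieldMapping("obs_time", "info", "time"),
    FieldMapping("temp_c", "data", "temperature"),
]

record = {"station_id": "S001", "obs_time": "20150916",
          "temp_c": 21.5, "unused": "x"}
cells = map_record(record, MAPPINGS)
```

Because the conversion logic reads only the mapping table, supporting a new source file type or destination table in this sketch means editing `MAPPINGS` rather than rewriting the import program.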


Abstract

The invention discloses an HBase-based big data storage and retrieval method and system. Based on a data file field mapping table, an HBase Thrift client generates a row key from a defined RowKey expression and imports row-stored data into an HBase database. While keeping the data consistent, multiple feature values of a data object are combined in several different ways to form row keys; the feature values and the ordinary column values together make up HBase data rows; and the data rows are stored in several HBase tables, one per row key construction. During retrieval by multiple feature values, a fuzzy result set is obtained quickly by matching some of the feature values in the row key, and a filter is then applied to the fuzzy result set to obtain the final, exact result set. The method applies to the conversion and storage of big data from different types of data files into a destination HBase database and is therefore highly general. Because data is stored under row keys built from combinations of feature values, a fast data retrieval interface can be provided, achieving the goal of rapid retrieval.
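The two-stage retrieval described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented system: plain Python dictionaries stand in for HBase tables, the `|` separator, the `idx_...` table names, and the `InMemoryStore` class are hypothetical, and a prefix match over sorted keys stands in for a row-prefix scan.

```python
SEP = "|"  # assumed separator between feature values in a row key


def build_row_keys(features: dict, orders: list) -> dict:
    """Return one row key per feature ordering, keyed by table name."""
    return {
        "idx_" + "_".join(order): SEP.join(features[name] for name in order)
        for order in orders
    }


class InMemoryStore:
    """Dictionaries standing in for several HBase tables (an assumption)."""

    def __init__(self, orders):
        self.orders = orders
        self.tables = {"idx_" + "_".join(o): {} for o in orders}

    def put(self, features: dict, columns: dict) -> None:
        # Store the same data row in every table under a different row key.
        row = {**features, **columns}
        for table, key in build_row_keys(features, self.orders).items():
            self.tables[table][key] = row

    def search(self, known: dict, predicate=lambda row: True) -> list:
        def leading(order):
            # Number of leading key components whose value is known.
            n = 0
            for name in order:
                if name not in known:
                    break
                n += 1
            return n

        # Pick the table whose row key starts with the most known features.
        order = max(self.orders, key=leading)
        k = leading(order)
        prefix = SEP.join(known[name] for name in order[:k])
        if 0 < k < len(order):
            prefix += SEP  # avoid "S1" also matching "S10|..."
        table = self.tables["idx_" + "_".join(order)]
        # Stage 1: prefix match on the row key gives the fuzzy result set.
        fuzzy = [row for key, row in sorted(table.items())
                 if key.startswith(prefix)]
        # Stage 2: filtering narrows it to the exact result set.
        return [row for row in fuzzy
                if all(row[n] == v for n, v in known.items())
                and predicate(row)]


orders = [("station", "date"), ("date", "station")]
store = InMemoryStore(orders)
store.put({"station": "S001", "date": "20150916"}, {"temp": "21.5"})
store.put({"station": "S001", "date": "20150917"}, {"temp": "19.0"})
store.put({"station": "S002", "date": "20150916"}, {"temp": "25.1"})

hits = store.search({"date": "20150916"}, lambda r: float(r["temp"]) > 22)
```

In this sketch a query on `date` alone is served from the table whose row key begins with the date, so only the rows sharing that prefix are read before the temperature filter runs. Storing each row once per key ordering trades extra storage for that narrower scan.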

Description

Technical field

[0001] The invention belongs to the technical field of big data storage and relates to a method and system for the distributed storage, conversion, and retrieval of big data. Its goal is to convert row-stored data files into storage in the HBase distributed database and to retrieve and access the big data in HBase quickly.

Background

[0002] There is little literature on converting row-stored data files and storing them in HBase. There are usually three ways to migrate and integrate different types of data into HBase: the first is to write a dedicated program against the HBase interface to connect the data; the second is to use tools such as bulk load to complete the import; the third is to write a MapReduce program that imports the data into HBase. These three methods suffer from poor generality, a restricted usage environment, and high operational complexity. It is necessary to develop a general data extraction and transforma...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/2219, G06F16/2282, G06F16/242, G06F16/25
Inventors: 徐爱萍, 吴笛, 徐武平
Owner: WUHAN UNIV