
A method and system for accessing big data based on a memory database and Hbase

A database and big data technology, applied in the field of data access, that addresses the inability of Hbase to support flexible queries that do not use the rowkey as the keyword, and achieves fast data access.

Active Publication Date: 2019-12-13
北京思特奇信息技术股份有限公司

AI Technical Summary

Problems solved by technology

[0003] Hbase is a NoSQL database. Data in Hbase can be easily retrieved by rowkey or by a range of rowkeys, but Hbase cannot meet the need for flexible queries that do not use the rowkey as the keyword.
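The limitation described above can be illustrated with a small sketch. This is a toy, in-process stand-in for a rowkey-ordered table, not the Hbase client API; the class `RowkeyStore`, its methods, and the sample rowkeys and `status` column are all hypothetical names chosen for illustration. It shows why lookups by rowkey or rowkey range are cheap, while a query on any other field has to examine every row.

```python
import bisect


class RowkeyStore:
    """Toy stand-in for a rowkey-ordered table (not the Hbase client API)."""

    def __init__(self):
        self._keys = []   # rowkeys kept in sorted order
        self._rows = {}   # rowkey -> dict of column values

    def put(self, rowkey, row):
        if rowkey not in self._rows:
            bisect.insort(self._keys, rowkey)
        self._rows[rowkey] = row

    def get(self, rowkey):
        """Point lookup by rowkey."""
        return self._rows.get(rowkey)

    def scan(self, start, stop):
        """Return rows with start <= rowkey < stop using the sorted key list."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

    def filter_by_column(self, column, value):
        """Non-rowkey query: every stored row has to be examined."""
        return [(k, r) for k, r in self._rows.items() if r.get(column) == value]


store = RowkeyStore()
store.put("file_001", {"status": "preprocessed"})
store.put("file_002", {"status": "deduplicated"})
store.put("file_003", {"status": "preprocessed"})

by_range = store.scan("file_001", "file_003")                 # uses key order
by_status = store.filter_by_column("status", "preprocessed")  # full pass
```

Keeping frequently queried status fields in a store that supports SQL, as the method here proposes, avoids the full pass in `filter_by_column`.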


Examples

Embodiment 1

[0020] Embodiment 1. A method for accessing big data based on an in-memory database and Hbase. The method provided in this embodiment is described below with reference to Figure 1.

[0021] Referring to Figure 1, the method provided by this embodiment includes: S1. Read a plurality of source files to be processed, and perform task processing on each of the source files, wherein one task processing includes multiple processing steps;

[0022] S2. After each processing step is performed for each source file, store the file processing status of each source file in the memory database in a first preset manner;

[0023] S3. Store the file data after each processing step of each source file in Hbase in a second preset manner.
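Steps S1 to S3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `sqlite3` with `:memory:` stands in for the memory database, a plain dictionary stands in for Hbase, and the table `file_status`, the key format `name#step`, and the functions `preprocess`, `deduplicate`, and `run_tasks` are hypothetical. The source does not specify the first and second preset manners, so the storage layouts here are placeholders.

```python
import sqlite3


# Hypothetical processing steps; the source names preprocessing and
# deduplication only as examples of steps within one task.
def preprocess(lines):
    return [line.strip() for line in lines if line.strip()]


def deduplicate(lines):
    return list(dict.fromkeys(lines))  # drop repeats, keep first-seen order


STEPS = [("preprocess", preprocess), ("deduplicate", deduplicate)]


def run_tasks(source_files, status_db, data_store):
    """source_files maps a file name to its list of lines (the S1 input)."""
    status_db.execute(
        "CREATE TABLE IF NOT EXISTS file_status "
        "(file_name TEXT PRIMARY KEY, last_step TEXT)"
    )
    for name, data in source_files.items():
        for step_name, step in STEPS:
            data = step(data)
            # S2: record the file processing status in the memory database.
            status_db.execute(
                "INSERT OR REPLACE INTO file_status VALUES (?, ?)",
                (name, step_name),
            )
            # S3: store the file data after this step under a rowkey-like key
            # (a dict stands in for Hbase here).
            data_store[f"{name}#{step_name}"] = data
    status_db.commit()


status_db = sqlite3.connect(":memory:")
data_store = {}
run_tasks({"a.txt": ["x", " x ", "", "y"]}, status_db, data_store)
```

Because the status lives in a SQL-capable store, it can be queried by any column, for example `SELECT file_name FROM file_status WHERE last_step = 'preprocess'`, while the bulk intermediate data is fetched by key.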

[0024] In this embodiment, a process number is configured for each task processing, and step S2 specifically includes:

[0025] After each processing step is performed on each source file, the file processing status of each source file and the processing ...

Embodiment 2

[0030] To aid understanding of the method for accessing big data based on the memory database and Hbase provided by the present invention, a specific example is described below.

[0031] Referring to Figure 2, first, a program based on Hbase storage is developed (for convenience of description, hereinafter referred to as the work order program) to record and save the files and breakpoint file records after each processing step in the task processing process. Take the preprocessing and deduplication processing of source files as an example: after each file is preprocessed, the preprocessed file is written into Hbase through the work order program, and the processing status of the file is recorded in the memory database (for example, the preprocessing has been completed, but the deduplication processing has not started); the work order program reads the preprocessed file from Hbase and puts it into the deduplication processing entry, ...
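The hand-off from preprocessing to deduplication described above can be sketched as a resume-from-breakpoint routine. This is an illustrative sketch, not the work order program itself: `sqlite3` with `:memory:` stands in for the memory database, a dictionary stands in for Hbase, and the names `file_status`, `next_step`, `resume_deduplication`, and the `name#step` key format are assumptions.

```python
import sqlite3

STEP_ORDER = ["preprocess", "deduplicate"]  # example steps named in the text


def next_step(status_db, file_name):
    """Return the first step not yet completed for file_name, or None."""
    row = status_db.execute(
        "SELECT last_step FROM file_status WHERE file_name = ?", (file_name,)
    ).fetchone()
    if row is None:
        return STEP_ORDER[0]
    done = STEP_ORDER.index(row[0])
    return STEP_ORDER[done + 1] if done + 1 < len(STEP_ORDER) else None


def resume_deduplication(status_db, data_store, file_name):
    """If preprocessing is done but deduplication is not, run deduplication
    on the saved intermediate data and update the recorded status."""
    if next_step(status_db, file_name) != "deduplicate":
        return None
    lines = data_store[f"{file_name}#preprocess"]   # read the saved file data
    result = list(dict.fromkeys(lines))             # drop repeated lines
    data_store[f"{file_name}#deduplicate"] = result
    status_db.execute(
        "UPDATE file_status SET last_step = ? WHERE file_name = ?",
        ("deduplicate", file_name),
    )
    return result


# State as it would look after preprocessing finished and before
# deduplication started.
status_db = sqlite3.connect(":memory:")
status_db.execute(
    "CREATE TABLE file_status (file_name TEXT PRIMARY KEY, last_step TEXT)"
)
status_db.execute("INSERT INTO file_status VALUES ('a.txt', 'preprocess')")
data_store = {"a.txt#preprocess": ["x", "x", "y"]}

resumed = resume_deduplication(status_db, data_store, "a.txt")
```

The status lookup decides where to continue, so a restarted task does not need to redo steps whose output is already stored.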

Embodiment 3

[0045] Embodiment 3. A system for accessing big data based on a memory database and Hbase. The system provided by this embodiment is described below with reference to Figure 3.

[0046] Referring to Figure 3, the system provided by this embodiment includes a file reading module 31, a configuration module 32, a task processing module 33, a table creation module 34, a first storage module 35, a second storage module 36 and a table cleaning module 37.

[0047] Specifically, the file reading module 31 is configured to read multiple source files to be processed.

[0048] The task processing module 33 is configured to perform task processing on each of the source files, wherein one task processing includes multiple processing steps.

[0049] The first storage module 35 is configured to store the file processing status of each source file in the memory database in a first preset manner after each processing step is performed on each source file.

[0050] The second storage module 36 is co...

Abstract

The invention discloses a method and system for accessing big data based on a memory database and Hbase. The method comprises the steps of: S1, reading multiple to-be-processed source files and carrying out task processing on each source file, wherein the task processing comprises multiple processing steps; S2, storing the file processing status of each source file in the memory database in a first preset mode after each processing step is carried out on each source file; and S3, storing the file data obtained after each processing step is carried out on each source file in Hbase in a second preset mode. According to the method and the system, intermediate data files produced during task processing are stored in Hbase, and the file processing statuses obtained after each processing step is carried out on each file are stored in the memory database, so that the capacity of Hbase to store big data and the fast access speed of the memory database are both fully utilized, and fast access to the data is realized.

Description

Technical field

[0001] The invention relates to the technical field of data access, in particular to a method and system for accessing big data based on an in-memory database and Hbase.

Background

[0002] A distributed memory database is an in-memory database in which all data is stored in memory, so it can take advantage of the speed of memory access. Data reliability is guaranteed through full data files (checkpoints) and redo logs. It supports SQL for flexible data access. The distributed memory database is also distributed: it is deployed on multiple nodes of the network and provides a unified access interface to the outside.

[0003] Hbase is a NoSQL database. Data in Hbase can be easily retrieved by rowkey or by a range of rowkeys, but Hbase cannot meet the need for flexible queries that do not use the rowkey as the keyword.

Summary of the invention

[0004] The technical problem to be solved by the present invention is to provide a metho...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/27
CPC: G06F16/27
Inventor: 李晓静
Owner: 北京思特奇信息技术股份有限公司