Method and apparatus for optimizing data access, method and apparatus for optimizing data storage

A technology for optimizing data access and data storage, applied in the field of data processing. It addresses wasted scheduling time and system resources and reduced data-processing execution speed, with the effect of improving data processing efficiency.

Active Publication Date: 2012-10-10
HUAWEI CLOUD COMPUTING TECH CO LTD
Cites: 2; Cited by: 24

AI Technical Summary

Problems solved by technology

[0013] When HBASE is used as the data source of MapReduce, the data range accessed by MapReduce is defined by the data query range in a range query object. To ensure that no qualifying records are missed, the data range specified by the range query object can only be expanded, so too many partitions are covered, including partitions that contain a large amount of invalid data.
In addition to reading the qualifying records in each partition, the MapReduce program must also read a large number of invalid records only to compare and discard them. This causes a large number of wasted operations and seriously reduces the execution speed of data processing.
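To make the over-coverage problem in [0013] concrete, the following minimal Python sketch assumes a hypothetical table split into five partitions (regions) by row key and a user who needs only the key ranges [a, b) and [g, h). Because one range query object holds a single start key and end key, it has to be widened to [a, h), which touches partitions that hold no qualifying records. The region boundaries, the half-open range convention, and the function names are illustrative assumptions, not taken from the patent.

```python
# Hypothetical partition (region) layout, each covering the half-open key
# range [start, end); an empty end key means "to the end of the table".
REGIONS = [("", "b"), ("b", "d"), ("d", "f"), ("f", "h"), ("h", "")]


def covered_regions(start, end, regions=REGIONS):
    """Indices of the regions that overlap the key range [start, end)."""
    hits = []
    for i, (r_start, r_end) in enumerate(regions):
        begins_before_end = end == "" or r_start < end
        ends_after_start = r_end == "" or r_end > start
        if begins_before_end and ends_after_start:
            hits.append(i)
    return hits


def wasted_regions(wanted_ranges, regions=REGIONS):
    """Regions scanned by one widened range but not needed by any wanted range."""
    widened_start = min(s for s, _ in wanted_ranges)
    ends = [e for _, e in wanted_ranges]
    widened_end = "" if "" in ends else max(ends)
    scanned = set(covered_regions(widened_start, widened_end, regions))
    needed = set()
    for s, e in wanted_ranges:
        needed.update(covered_regions(s, e, regions))
    return sorted(scanned - needed)


if __name__ == "__main__":
    wanted = [("a", "b"), ("g", "h")]
    print(covered_regions("a", "h"))  # [0, 1, 2, 3]: regions read by the widened query
    print(wasted_regions(wanted))     # [1, 2]: regions read only to discard records
```

Regions 1 and 2 are read only so that their records can be compared and discarded, which is the wasted work the paragraph describes.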
[0014] When HBASE is used as the data destination of MapReduce, if the data is stored into only a limited range of partitions, multiple useless Reduce processes are still generated, wasting scheduling time and system resources and reducing the execution speed of data processing.
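For the data-destination case in [0014], the following minimal Python sketch illustrates creating Reduce tasks only for the destination partitions that actually receive output. It assumes a hypothetical destination table whose regions are identified by sorted start keys and a job whose output keys fall into only two of them; the layout and the function names are assumptions for illustration, not the patent's method.

```python
from bisect import bisect_right

# Hypothetical sorted start keys of the destination table's regions;
# region i holds keys in [REGION_STARTS[i], REGION_STARTS[i + 1]).
REGION_STARTS = ["", "b", "d", "f", "h"]


def region_of(key, starts=REGION_STARTS):
    """Index of the region whose key range contains `key`."""
    return bisect_right(starts, key) - 1


def reduce_tasks_needed(output_keys, starts=REGION_STARTS):
    """Regions that will actually receive output: one Reduce task per region."""
    return sorted({region_of(k, starts) for k in output_keys})


if __name__ == "__main__":
    used = reduce_tasks_needed(["c", "cc", "g"])
    print(used)                            # [1, 3]
    print(len(REGION_STARTS) - len(used))  # 3 Reduce tasks that would write nothing
```

With five regions but output landing only in regions 1 and 3, starting one Reduce task per region would launch three tasks that write nothing.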



Embodiment Construction

[0057] In order to enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments of the present invention will be further described in detail below in conjunction with the drawings and implementations.

[0058] First, the existing MapReduce operation process is briefly described below.

[0059] As shown in Figure 1, MapReduce involves three independent entities: the user program, the master controller, and the slave processors. The master controller coordinates the running of a job and assigns tasks to the slave processors; once the job runs, the slave processors execute the Map tasks and Reduce tasks.

[0060] When the user program calls the MapReduce function, the following operations take place:

[0061] 1) The MapReduce function library in the user program first divides the input file into M blocks.

[0062] 2) The master controller obtains the input partition information an...
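Step 1) and the hand-off of Map tasks to the slave processors can be sketched in Python as follows. This minimal example assumes the input is already a list of records, divides it into M contiguous blocks of near-equal size, and assigns one Map task per block to the slave processors in round-robin order. Both the equal-size split and the round-robin assignment are simplifying assumptions: a real MapReduce library typically splits input by byte size, and the master assigns tasks to whichever workers are idle.

```python
def split_into_blocks(records, m):
    """Divide records into m contiguous blocks whose sizes differ by at most one."""
    if m <= 0:
        raise ValueError("m must be positive")
    q, r = divmod(len(records), m)
    blocks, start = [], 0
    for i in range(m):
        size = q + (1 if i < r else 0)  # the first r blocks take one extra record
        blocks.append(records[start:start + size])
        start += size
    return blocks


def assign_map_tasks(blocks, slaves):
    """Map task i (one per block) goes to a slave processor, cycling round-robin."""
    return {i: slaves[i % len(slaves)] for i in range(len(blocks))}


if __name__ == "__main__":
    blocks = split_into_blocks(list(range(10)), 3)
    print(blocks)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(assign_map_tasks(blocks, ["slave-1", "slave-2"]))
    # {0: 'slave-1', 1: 'slave-2', 2: 'slave-1'}
```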


Abstract

Provided are a method and an apparatus for optimizing data access, and a method and an apparatus for optimizing data storage. The method for optimizing data access comprises the following: a master controller receives a request from a user to access a data table in HBASE (Hadoop Database), the request carrying data input range information that covers a plurality of data input ranges; input partition information is determined according to the partition information of the data table and the data input range information; the number of Map tasks is determined based on the input partition information; the data in the data table read by the slave processors is distributed according to the number of Map tasks; and the data read by the slave processors is returned to the user.

Description

Technical Field

[0001] The present invention relates to the technical field of data processing, and in particular to a method and device for optimizing data access and a method and device for optimizing data storage.

Background

[0002] As a large-scale parallel data processing method, MapReduce has been widely used in large-scale data analysis. HBASE (Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed storage system. HBASE can serve as both the data source and the data destination of MapReduce, so that MapReduce can process data stored in HBASE or save its output data to HBASE.

[0003] When HBASE is used as the data source of MapReduce, the table accessed by MapReduce and the range of data accessed are defined by the table name and a range query object. The range query object defines the data query range by specifying a start key value and an end key value.

[0004] When the user program calls the MapReduce f...
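The range query object described in [0003] can be modeled with the minimal Python sketch below. It follows the HBase `Scan` convention that the start row is inclusive and the stop row is exclusive, with an empty end key meaning the end of the table. The `RangeQuery` class, the `scan` helper, and the in-memory table are hypothetical stand-ins, not the HBase client API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeQuery:
    """Hypothetical range query object: table name plus [start_key, end_key)."""
    table: str
    start_key: str
    end_key: str  # "" means "scan to the end of the table"

    def contains(self, row_key: str) -> bool:
        after_start = row_key >= self.start_key
        before_end = self.end_key == "" or row_key < self.end_key
        return after_start and before_end


def scan(tables, query):
    """Return (row_key, value) pairs of the queried table in row-key order."""
    rows = tables[query.table]
    return [(key, rows[key]) for key in sorted(rows) if query.contains(key)]


if __name__ == "__main__":
    tables = {"orders": {"a1": 10, "b2": 20, "c3": 30, "d4": 40}}
    print(scan(tables, RangeQuery("orders", "b", "d")))  # [('b2', 20), ('c3', 30)]
    print(scan(tables, RangeQuery("orders", "c", "")))   # [('c3', 30), ('d4', 40)]
```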


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/30
CPC: G06F12/02, G06F16/2471
Inventors: 智伟, 赵智峰, 周帅锋
Owner HUAWEI CLOUD COMPUTING TECH CO LTD