Method for improving MR task operation efficiency

A method for improving MR task efficiency, applied to structured data retrieval and data processing. It addresses the heavy I/O consumption of MR tasks by simplifying task processing, reducing code complexity, and improving execution efficiency.

Inactive Publication Date: 2019-12-13
山东健康医疗大数据有限公司

AI Technical Summary

Problems solved by technology

[0009] The technical problem to be solved by the present invention is as follows. An MR task performs offline analysis and processing of large volumes of data. MR works by reading file data line by line, partitioning and sorting it, and merging it as key-value (K-V) pairs. All of this is disk-based, so it consumes a large amount of I/O.
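
The flow described above (read line by line, partition and sort, merge as K-V pairs) can be illustrated with a minimal in-memory sketch. It is not Hadoop code: the function names, the word-count job, and the two-partition setting are assumptions chosen for illustration, and a real MR job spills intermediate data to disk between these phases, which is where the I/O cost arises.

```python
from collections import defaultdict


def map_phase(lines):
    """Read the input line by line and emit (key, value) pairs."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle_phase(pairs, num_partitions=2):
    """Assign each pair to a partition by key hash, then sort each partition by key."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[hash(key) % num_partitions][key].append(value)
    return [sorted(partition.items()) for partition in partitions]


def reduce_phase(partitions):
    """Merge the grouped values of every key into one result per key."""
    return {key: sum(values) for part in partitions for key, values in part}


# Hypothetical input: two lines of a text file.
lines = ["a b a", "b c"]
result = reduce_phase(shuffle_phase(map_phase(lines)))
```

Every key lands in exactly one partition, so the merged result does not depend on how the keys are distributed.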

Examples

Detailed Description of the Embodiments

[0027] The following specific embodiments further illustrate the present invention:

[0028] A method for improving the operating efficiency of MR tasks is implemented as follows:

[0029] 1. The basic dimension classes used by the MR tasks of each business in the project are encapsulated into a top-level dimension class according to business requirements. When an MR task needs a dimension class, it calls that business's top-level dimension class directly, which reduces the complexity of business processing and code writing and improves work efficiency.
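
A minimal sketch of this encapsulation, under stated assumptions: the source does not name any dimension classes, so DateDimension, DepartmentDimension, and VisitStatsDimension are hypothetical, and Python dataclasses stand in for what a Hadoop job would more likely write as Java key classes implementing WritableComparable. The point shown is that the job code touches only the top-level class.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class DateDimension:
    """Hypothetical basic dimension: a calendar day."""
    year: int
    month: int
    day: int


@dataclass(frozen=True, order=True)
class DepartmentDimension:
    """Hypothetical basic dimension: a hospital department."""
    name: str


@dataclass(frozen=True, order=True)
class VisitStatsDimension:
    """Hypothetical top-level dimension bundling the basic ones for one business."""
    date: DateDimension
    department: DepartmentDimension

    @classmethod
    def of(cls, year, month, day, department):
        # Callers build the composite key in one call and never
        # assemble the basic dimensions themselves.
        return cls(DateDimension(year, month, day), DepartmentDimension(department))


# The job code only uses the top-level class as its grouping key.
records = [
    (2019, 12, 13, "cardiology"),
    (2019, 12, 13, "cardiology"),
    (2019, 12, 12, "radiology"),
]
counts = {}
for record in records:
    key = VisitStatsDimension.of(*record)
    counts[key] = counts.get(key, 0) + 1
```

Because the classes are frozen and ordered, the composite key is hashable and sortable, which is what a shuffle step needs from a key type.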

[0030] 2. After the MR task finishes analyzing and sorting the data on the Reducer side, the step of writing to HDFS is dropped and the results are written directly to the relational database through an interface, which simplifies the overall task flow and improves the execution efficiency of the MR task.
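
The idea of step 2 can be sketched as a reduce function that writes its result straight to a database instead of to an intermediate file. The source does not say which interface is used (Hadoop itself ships a DBOutputFormat for this purpose), so the sketch below is only an assumption-laden illustration: it uses Python's built-in sqlite3 as a stand-in relational database, and the table name visit_stats and the sample keys are hypothetical.

```python
import sqlite3


def reduce_to_database(conn, key, values):
    """Aggregate one key's values and write the result directly to the database."""
    total = sum(values)
    conn.execute(
        "INSERT INTO visit_stats (dimension, total) VALUES (?, ?)",
        (key, total),
    )


# In-memory SQLite stands in for the relational database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit_stats (dimension TEXT PRIMARY KEY, total INTEGER)")

# Hypothetical reducer input: values already grouped by key.
grouped = {
    "2019-12-13/cardiology": [1, 1, 1],
    "2019-12-13/radiology": [1],
}
for key, values in sorted(grouped.items()):
    reduce_to_database(conn, key, values)
conn.commit()

rows = conn.execute(
    "SELECT dimension, total FROM visit_stats ORDER BY dimension"
).fetchall()
```

No file is written between the reduce step and the database, which is the simplification the step describes.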

[0031] 3. When the data analyzed and organized by the MR task is written into the relational d...
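
Step 3 is cut off in this text, but the abstract describes writing the analysis data into the relational database by executing SQL statements in batches to reduce network I/O. A minimal sketch of that batching pattern follows, with several assumptions: sqlite3 stands in for the relational database (so no real network is involved), the table name, the helper write_in_batches, and the batch size of 500 are hypothetical, and a Java job would more likely use JDBC's PreparedStatement.addBatch and executeBatch.

```python
import sqlite3


def write_in_batches(conn, rows, batch_size=500):
    """Send rows in groups of batch_size; return the number of batches executed."""
    batches = 0
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # One executemany call per chunk instead of one execute call per row.
        conn.executemany(
            "INSERT INTO visit_stats (dimension, total) VALUES (?, ?)",
            chunk,
        )
        conn.commit()
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit_stats (dimension TEXT PRIMARY KEY, total INTEGER)")

# Hypothetical reducer output: 1200 result rows.
rows = [(f"dim-{i:04d}", i) for i in range(1200)]
batches = write_in_batches(conn, rows, batch_size=500)
stored = conn.execute("SELECT COUNT(*) FROM visit_stats").fetchone()[0]
```

Against a networked database, each batch would typically cost far fewer round trips than sending its rows one at a time. The right batch size depends on the driver and the database and is not given in the source.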

Abstract

The invention discloses a method for improving MR task operation efficiency. The basic dimension classes used by the MR task of each business in a project are encapsulated into a top-level dimension class according to business requirements, and when the MR task calls a dimension class it directly calls the business's top-level dimension class. By encapsulating the basic dimension classes, the method reduces the complexity of business processing and code writing and improves work efficiency. By dropping the step of writing to HDFS, it simplifies the overall task flow and improves MR task execution efficiency. By executing SQL statements in batches to write the analysis data into a relational database, it reduces network I/O and improves MR task efficiency. By selecting a suitable data compression format, it further reduces network I/O and improves MR task efficiency.
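
The abstract does not say which compression format is selected or how the choice is made. As a hedged illustration only, the sketch below compares three codecs from the Python standard library on a hypothetical, highly repetitive payload; the helper compare_codecs and the payload are assumptions. Hadoop deployments often use other codecs such as Snappy or LZO, which are not in the standard library.

```python
import bz2
import gzip
import lzma
import time


def compare_codecs(payload):
    """Compress payload with each codec; report compressed size and elapsed time."""
    codecs = {"gzip": gzip, "bz2": bz2, "lzma": lzma}
    report = {}
    for name, module in codecs.items():
        started = time.perf_counter()
        packed = module.compress(payload)
        elapsed = time.perf_counter() - started
        # Round-trip check: the codec must restore the original bytes.
        if module.decompress(packed) != payload:
            raise ValueError(f"{name} round trip failed")
        report[name] = {"bytes": len(packed), "seconds": elapsed}
    return report


# Hypothetical intermediate data: tab-separated result rows.
payload = b"2019-12-13\tcardiology\t3\n" * 10_000
report = compare_codecs(payload)
```

Fewer bytes means less data moved between nodes, while a slower codec costs more CPU time, so which format counts as suitable depends on the workload.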

Description

Technical field

[0001] The invention relates to the technical field of big data processing in the Hadoop ecosystem, and in particular to a method for improving the operating efficiency of MR tasks.

Background

[0002] Data has penetrated every industry and business function and has become an important factor of production. The mining and application of massive data heralds a new wave of productivity growth and consumer surplus.

[0003] "Big data" is often used to describe the large volumes of unstructured and semi-structured data that a company creates, which would take too much time and money to load into a relational database for analysis.

[0004] Society's daily incremental data now reaches the terabyte or even the petabyte level. For the many companies with limited resources, processing even tens of gigabytes of data increases costs. For this reason, many companies optimize each analysis tool from within the ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/2458, G06F16/2453, G06F16/242, G06F16/25
CPC: G06F16/2471, G06F16/2453, G06F16/2433, G06F16/252
Inventor: 刁彬朔
Owner: 山东健康医疗大数据有限公司