Method and device for verifying processed data accuracy under MapReduce environment

A technology for verifying the correctness of processed data, applied in the field of computer computing. It addresses the problems that existing verification approaches require substantial manual labor, involve very large volumes of real data, and can take several hours or even a day or two. The effect is to shorten data processing time, reduce implementation time, and avoid labor costs.

Active Publication Date: 2015-09-16
阿里巴巴(成都)软件技术有限公司

AI Technical Summary

Problems solved by technology

[0008] When test cases are written to verify the correctness of data processing in a MapReduce environment, the test cases must be written by hand. For the test cases to expose data processing errors whenever the results of running in the MapReduce environment change, they must be as comprehensive as possible, which incurs high labor costs. At the same time, because the MapReduce operating environment is complex, the errors that may occur at run time cannot be fully predicted, so manually written test cases usually cannot be guaranteed to expose every data processing error. Verification based on processing test-case data may therefore be unreliable.
When real data is used to verify the correctness of data processed in a MapReduce environment, the data volume is usually huge, typically between hundreds of GB (gigabytes) and several TB (terabytes). Whether processing the real data or comparing the processing results, verification therefore takes several hours or even a day or two.


Examples

Detailed Description of Embodiments

[0053] To help those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by persons of ordinary skill in the art from the embodiments in this application without creative effort fall within the protection scope of the present invention.

[0054] When MapReduce is used in practice, developers generally only need to implement the logic of their own Map and Reduce functions and then submit it to the MapReduce operating environment. The functional logic of MapReduce can be understood as the specific data processing for each actual problem to be solve...
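The division of labor described in [0054] can be illustrated with a minimal sketch. It assumes a word-count job and a single-process stand-in for the operating environment; the names `map_fn`, `reduce_fn`, and `run_job` are hypothetical and do not come from the patent or from any MapReduce framework.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    """Developer-supplied Map logic: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Developer-supplied Reduce logic: sum the counts collected for one key."""
    return key, sum(values)


def run_job(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Hypothetical in-process stand-in for the environment: map, group, reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # group intermediate values by key
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    print(run_job(["a b a", "B c"], map_fn, reduce_fn))
```

Only `map_fn` and `reduce_fn` carry problem-specific logic here; a real cluster would replace `run_job` with distributed splitting, shuffling, and scheduling.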

Abstract

The invention provides a method for verifying the correctness of processed data in a MapReduce environment. The method comprises the following steps: hijacking code is added to the MapReduce source code to generate a data processing program with the hijacking code; the data processing program with the hijacking code is run to process input data, output data is obtained, and triple data is formed; triples whose coverage information is identical to that of another triple are removed, forming a monitoring triple set; when the correctness of processed data needs to be verified, the input data in the monitoring triple set is fed back to the data processing program with the hijacking code and processed again, output data is obtained, and an output data set is formed; and each piece of data in the output data set is checked for correctness. The method reduces the labor cost and implementation time of verifying the correctness of processed data. The invention further provides a device for verifying the correctness of processed data in a MapReduce environment, by which the method can be implemented.
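The steps in the abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented implementation: it assumes each triple is (input, output, coverage information), that coverage is the set of source lines executed, that the hook is a wrapper built on `sys.settrace`, and that a replayed output is judged correct when it equals the recorded one. The names `with_hook`, `build_monitoring_set`, `verify`, and `classify` are hypothetical.

```python
import sys
from typing import Any, Callable, FrozenSet, List, Tuple

# Assumed triple layout: (input, output, coverage information).
Triple = Tuple[Any, Any, FrozenSet[int]]


def with_hook(func: Callable[[Any], Any], triples: List[Triple]) -> Callable[[Any], Any]:
    """Wrap func so every call records its input, output, and executed lines."""
    code = func.__code__

    def hooked(record: Any) -> Any:
        covered = set()

        def tracer(frame, event, arg):
            if frame.f_code is code:
                if event == "line":
                    covered.add(frame.f_lineno - code.co_firstlineno)
                return tracer  # keep tracing lines inside func
            return None  # do not trace any other frame

        previous = sys.gettrace()
        sys.settrace(tracer)
        try:
            output = func(record)
        finally:
            sys.settrace(previous)  # restore whatever tracer was active before
        triples.append((record, output, frozenset(covered)))
        return output

    return hooked


def build_monitoring_set(triples: List[Triple]) -> List[Triple]:
    """Keep the first triple seen for each distinct coverage set."""
    seen = set()
    kept: List[Triple] = []
    for record, output, coverage in triples:
        if coverage not in seen:
            seen.add(coverage)
            kept.append((record, output, coverage))
    return kept


def verify(func: Callable[[Any], Any], monitoring: List[Triple]) -> List[Tuple[Any, Any, Any]]:
    """Replay stored inputs; return (input, recorded, actual) for each mismatch."""
    mismatches = []
    for record, expected, _ in monitoring:
        actual = func(record)
        if actual != expected:  # assumed correctness criterion
            mismatches.append((record, expected, actual))
    return mismatches


def classify(value: int) -> str:
    """Stand-in for developer-written processing logic with three paths."""
    if value < 0:
        return "negative"
    if value == 0:
        return "zero"
    return "positive"


if __name__ == "__main__":
    recorded: List[Triple] = []
    hooked = with_hook(classify, recorded)
    for value in [5, 7, -1, 0, -3, 12]:
        hooked(value)
    monitoring = build_monitoring_set(recorded)
    print([record for record, _, _ in monitoring])
    print(verify(hooked, monitoring))
```

With the sample inputs, only one input per distinct execution path survives deduplication, so the replay touches three records instead of six. On real data the same idea is what would shrink hundreds of gigabytes to a small monitoring set.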

Description

Technical field

[0001] The invention relates to the field of computer computing, and in particular to a method and a device for verifying the correctness of processed data in a MapReduce environment.

Background art

[0002] With the development of computer technology, the amount of data that computers need to process keeps increasing, and a single computer can no longer process some large-scale data. Cloud computing organizes and manages equipment through a well-designed system architecture and can provide very powerful computing power. MapReduce is a programming model that is usually used to process large-scale data sets in parallel and in a distributed manner on large-scale clusters. Large-scale data sets here generally refer to data sets larger than 1 TB (terabyte).

[0003] The process by which MapReduce processes a data set in parallel may specifically include: decomposing the data set into multiple data blocks according to the numb...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 王立
Owner: 阿里巴巴(成都)软件技术有限公司