
A method and system for Hadoop program testing

A program-testing technology, applied in the field of HADOOP program testing, which solves problems such as increased execution time and achieves the effect of speeding up test execution and shortening test execution time.

Active Publication Date: 2017-12-19
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

[0004] In addition, since the HADOOP program is implemented in Java, the Java Virtual Machine (JVM) will be started at runtime, which will also increase the execution time.

Method used



Examples


Embodiment 1

[0042] Figure 1 is a flow chart of the HADOOP program testing method provided by the first embodiment of the present invention. As shown in Figure 1, the method includes the following steps:

[0043] Step 101: Run the HADOOP program to be tested.

[0044] Step 102: Determine the type of the call to be executed. If the call to the SHELL interface of the remote HDFS is executed, step 103 is executed; if the call to the remote MAP / REDUCE computing interface is executed, step 104 is executed.

[0045] The HADOOP program mainly includes two kinds of calls, namely the call to the SHELL interface of the remote HDFS and the call to the remote MAP / REDUCE computing interface. Calls to the SHELL interface of the remote HDFS are mainly operations on remote files, such as read, write, upload, download, display, copy, move, and delete. Calls to the remote MAP / REDUCE computing interface are mainly MAP / REDUCE computing tasks started in streaming mode.
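The dispatch in steps 102-104 can be sketched as a simple classifier over the two call types described above. This is an illustrative sketch only; the class, method, and matching rules (prefix `hadoop fs` for SHELL calls, the substring `streaming` for streaming MAP / REDUCE jobs) are assumptions, not taken from the patent.

```java
import java.util.Locale;

public class CallDispatcher {
    enum CallType { HDFS_SHELL, MAP_REDUCE, UNKNOWN }

    // Classify a call by inspecting its command string.
    // Assumption: remote HDFS SHELL calls start with "hadoop fs",
    // streaming MAP / REDUCE jobs mention "streaming" on the command line.
    static CallType classify(String command) {
        String c = command.toLowerCase(Locale.ROOT);
        if (c.startsWith("hadoop fs")) return CallType.HDFS_SHELL;
        if (c.contains("streaming")) return CallType.MAP_REDUCE;
        return CallType.UNKNOWN;
    }

    public static void main(String[] args) {
        // HDFS_SHELL -> handled by step 103; MAP_REDUCE -> handled by step 104.
        System.out.println(classify("hadoop fs -cat /data/in.txt"));
        System.out.println(classify("hadoop jar hadoop-streaming.jar -mapper m.py"));
    }
}
```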

[0046] Step 103: Convert the...

Embodiment 2

[0052] Figure 2 is a flowchart of the specific method of the above step 103 provided by Embodiment 2 of the present invention. As shown in Figure 2, it includes the following steps:

[0053] Step 201: According to the preset mapping relationship between remote command names and local command names, the command invoking the SHELL interface of the remote HDFS is converted into a command invoking the SHELL interface of the local FS.

[0054] The mapping relationship between the remote command name and the local command name is pre-configured, so that the calling command to the SHELL interface of the remote HDFS can be converted into a calling command to the SHELL interface of the local FS. The mapping relationship can be shown in Table 1 as an example.
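Step 201 can be sketched as a lookup table of command-name pairs. Since the entries of Table 1 are not shown in this excerpt, the mappings below (e.g. `hadoop fs -cat` to `cat`) are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

public class CommandMapper {
    // Assumed remote-to-local command-name pairs; the patent's actual
    // Table 1 entries are not available in this excerpt.
    static final Map<String, String> NAME_MAP = new HashMap<>();
    static {
        NAME_MAP.put("hadoop fs -cat", "cat");
        NAME_MAP.put("hadoop fs -put", "cp");
        NAME_MAP.put("hadoop fs -rm",  "rm");
    }

    // Replace the remote command prefix with its local counterpart,
    // leaving the argument list untouched; pass through unmapped commands.
    static String toLocal(String remoteCommand) {
        for (Map.Entry<String, String> e : NAME_MAP.entrySet()) {
            if (remoteCommand.startsWith(e.getKey())) {
                return e.getValue() + remoteCommand.substring(e.getKey().length());
            }
        }
        return remoteCommand;
    }
}
```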

[0055] Table 1

[0056]

[0057] Step 202: Based on a preset path mapping rule, convert the HDFS path in the calling command to the SHELL interface of the remote HDFS into a local FS path.

[0058] Pre-configu...
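The path mapping rule of step 202 can be sketched as stripping the HDFS scheme and authority and re-rooting the path under a local directory. The patent's concrete rule is truncated in this excerpt, so the prefix handling and the local root directory below are assumptions.

```java
public class PathMapper {
    static final String HDFS_PREFIX = "hdfs://";
    // Assumed local sandbox directory that mirrors the HDFS namespace.
    static final String LOCAL_ROOT = "/tmp/hdfs-mirror/";

    // Convert an HDFS URI into a local FS path: drop the scheme and the
    // authority (host:port), then prepend the local root.
    static String toLocalPath(String hdfsPath) {
        if (!hdfsPath.startsWith(HDFS_PREFIX)) return hdfsPath;
        String rest = hdfsPath.substring(HDFS_PREFIX.length());
        int slash = rest.indexOf('/');
        String path = (slash >= 0) ? rest.substring(slash + 1) : "";
        return LOCAL_ROOT + path;
    }
}
```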

Embodiment 3

[0067] Figure 3 is a flowchart of the specific method of the above-mentioned step 104 provided by Embodiment 3 of the present invention. As shown in Figure 3, it includes the following steps:

[0068] Step 301: Based on a preset path mapping rule, convert the input and output paths involved in the invocation of the remote MAP / REDUCE computing interface from the HDFS path to the local FS path.

[0069] The input and output paths involved in the invocation of the remote MAP / REDUCE computing interface are HDFS paths, and the basis of localizing job execution is localizing these paths. The path mapping rules used here are the same as those used in the second embodiment and are not repeated here.

[0070] Step 302 : Use the local job runner (LocalJobRunner) to replace the job tracker (JobTracker).

[0071] The HADOOP system uses the job client to submit the job to the JobTracker, and then the JobTracker divides the job into computing tasks and assigns them to multiple TaskTrackers for para...
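In stock Hadoop, the switch described in step 302 is done through configuration: setting the classic (MapReduce 1) key `mapred.job.tracker` to `local` makes the job client run the job with LocalJobRunner instead of submitting it to a remote JobTracker. The sketch below uses `java.util.Properties` as a stand-in for `org.apache.hadoop.conf.Configuration` so it runs without Hadoop on the classpath; the key names are real Hadoop configuration keys.

```java
import java.util.Properties;

public class LocalRunnerConfig {
    // Build the configuration overrides that switch Hadoop MR1 to local,
    // in-process execution. Properties stands in for Hadoop's Configuration.
    static Properties localOverrides() {
        Properties p = new Properties();
        p.setProperty("mapred.job.tracker", "local"); // job client uses LocalJobRunner
        p.setProperty("fs.default.name", "file:///"); // default FS becomes the local FS
        return p;
    }
}
```

With these two overrides, MAP and REDUCE tasks run in the local JVM against local files, which is what makes the localized test execution in this embodiment possible.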



Abstract

The present invention provides a method and system for HADOOP program testing. In the process of running the HADOOP program to be tested, if a call to the SHELL interface of the remote HADOOP distributed file system (HDFS) is executed, the calling command to the SHELL interface of the remote HDFS is converted into a calling command to the SHELL interface of the local file system (FS), the HDFS path is converted into a local FS path, and the converted command is executed to obtain the execution result. Alternatively, if a call to the remote distributed (MAP / REDUCE) computing interface is executed, the input and output paths are converted from HDFS paths to local FS paths, the job tracker is replaced with the local job runner, and, after entering the execution path, the MAP execution script and REDUCE execution script are executed to obtain the execution result. By means of the invention, the test execution time can be shortened.

Description

【Technical field】

[0001] The invention relates to the technical field of computer applications, in particular to a method and system for testing HADOOP programs.

【Background technique】

[0002] HADOOP is a distributed system infrastructure developed by the Apache Foundation. Users can develop distributed programs without knowing the underlying details of the distribution, and make full use of the power of clusters for high-speed computing and storage.

[0003] During the testing of a HADOOP program, testing needs to be performed on a built distributed file system (HDFS, Hadoop Distributed File System) and a HADOOP computing cluster. Accessing HDFS and submitting distributed computing jobs (MAP / REDUCE JOB) requires remote data access and transfer, as well as JOB submission and initialization, task assignment and execution, and other remote work; these remote data access and scheduling tasks tend to consume a lot of time. After the test, 1,000 to ...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F11/36
Inventor: 沙安澜
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD