Implementation method and device for submitting a Flink job to a YARN cluster from within an application program

A technique in the field of big data computing for submitting Flink jobs from within an application program. It addresses the problem of shared resources repeatedly occupying storage space, and its effects are reduced dependence on the environment, avoidance of mutual interference between applications, and improved deployment flexibility.

Active Publication Date: 2022-05-13
WUHAN DAMENG DATABASE

AI Technical Summary

Problems solved by technology

Whenever a new Flink yarn-session is created, the client first checks whether the requested resources (containers and memory) are available and then uploads the Flink-related jar packages and configuration to HDFS. Submitting a Flink job therefore consumes a large amount of network I/O to upload jar packages, and shared resources repeatedly occupy storage space.



Examples


Embodiment 1

[0036] Embodiment 1 of the present invention provides an implementation method for submitting a Flink job to a YARN cluster from within an application program.

[0037] When Flink runs on YARN, it must connect to the HDFS file system and to YARN's ResourceManager, so it must be able to locate the Hadoop configuration files. When the user does not have permission to set the environment variables that point to the Hadoop configuration files, the application defines its own Hadoop configuration file path. The Hadoop configuration files are stored under that path, a resident YarnClient is constructed from them, and the Flink cluster interacts with the YARN cluster through this YarnClient;

[0038] The Hadoop configuration file path is the application's working directory, or another path that can be located relative to the application's working directory. The Hadoop configuration files are stored under this path, and the path is designed according...
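As a rough illustration of the path convention in [0038], the following Java sketch resolves a configuration directory relative to the application working directory and lists the Hadoop configuration files found there. The class name `AppHadoopConf`, the `conf/hadoop` subdirectory, and the choice of the three file names checked are assumptions made for illustration; the source does not specify them.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: locates Hadoop configuration files under a path
// derived from the application working directory instead of environment variables.
final class AppHadoopConf {
    // Standard Hadoop configuration file names; which ones an application
    // actually needs is an assumption of this sketch.
    static final List<String> CONF_FILES =
            List.of("core-site.xml", "hdfs-site.xml", "yarn-site.xml");

    // Resolve the application-defined configuration directory.
    static Path confDir(Path workingDir, String relative) {
        return workingDir.resolve(relative).normalize();
    }

    // Return the configuration files that are present in the directory.
    static List<Path> existingConfFiles(Path confDir) {
        List<Path> found = new ArrayList<>();
        for (String name : CONF_FILES) {
            Path candidate = confDir.resolve(name);
            if (Files.isRegularFile(candidate)) {
                found.add(candidate);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // "conf/hadoop" is a hypothetical subdirectory name.
        Path dir = confDir(Paths.get(System.getProperty("user.dir")), "conf/hadoop");
        System.out.println(dir + " -> " + existingConfFiles(dir));
    }
}
```

In a real application the files returned here would then be loaded into a Hadoop `Configuration` before the YarnClient is created; that step is omitted because it needs the Hadoop libraries on the classpath.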

Embodiment 2

[0199] Embodiment 2 of the present invention provides an implementation method for submitting Flink jobs to a YARN cluster from within an application program. Compared with Embodiment 1, Embodiment 2 presents, in a more realistic scenario, the implementation process by which this solution resolves the deployment of Hadoop configuration files for Flink on YARN.

[0200] When Flink runs on YARN, a YarnClient object must be constructed to access YARN's ResourceManager, and a FileSystem object must be constructed to access HDFS. Constructing both objects requires locating the Hadoop configuration files;

[0201] The native Flink code checks, in turn, whether environment variables such as YARN_CONF_DIR, HADOOP_CONF_DIR, and HADOOP_CONF_PATH are set; if none of them is set, it then checks the HADOOP_HOME environment variable. For Hadoop 2.x, the configuration...
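The lookup order described in [0201] can be sketched in plain Java as follows. This is not Flink's actual code: the class name `HadoopConfDirLookup` is hypothetical, the environment is passed in as a map so the logic can be exercised without changing real variables, and the `etc/hadoop` subdirectory used for HADOOP_HOME is the conventional Hadoop 2.x layout, assumed here because the source text is cut off at that point.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of an environment-variable lookup order for the
// Hadoop configuration directory; not taken from the Flink code base.
final class HadoopConfDirLookup {
    // Variables checked in turn, in the order given in the text.
    private static final List<String> CONF_VARS =
            List.of("YARN_CONF_DIR", "HADOOP_CONF_DIR", "HADOOP_CONF_PATH");

    static Optional<String> find(Map<String, String> env) {
        for (String name : CONF_VARS) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) {
                return Optional.of(value);
            }
        }
        // Fall back to HADOOP_HOME only when none of the variables above is set.
        String home = env.get("HADOOP_HOME");
        if (home != null && !home.isEmpty()) {
            // Assumption: conventional Hadoop 2.x configuration location.
            return Optional.of(home + "/etc/hadoop");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(find(System.getenv()));
    }
}
```

An empty result corresponds to the situation the embodiment targets: no usable environment variable, so the application has to supply its own configuration path.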

Embodiment 3

[0237] Embodiment 3 of the present invention provides an implementation method for submitting Flink jobs to a YARN cluster from within an application program. Compared with Embodiment 1, Embodiment 3 presents, in a more realistic scenario, the implementation process by which this solution resolves the problem of jar package uploads to HDFS consuming a large amount of network I/O.

[0238] When a Flink job needs to be submitted to run on YARN, first check whether the user has permission to upload files directly to the HDFS file system. In this embodiment the user does not have that permission, so the original paths of the Flink system jar packages and the third-party dependency jar packages are traversed first and are found to be in the 110 folder on the D drive. The user manually uploads the Flink system jar packages and the third-party dependency jar packages from the 110 folder on the D drive to the path with the same name as the full pa...
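One way to picture the idea of placing jar packages at a remote path named after their local path is the Java sketch below. Everything in it is an assumption made for illustration: the class name `JarUploadPlanner`, the rule for turning a Windows path into a remote path, and the premise that a jar already present at its remote path does not need to be uploaded again. The source paragraph is truncated before it states the exact rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical planner: decides which local jars still need to be placed
// on the remote file system, given the set of remote paths that already exist.
final class JarUploadPlanner {
    // Assumed mapping: mirror the local absolute path remotely,
    // e.g. D:\110\a.jar becomes /D/110/a.jar.
    static String toRemotePath(String localPath) {
        String path = localPath.replace('\\', '/');
        if (path.length() >= 2 && path.charAt(1) == ':') {
            // Drop the drive colon and keep the drive letter as a directory.
            path = "/" + path.charAt(0) + path.substring(2);
        }
        return path;
    }

    // Keep only the jars whose mirrored remote path is not present yet.
    static List<String> jarsToUpload(List<String> localJars, Set<String> existingRemote) {
        List<String> pending = new ArrayList<>();
        for (String jar : localJars) {
            if (!existingRemote.contains(toRemotePath(jar))) {
                pending.add(jar);
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        List<String> jars = List.of("D:\\110\\flink-dist.jar", "D:\\110\\udf.jar");
        System.out.println(jarsToUpload(jars, Set.of("/D/110/flink-dist.jar")));
    }
}
```

Under these assumptions, jars that are already in place are skipped on every later submission, which is the source of the network I/O saving the embodiment describes.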


Abstract

The invention relates to the technical field of big data computing and discloses an implementation method and device for submitting a Flink job to a YARN cluster from within an application program. It solves the deployment problem of the Hadoop configuration files and the third-party dependency jar packages: by rewriting some of the functions in a Flink system class, it enables custom deployment of both. The deployment flexibility, convenience, and operating efficiency of application programs are improved, dependence on the environment is reduced, and possible mutual interference between different application programs is avoided.

Description

【Technical field】

[0001] The invention relates to the technical field of big data computing, in particular to an implementation method and device for submitting a Flink job to a YARN cluster from within an application program.

【Background art】

[0002] Flink is a high-performance, high-throughput, low-latency stream processing framework. It is not only a stream processing framework but also unifies batch processing (in Flink, batch processing is a special case of stream processing). This architecture also avoids the cumbersome accumulation of components found in traditional big data architectures and allows the same code to run as batch or stream processing without modification, which has made Flink an increasingly popular big data processing framework.

[0003] Flink supports a variety of deployment methods, such as Local, Standalone, Yarn, and K8s. Most enterprises now use YARN as the resource manager of their big data platforms, so for...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F9/445, G06F8/60, G06F8/71, G06F16/182
CPC: G06F9/44521, G06F16/182, G06F8/71, G06F8/60
Inventor: 高东升, 梅纲, 胡高坤
Owner: WUHAN DAMENG DATABASE