Spark Streaming receiver dynamic configuration method and device in big data platform

Big data platform and dynamic configuration technology, applied in the field of big data processing, with the effect of improving resource utilization

Status: Inactive. Publication date: 2018-09-14
SHANDONG UNIV
Cites: 2; Cited by: 5

AI Technical Summary

Problems solved by technology

[0005] To address the deficiencies of the prior art, in particular the problem of dynamically configuring receiver parallelism in stream processing systems based on the DStream model, the present invention proposes a method and device for dynamically configuring Spark Streaming receivers in a big data platform. A simulated annealing algorithm based on delay and throughput automatically determines the receiver parallelism and adjusts it dynamically according to the system environment and load, effectively balancing system throughput against processing capacity and improving the utilization of system resources.
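The simulated annealing step described in [0005] can be illustrated with a minimal sketch. The patent's actual objective function is not reproduced in this excerpt, so the cost model below is hypothetical: `ClusterModel`, its rate parameters, the 0.5 weight and the neighbour move of one or two receivers are all illustrative assumptions. The only Spark behaviour it relies on is that each receiver occupies one of the cores allocated to the streaming application, so adding receivers raises ingestion but removes processing capacity.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterModel:
    """Hypothetical throughput/delay model; every parameter is an illustrative assumption."""
    total_cores: int        # cores granted to the streaming application
    arrival_rate: float     # records/s produced by the data source
    receiver_rate: float    # records/s a single receiver can ingest
    core_rate: float        # records/s a single processing core can handle
    weight: float = 0.5     # trade-off between the throughput and delay terms

    def _ingested(self, n: int) -> float:
        return min(self.arrival_rate, n * self.receiver_rate)

    def throughput(self, n: int) -> float:
        # Each receiver pins one core, leaving total_cores - n cores for processing.
        return min(self._ingested(n), (self.total_cores - n) * self.core_rate)

    def delay_ratio(self, n: int) -> float:
        # Batch processing time relative to the batch interval; > 1 means a growing backlog.
        return self._ingested(n) / ((self.total_cores - n) * self.core_rate)

    def cost(self, n: int) -> float:
        # Lower is better: reward normalised throughput, penalise normalised delay.
        return -(self.weight * self.throughput(n) / self.arrival_rate
                 - (1 - self.weight) * self.delay_ratio(n))


def anneal(model: ClusterModel, initial: int, t0: float = 1.0,
           cooling: float = 0.95, steps: int = 500, seed: int = 0) -> int:
    """Simulated annealing over the integer receiver count 1 .. total_cores - 1."""
    rng = random.Random(seed)
    lo, hi = 1, model.total_cores - 1
    current = best = min(hi, max(lo, initial))
    current_cost = best_cost = model.cost(current)
    temp = t0
    for _ in range(steps):
        # Neighbour: move the receiver count by one or two, clamped to the valid range.
        candidate = min(hi, max(lo, current + rng.choice((-2, -1, 1, 2))))
        cand_cost = model.cost(candidate)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp = max(temp * cooling, 1e-6)  # geometric cooling with a floor
    return best


if __name__ == "__main__":
    model = ClusterModel(total_cores=16, arrival_rate=50_000,
                         receiver_rate=10_000, core_rate=8_000)
    print("annealed receiver count:", anneal(model, initial=1))
```

With these toy numbers, five receivers (5 × 10,000 records/s) exactly match the 50,000 records/s arrival rate, and every further receiver only removes processing capacity, so the annealed count coincides with the exhaustive-search optimum of five. For a range this small exhaustive search would suffice; annealing is shown because it is the technique the patent names and it scales to larger, noisier search spaces.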



Examples


Embodiment 1

[0057] The purpose of Embodiment 1 is to provide a method for dynamically configuring Spark Streaming receivers in a big data platform.

[0058] In order to achieve the above object, the present invention adopts the following technical scheme:

[0059] As shown in Figures 1 and 2, the method for dynamically configuring Spark Streaming receivers in a big data platform comprises the following specific steps:

[0060] A. Determine the Spark application, runtime, and input dataset;

[0061] B. Improve the execution framework of Spark Streaming and propose a dynamic configuration strategy for receivers. In the existing framework, receiver parallelism is configured manually from experience; the improved framework replaces this with a dynamic receiver configuration strategy.

[0062] Improving the Spark Streaming execution framework in step B comprises four steps:

[0063] B1. Change the number of receivers, originally set from a manual empirical value, to gene...
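Step B replaces the hand-tuned receiver count with one that is recomputed while the application runs. The sketch below shows that control loop in isolation. `ReceiverController`, `on_batch_completed` and the `min_change` hysteresis threshold are hypothetical names introduced here, and the optimiser is a stub standing in for the annealing step. In Spark itself, per-batch metrics such as processing delay are delivered to a `StreamingListener` through its `onBatchCompleted` callback, which is one plausible place to feed such a controller.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReceiverController:
    """Hypothetical controller that replaces a fixed, hand-tuned receiver count."""
    optimise: Callable[[float, float], int]  # (throughput, delay) -> proposed receiver count
    current: int                             # receiver count currently deployed
    min_change: int = 2                      # hysteresis: skip reconfigurations smaller than this
    history: List[int] = field(default_factory=list)

    def on_batch_completed(self, throughput: float, delay: float) -> bool:
        """Re-evaluate after each batch; return True when receivers should be redistributed."""
        proposed = self.optimise(throughput, delay)
        self.history.append(proposed)
        if abs(proposed - self.current) >= self.min_change:
            self.current = proposed
            return True
        return False


if __name__ == "__main__":
    # Toy optimiser standing in for the annealing step: fewer receivers when delay is high.
    def toy_optimise(throughput: float, delay: float) -> int:
        if delay < 0.5:
            return 6
        return 3 if delay > 0.8 else 4

    ctrl = ReceiverController(optimise=toy_optimise, current=4)
    for thr, d in [(40_000, 0.3), (40_000, 0.9), (40_000, 0.6)]:
        changed = ctrl.on_batch_completed(thr, d)
        print(f"delay={d}: reconfigure={changed}, receivers={ctrl.current}")
```

The hysteresis threshold is a design choice the source does not specify: redistributing receivers has a cost, so small oscillations in the proposed count are ignored rather than acted on every batch.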

Embodiment 2

[0112] The purpose of this embodiment is to experimentally verify the method of Embodiment 1.

[0113] The experimental environment uses Spark 1.6 with Hadoop 2.2. The test program is WordCount, compiled with Maven and deployed to the experimental cluster. In this embodiment, 11 virtual machines (VMs) are deployed on a real Spark cluster; each VM has eight 2 GHz cores, 8 GB of RAM and a 500 GB hard disk. One VM serves as the ResourceManager and NameNode, and the remaining 10 VMs serve as workers. Each worker is configured with 16 virtual cores, 7 GB of memory (1 GB is reserved for background processes) and a 500 GB hard disk. This embodiment implements independent resource management and scheduling. To ensure data reliability, this embodiment uses HDFS (Hadoop Distributed File System) as the underlying storage for Spark so that results are persisted. The HDFS block size is set to 64 MB and the replication factor to 3. Red Hat 6.3 server vers...
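The cluster figures in [0113] imply some capacity arithmetic worth making explicit. This sketch takes the per-worker figure of 16 as virtual cores (an interpretation of the source wording) and assumes, as in Spark Streaming, that each receiver pins one core. The constant and helper names are hypothetical.

```python
import math

# Figures from the experimental setup; reading the per-worker "16" as vcores is an assumption.
WORKERS = 10
VCORES_PER_WORKER = 16
MEM_GB_PER_WORKER = 7
HDFS_BLOCK_MB = 64
HDFS_REPLICATION = 3


def cluster_totals() -> tuple:
    """Aggregate worker capacity available to Spark: (vcores, memory in GB)."""
    return WORKERS * VCORES_PER_WORKER, WORKERS * MEM_GB_PER_WORKER


def processing_cores(receivers: int) -> int:
    """Cores left for batch processing once each receiver has pinned one core."""
    vcores, _ = cluster_totals()
    return vcores - receivers


def hdfs_footprint(file_mb: float) -> tuple:
    """Number of HDFS blocks and raw disk usage (MB) for a file of the given size."""
    blocks = math.ceil(file_mb / HDFS_BLOCK_MB)
    return blocks, file_mb * HDFS_REPLICATION


if __name__ == "__main__":
    print(cluster_totals())       # (160, 70)
    print(processing_cores(10))   # 150
    print(hdfs_footprint(1000))   # (16, 3000)
```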

Embodiment 3

[0130] The purpose of Embodiment 3 is to provide a computer-readable storage medium.

[0131] In order to achieve the above object, the present invention adopts the following technical scheme:

[0132] A computer-readable storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor of a terminal device to perform the following processing:

[0133] From the system throughput and the data processing delay, determine a nonlinear optimization objective function that balances system throughput against data processing delay;

[0134] Solve the nonlinear optimization objective function to obtain an approximate solution for the optimal number of receivers, take it as the receiver count, and send the receiver count to the network receiver;

[0135] The network receiver allocates receivers according to the received receiver count and the cluster conditions, completing the dynamic configuration of...
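How the network receiver turns the receiver count into a placement is cut off in this excerpt. One simple policy consistent with allocating "according to the cluster conditions" is to place each receiver on the worker with the most free cores remaining. `allocate_receivers` and the free-core map are hypothetical; this is a sketch of one heuristic, not the patent's allocation rule.

```python
import heapq
from typing import Dict


def allocate_receivers(count: int, free_cores: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical placement: put each receiver on the node with the most free cores left."""
    if count > sum(free_cores.values()):
        raise ValueError("not enough free cores for the requested receivers")
    # Max-heap via negated free-core counts; the node name breaks ties deterministically.
    heap = [(-cores, node) for node, cores in free_cores.items() if cores > 0]
    heapq.heapify(heap)
    placement = {node: 0 for node in free_cores}
    for _ in range(count):
        neg_free, node = heapq.heappop(heap)
        placement[node] += 1
        if neg_free + 1 < 0:  # node still has at least one free core
            heapq.heappush(heap, (neg_free + 1, node))
    return placement


if __name__ == "__main__":
    print(allocate_receivers(5, {"w1": 4, "w2": 2, "w3": 1}))
```

This greedy rule equalises leftover capacity rather than receiver counts: with free cores {w1: 4, w2: 2, w3: 1} and five receivers it yields {w1: 4, w2: 1, w3: 0}, leaving one free core on each of w2 and w3. A policy that spreads receivers evenly instead would favour fault isolation over processing headroom.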


Abstract

The invention discloses a method and device for dynamically configuring Spark Streaming receivers in a big data platform. The method comprises: determining, from the system throughput and the data processing delay, a nonlinear optimization objective function that balances the two; solving the objective function to obtain an approximate solution for the optimal number of receivers, which is taken as the receiver count; and sending the receiver count to a network receiver, which allocates receivers according to the received count and cluster data, thereby completing the dynamic configuration of receiver parallelism.

Description

Technical field

[0001] The invention belongs to the technical field of big data processing, and in particular relates to a method and device for dynamically configuring Spark Streaming receivers in a big data platform.

Background art

[0002] In recent years, real-time "big data" processing technology has increasingly penetrated economic development, social progress and daily life, and has become an important driver of productivity. Traditional batch processing generates a large amount of read/write I/O during computation, which degrades the processing performance for streaming data, so distributed computing based on batch processing can no longer meet the needs of real-time scenarios. Stream processing technology emerged in response. In production, distributed stream processing systems involve dozens or even hundreds of nodes. Due to the ...

Claims


Application Information

Patent type & authority: Applications (China)
IPC(8): H04L12/927, H04L12/823, G06F9/50, H04L12/803, H04L47/80, H04L47/32
CPC: H04L47/32, H04L47/80, G06F9/5027, G06F9/5083, H04L47/125
Inventors: Shi Yuliang (史玉良), Wang Xinjun (王新军), Chen Zhiyong (陈志勇), Hu Jing (胡静), Zang Shujuan (臧淑娟)
Owner: SHANDONG UNIV