
Storm task expansion scheduling algorithm based on data stream prediction

A scheduling algorithm and task scheduling technology, applied in digital data processing, computing, and program startup/switching, which can solve problems such as task-scheduling time overhead, insufficient consideration of the correlation between components, and increased tuple processing delay, with the effects of reducing processing delay and improving throughput.

Active Publication Date: 2017-08-11
CHONGQING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0006] Existing Storm elastic scaling does not sufficiently consider the correlation between the components in a Topology. Moreover, during elastic scaling, the parallelism of each component is simply increased or decreased until a better parallelism is found; this process may involve multiple task schedulings, and each scheduling has a time overhead, so the tuple processing delay is increased to a certain extent. In addition, existing scaling only adjusts the parallelism of the components in a user-submitted Topology after the system load has changed, and each adjustment takes a certain amount of time, which reduces the system throughput to a certain extent.




Embodiment Construction

[0030] The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0031] As shown in Figure 3, the implementation of the present invention comprises three modules: a Topology monitoring module, a scaling module and a scheduling module. In the Topology monitoring module, the Thrift interface of NimbusClient is called to obtain the monitoring data of each Topology running in Storm (as displayed on the UI), and the Ganglia cluster monitoring tool is used to obtain the data and load of each node; these data are then saved in a MySQL database. At the beginning of each cycle, the scaling module reads the operation data of each Topology saved in MySQL during the previous cycle, solves the scaling plan of the Topology for the current cycle through the above model, and then performs scheduling. The present invention is described in detail below by taking counting the words in microblogs as an example:
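As a rough sketch of how a monitoring module of the kind described in [0031] could be realized, the example below polls Nimbus through Storm's Thrift client and writes a few per-Topology summary fields into MySQL. The database name, the table name topology_metrics, the JDBC credentials, the sampling fields and the single-pass structure are assumptions made purely for illustration; they are not details taken from the patent.

// Illustrative sketch only: pull Topology summaries from Nimbus and persist them to MySQL.
import org.apache.storm.generated.ClusterSummary;
import org.apache.storm.generated.Nimbus;
import org.apache.storm.generated.TopologySummary;
import org.apache.storm.utils.NimbusClient;
import org.apache.storm.utils.Utils;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Map;

public class TopologyMonitor {
    public static void main(String[] args) throws Exception {
        Map<String, Object> conf = Utils.readStormConfig();
        NimbusClient nimbus = NimbusClient.getConfiguredClient(conf);
        try (Connection db = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/storm_monitor", "user", "password")) {
            Nimbus.Iface client = nimbus.getClient();
            // One sampling pass; the module described in [0031] would repeat this every cycle.
            ClusterSummary cluster = client.getClusterInfo();
            String sql = "INSERT INTO topology_metrics"
                       + " (topology_id, num_executors, uptime_secs) VALUES (?, ?, ?)";
            try (PreparedStatement stmt = db.prepareStatement(sql)) {
                for (TopologySummary t : cluster.get_topologies()) {
                    stmt.setString(1, t.get_id());
                    stmt.setInt(2, t.get_num_executors());
                    stmt.setInt(3, t.get_uptime_secs());
                    stmt.executeUpdate();
                }
            }
        } finally {
            nimbus.close();
        }
    }
}

Node-level load from Ganglia would be collected by a separate collector and written to the same database; that part is not sketched here.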

[0032]...


Abstract

The invention relates to a Storm task expansion scheduling algorithm based on data stream prediction, and belongs to the field of data exchange networks. Through a monitoring module, the real-time operation data of a Topology task submitted by a user is obtained; the degree of parallelism of a connected component in the Topology that satisfies the component load is solved, and the degrees of parallelism of all components in the Topology are then solved through iteration. A time series model is used to predict the data volume that the Topology needs to process, the optimal degree of parallelism of the starting component spout in the Topology under this situation is solved, the optimal degree of parallelism of each component in the Topology under the prediction condition is obtained, and scheduling is carried out. In scheduling, an online scheduling algorithm is used to reduce inter-node network communication to the largest degree and guarantee the load balance of the cluster. The algorithm overcomes the deficiency that the relevance among the components in the Topology is not fully considered, makes up for the deficiency that the optimal degree of parallelism of each component in a user-submitted Topology cannot be solved quickly and efficiently, and has the advantages of predicting change in advance, improving throughput and lowering processing delay.
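To make the abstract's two main steps concrete, here is a minimal sketch, not the patent's actual model: it (a) predicts the next-cycle input rate with a simple time-series method (single exponential smoothing is assumed here purely for illustration) and (b) walks the connected components in order, setting each parallelism to the smallest task count whose aggregate per-task capacity covers the predicted load. The component names, per-task capacities and selectivities are hypothetical word-count figures.

import java.util.LinkedHashMap;
import java.util.Map;

public class ScalingPlanner {

    // Single exponential smoothing over observed tuple rates (an assumed stand-in
    // for the patent's time series model).
    static double predictNextRate(double[] observedRates, double alpha) {
        double smoothed = observedRates[0];
        for (int i = 1; i < observedRates.length; i++) {
            smoothed = alpha * observedRates[i] + (1 - alpha) * smoothed;
        }
        return smoothed;
    }

    // Walks the components in topological order (spout first) and sets each parallelism
    // to the smallest task count whose aggregate capacity covers the predicted incoming
    // rate. Selectivity = tuples emitted per tuple received.
    static Map<String, Integer> planParallelism(double predictedInputRate,
                                                String[] componentsInOrder,
                                                Map<String, Double> perTaskCapacity,
                                                Map<String, Double> selectivity) {
        Map<String, Integer> plan = new LinkedHashMap<>();
        double incomingRate = predictedInputRate;
        for (String component : componentsInOrder) {
            int tasks = (int) Math.ceil(incomingRate / perTaskCapacity.get(component));
            plan.put(component, Math.max(tasks, 1));
            // Load handed to the next connected component in the chain.
            incomingRate = incomingRate * selectivity.get(component);
        }
        return plan;
    }

    public static void main(String[] args) {
        // Hypothetical numbers for a microblog word-count Topology.
        double[] spoutRates = {800, 950, 1100, 1200, 1250};   // tuples per second, past cycles
        double predicted = predictNextRate(spoutRates, 0.5);

        Map<String, Double> capacity = Map.of("spout", 500.0, "split", 400.0, "count", 600.0);
        Map<String, Double> selectivity = Map.of("spout", 1.0, "split", 8.0, "count", 1.0);

        System.out.println(planParallelism(predicted,
                new String[] {"spout", "split", "count"}, capacity, selectivity));
    }
}

The online scheduling step named in the abstract would then place the resulting tasks so as to minimize inter-node traffic while keeping node loads balanced; that step is not sketched here.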

Description

Technical field
[0001] The invention belongs to the field of data exchange networks, and relates to a Storm task scaling algorithm based on data flow prediction.
Background technique
[0002] The popularization and promotion of emerging technologies and application models such as cloud computing, the Internet of Things, social media, and the mobile Internet have led to a sharp increase in the amount of global data and pushed human society into the era of big data. In the context of big data, data contains rich connotations and values, the timeliness of data is becoming more and more important, the streaming characteristics of data are becoming more and more prominent, and the importance of streaming computing is likewise becoming more prominent. The industry has launched streaming computing frameworks such as S4, Spark, and Storm. Storm is a real-time, distributed and highly fault-tolerant computing system. Storm can process large batches of data, and can also make the proces...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F9/48; G06F9/50
CPC: G06F9/4843; G06F9/5083
Inventor: 熊安萍, 段杭彪, 蒋溢, 祝清意, 蒋亚雄
Owner: CHONGQING UNIV OF POSTS & TELECOMM