A data broadcasting system, data broadcasting method and device

A data broadcasting technique, applied in the field of big data, that addresses problems such as excessive occupation of system network IO and memory resources and the task failures that result.

Active Publication Date: 2019-12-24
HUAWEI CLOUD COMPUTING TECH CO LTD

AI Technical Summary

Problems solved by technology

[0005] In the prior art, when Spark runs in cluster mode and a variable is broadcast, N copies of the broadcast variable are distributed to each data node, where N is the number of task executors (executors) started on that data node. As a result, system network IO (Input/Output) and memory resources are excessively occupied, which can cause tasks to fail.
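The cost described in [0005] can be illustrated with a small back-of-the-envelope calculation. This is a minimal sketch, not part of the patent: the function name `broadcast_cost`, the cluster layout, and the 100 MB size are all hypothetical values chosen for illustration.

```python
def broadcast_cost(size_mb, executors_per_node):
    """Return (prior_art_mb, one_copy_per_node_mb) for one broadcast variable.

    executors_per_node lists the number N of task executors on each data node.
    """
    # Prior art: every executor on every node receives its own copy.
    prior_art = sum(size_mb * n for n in executors_per_node)
    # One copy per data node, shared by all executors on that node.
    one_per_node = size_mb * len(executors_per_node)
    return prior_art, one_per_node


# Hypothetical cluster: three data nodes running 4, 4 and 2 executors.
prior_art, one_per_node = broadcast_cost(100, [4, 4, 2])
print(prior_art, one_per_node)  # 1000 300
```

Under these assumed numbers, per-executor distribution moves and stores 1000 MB, while one copy per data node needs 300 MB.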


Embodiment Construction

[0062] In order to make the technical solutions and beneficial effects of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0063] Spark is a memory-based parallel computing framework developed by UC Berkeley's AMP Lab. Unlike MapReduce, Spark can keep a job's intermediate output and results in memory, so it does not need to read and write the Hadoop Distributed File System (HDFS). This makes Spark better suited to algorithms that require iteration, such as data mining and machine learning.

[0064] The data broadcasting system, data broadcasting method and equipment in the embodiments of the present invention are applied to Spar...

Abstract

The embodiment of the invention discloses a data broadcasting system, a data broadcasting method and a device, which are used to reduce the occupation of system network IO and memory resources. The data broadcasting system in the embodiment of the present invention includes a control node and at least one data node, and at least one task executor runs on each data node. The control node generates broadcast data and, for each data node, designates one task executor as the main task executor. The main task executor obtains the broadcast data, saves it to off-heap memory on the data node, and sends the address of the off-heap memory to the other task executors on that data node. The other task executors read the broadcast data from that off-heap memory address. In the embodiment of the present invention, only one copy of the broadcast data needs to be distributed to each data node, namely to its main task executor, thereby reducing the occupation of system network IO and memory resources.
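The flow in the abstract (one main task executor stores the data once, the others read it by address) can be sketched by analogy. This is a minimal illustration, not the patented implementation. It uses Python's standard `multiprocessing.shared_memory` as a stand-in for JVM off-heap memory. The names `MainExecutor`, `store_broadcast` and `read_broadcast` are hypothetical, and the "address" is assumed to be a (segment name, length) pair.

```python
from multiprocessing import shared_memory


class MainExecutor:
    """Hypothetical stand-in for the main task executor on one data node."""

    def __init__(self):
        self._segment = None

    def store_broadcast(self, payload: bytes):
        # Keep the single received copy in a shared segment outside the
        # ordinary object heap, and return its "address" for the others.
        self._segment = shared_memory.SharedMemory(create=True, size=len(payload))
        self._segment.buf[:len(payload)] = payload
        return self._segment.name, len(payload)

    def release(self):
        # Free the segment once no executor needs the broadcast data.
        self._segment.close()
        self._segment.unlink()


def read_broadcast(address):
    """What another task executor on the same node would do with the address."""
    name, length = address
    segment = shared_memory.SharedMemory(name=name)  # attach, no new transfer
    try:
        return bytes(segment.buf[:length])
    finally:
        segment.close()


main = MainExecutor()
address = main.store_broadcast(b"lookup-table")
copies = [read_broadcast(address) for _ in range(3)]  # three other executors
main.release()
print(copies[0])
```

The length travels with the segment name because a platform may round the segment size up, so a reader cannot rely on the segment size alone to find the end of the payload.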

Description

Technical field

[0001] The present invention relates to the field of big data, in particular to a data broadcasting system, a data broadcasting method and a device.

Background

[0002] With the advent of the big data era, the memory-based parallel computing platform Spark has become a popular framework in the industry for processing massive data. Compared with Hadoop, Spark is better suited to iterative machine learning algorithms and graph algorithms. The Spark open source community is very active, and the ecosystem built on the Spark parallel framework is increasingly rich, for example Spark-SQL and Spark-Streaming.

[0003] Spark can run in several modes, for example local, standalone, yarn and mesos. The Resilient Distributed Dataset (RDD) is one of the core concepts of Spark: a read-only, partitionable, fault-tolerant dataset that can be cached in whole or in part in memory, and ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): H04L12/18, H04L29/08
CPC: H04L12/185, H04L67/10, H04L67/60
Inventor: 曹莉, 吕倩楠, 孙涛
Owner: HUAWEI CLOUD COMPUTING TECH CO LTD