A data broadcasting system, data broadcasting method and device

This data broadcasting technology, applied in the field of big data, addresses problems such as excessive occupation of system network IO and memory resources and the task failures that result.

Active Publication Date: 2019-12-24

HUAWEI CLOUD COMPUTING TECH CO LTD

6 Cites, 0 Cited by


AI Technical Summary

Problems solved by technology

[0005] In the prior art, when Spark runs in cluster mode and a variable is broadcast, N copies of the broadcast variable are distributed to each data node, where N is the number of task executors (executors) started on that data node. As a result, system network IO (Input/Output) and memory resources are excessively occupied, which can cause tasks to fail.
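The cost described in [0005] can be illustrated with a small count of how many copies of a broadcast variable cross the network. This is only a sketch: the function name `copies_distributed`, its parameters, and the example cluster layout are hypothetical and do not come from the patent text.

```python
from typing import Sequence


def copies_distributed(executors_per_node: Sequence[int],
                       one_copy_per_node: bool) -> int:
    """Count the copies of a broadcast variable sent to data nodes.

    executors_per_node: number of task executors started on each data node
        (illustrative input, not a value taken from the patent).
    one_copy_per_node: False models the prior art (one copy per executor);
        True models sending a single copy to each node that runs executors.
    """
    if one_copy_per_node:
        # One copy for every node that has at least one executor.
        return sum(1 for n in executors_per_node if n > 0)
    # Prior art: every executor on every node receives its own copy.
    return sum(executors_per_node)


# Hypothetical cluster: three data nodes running 4, 4 and 2 executors.
layout = [4, 4, 2]
prior_art = copies_distributed(layout, one_copy_per_node=False)
per_node = copies_distributed(layout, one_copy_per_node=True)
```

With this layout the prior-art scheme sends 10 copies and the per-node scheme sends 3, so both network traffic and resident memory grow with the executor count only in the first case.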



Embodiment Construction

[0062] In order to make the technical solutions and beneficial effects of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0063] Spark is a memory-based parallel computing framework developed by UC Berkeley's AMP Lab. Unlike MapReduce, Spark can keep a job's intermediate output and results in memory, so it does not need to read and write the Hadoop Distributed File System (HDFS) between steps. Spark is therefore better suited to algorithms that require iteration, such as those used in data mining and machine learning.

[0064] The data broadcasting system, data broadcasting method and equipment in the embodiments of the present invention are applied to Spar...


Abstract

The embodiment of the invention discloses a data broadcasting system, a data broadcasting method and equipment, which are used to reduce the occupation of system network IO and memory resources. The data broadcasting system in the embodiment of the present invention includes a control node and at least one data node, and at least one task executor runs on each data node. The control node generates broadcast data and, for each data node, designates one task executor as the main task executor. The main task executor obtains the broadcast data, saves it to off-heap memory on the data node, and sends the address of the off-heap memory to the other task executors on the data node. The other task executors read the broadcast data from that off-heap memory address. In the embodiment of the present invention, only one copy of the broadcast data needs to be distributed to the main task executor on each data node, thereby reducing the occupation of system network IO and memory resources.
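The flow in the abstract can be sketched on a single machine, assuming Python's `multiprocessing.shared_memory` as a stand-in for the off-heap memory on a data node. This is an analogy and not the patented implementation: the class `MainExecutor`, the function `read_broadcast`, and the use of a shared-memory block name as the "address" are all hypothetical.

```python
from multiprocessing import shared_memory
from typing import Optional, Tuple


class MainExecutor:
    """Stand-in for the main task executor on one data node (hypothetical)."""

    def __init__(self) -> None:
        self._block: Optional[shared_memory.SharedMemory] = None

    def store(self, payload: bytes) -> Tuple[str, int]:
        """Save the broadcast data once and return its (address, length).

        The block name plays the role of the off-heap memory address that
        is sent to the other task executors. payload must be non-empty.
        """
        self._block = shared_memory.SharedMemory(create=True, size=len(payload))
        self._block.buf[:len(payload)] = payload
        return self._block.name, len(payload)

    def release(self) -> None:
        """Free the shared block once no executor needs the data."""
        if self._block is not None:
            self._block.close()
            self._block.unlink()
            self._block = None


def read_broadcast(address: str, length: int) -> bytes:
    """Other task executor: attach to the shared block and read the data.

    The length is passed explicitly because the operating system may round
    the block size up to a whole number of pages.
    """
    view = shared_memory.SharedMemory(name=address)
    try:
        return bytes(view.buf[:length])
    finally:
        view.close()


# One copy is stored; any number of readers attach through the same address.
main = MainExecutor()
address, length = main.store(b"broadcast lookup table")
first_read = read_broadcast(address, length)
second_read = read_broadcast(address, length)
main.release()
```

In a real deployment the readers would be separate executor processes that receive the address over some local channel; the sketch calls `read_broadcast` in the same process only to keep the example self-contained.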

Description

Technical field

[0001] The present invention relates to the field of big data, in particular to a data broadcasting system, a data broadcasting method and equipment.

Background

[0002] With the advent of the big data era, the memory-based parallel computing platform Spark has become a widely used framework for processing massive data in the industry. Compared with Hadoop, Spark is better suited to iterative machine learning algorithms and graph algorithms. The Spark open source community is very active, and the ecosystem built on the Spark parallel framework is increasingly rich, including Spark-SQL, Spark-Streaming, etc.

[0003] Spark can run in several modes, for example local, standalone, yarn, mesos, etc. The Resilient Distributed Dataset (RDD) is one of the core concepts of Spark: an RDD is a read-only, partitionable, fault-tolerant dataset that can be cached in whole or in part in memory, and ...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): H04L12/18, H04L29/08

CPC: H04L12/185, H04L67/10, H04L67/60

Inventor: 曹莉, 吕倩楠, 孙涛

Owner: HUAWEI CLOUD COMPUTING TECH CO LTD
