Distributed machine learning training method, device, electronic equipment and storage medium

A distributed machine learning training technology, applied in the computer field, that addresses slow computing speed and low utilization of computing resources by increasing computing speed, reducing idle waiting time, and improving utilization.

Active Publication Date: 2021-09-24
TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

[0006] Embodiments of the present invention provide a distributed machine learning training method, device, electronic equipment, and storage medium to solve the problems of slow computing speed and low utilization of computing resources in prior-art distributed systems.



Embodiment Construction

[0042] To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0043] Figure 1 is a schematic flow chart of the distributed machine learning training method provided by the embodiment of the present invention. As shown in Figure 1, the method includes:

[0044] Step 110, determine the machine learning model to be trained, and the communication priority of the data sent to each computing node, the communicat...


Abstract

An embodiment of the present invention provides a distributed machine learning training method, device, electronic equipment, and storage medium. The method includes: determining the machine learning model to be trained and the communication priority of the data sent to each computing node, where that communication priority is determined based on the model structure of the machine learning model and/or the processing speed at which each computing node runs its corresponding training task in the machine learning model; and, based on the communication priority of the data sent to each computing node, transmitting the data flow generated by the machine learning model during training, so that each computing node can train the machine learning model based on the data flow it receives. The method, device, electronic equipment, and storage medium provided by the embodiments of the present invention increase the computing speed of machine learning model training and improve the utilization of computing resources in a distributed system.
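The priority-driven transmission described in the abstract can be pictured with a small sketch. The abstract does not state the concrete priority rule, so the rule below (data for slower nodes first, then lower layer indices first) is purely an assumption for illustration, as are the names `Message`, `communication_priority`, `build_messages`, and `transmit_in_priority_order`. It is not the patented method.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Message:
    # Only `priority` takes part in ordering; a smaller tuple is sent earlier.
    priority: tuple
    payload: str = field(compare=False)
    dest: str = field(compare=False)


def communication_priority(layer_index, node_speed):
    # ASSUMED rule (not from the source): slower nodes are served first,
    # and within a node, data for lower layer indices goes first.
    return (node_speed, layer_index)


def build_messages(node_speeds, num_layers):
    # One hypothetical message per (node, layer) pair.
    return [
        Message(communication_priority(layer, speed), f"layer{layer}", node)
        for node, speed in node_speeds.items()
        for layer in range(num_layers)
    ]


def transmit_in_priority_order(messages):
    # A binary heap stands in for the sender's transmission queue.
    heap = list(messages)
    heapq.heapify(heap)
    order = []
    while heap:
        msg = heapq.heappop(heap)
        order.append((msg.dest, msg.payload))
    return order


if __name__ == "__main__":
    msgs = build_messages({"node_a": 2.0, "node_b": 1.0}, num_layers=2)
    for dest, payload in transmit_in_priority_order(msgs):
        print(dest, payload)
```

In this sketch the speeds are hypothetical relative throughput values. Swapping in a different `communication_priority` function changes the send order without touching the queueing logic.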

Description

Technical field

[0001] The present invention relates to the field of computer technology, in particular to a distributed machine learning training method, device, electronic equipment and storage medium.

Background

[0002] Currently, iterative synchronization applications, represented by distributed machine learning training tasks, are very popular in data centers. In such applications, multiple computing nodes in a distributed system perform computing tasks iteratively and globally synchronize their respective computing results in each iteration. Only after the global synchronization ends can these computing nodes start the next round of iterative computation.

[0003] However, as the complexity of applications increases, for example when the number of parameters of a machine learning model reaches hundreds of millions, the computing speed of distributed systems is slow and the utilization of computing resources is low....
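The idle waiting caused by global synchronization, as described in the background, can be made concrete with a toy model. The sketch below assumes that each node has a fixed per-iteration compute time and that synchronization itself takes no time; both are simplifications introduced here, and the function name `simulate_synchronous_iterations` is hypothetical.

```python
def simulate_synchronous_iterations(compute_times, num_iterations):
    """Toy model of iterative synchronization.

    compute_times maps a node name to its (assumed constant) compute time
    per iteration. Every node must wait for the slowest node before the
    next iteration starts, so each iteration lasts as long as the slowest
    node's computation.
    """
    iteration_time = max(compute_times.values())
    total_time = iteration_time * num_iterations
    # Idle time per node: the gap to the slowest node, summed over iterations.
    idle = {
        node: (iteration_time - t) * num_iterations
        for node, t in compute_times.items()
    }
    # Utilization: fraction of wall-clock time a node spends computing.
    utilization = {node: t / iteration_time for node, t in compute_times.items()}
    return total_time, idle, utilization


if __name__ == "__main__":
    total, idle, util = simulate_synchronous_iterations(
        {"fast_node": 1.0, "slow_node": 3.0}, num_iterations=10
    )
    print(total, idle, util)
```

Under these assumptions, a node that computes three times faster than the slowest one spends two thirds of every iteration waiting, which is the kind of low utilization the background refers to.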


Application Information

Patent Type & Authority: Patents (China)
IPC(8): H04L29/08, G06N20/00, G06N3/04, G06N3/08
CPC: H04L67/1008, G06N20/00, G06N3/08, G06N3/045
Inventors: 李丹, 王帅
Owner: TSINGHUA UNIV