Model parameter training method and device, server and storage medium

A model parameter training technology applied in the information field. It solves the problems of high communication cost and network overhead in gradient transmission, achieves convergence speed and quantization results that are essentially lossless, reduces communication cost and network overhead, and improves operating efficiency.

Active Publication Date: 2018-09-04
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0005] The embodiments of the present application provide a model parameter training method, device, server and storage medium, which can be used to s...



Embodiment Construction

[0027] In order to make the purpose, technical solutions and advantages of the present application clearer, the embodiments of the present application are described in further detail below in conjunction with the accompanying drawings.

[0028] Figure 1 is a schematic structural diagram of a model training system provided by an embodiment of this application. Referring to Figure 1, the model training system 100 includes a main computing node 12 and N sub-computing nodes 14, where N is a positive integer. The main computing node 12 is connected to the N sub-computing nodes 14 through a network. The main computing node 12 or a sub-computing node 14 may be a server, a computer, or another device with data computing capability; the embodiment of the present application does not limit the form of the main computing node 12 or the sub-computing nodes 14.

[0029] As shown in Figure 2, the interaction process between the main computing node 12 and the N sub-computin...
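
Pieced together from the abstract, the interaction in Figure 2 amounts to: each sub-computing node computes a first gradient on its sample set, applies iterative quantization with error feedback, and transmits the quantized second gradient to the main computing node, which updates the parameter values. The following minimal single-process Python sketch simulates that exchange; the toy regression task, the 1-bit sign quantizer, the per-tensor scale, and all function names are illustrative assumptions, not the patent's implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy setting: linear regression, training data split into N sample sets,
    # one per sub-computing node (N, dim, lr, decay are illustrative values).
    N, dim, lr, decay = 4, 8, 0.1, 0.9
    w_true = rng.normal(size=dim)
    X = rng.normal(size=(400, dim))
    y = X @ w_true
    sample_sets = np.array_split(np.arange(400), N)

    params = np.zeros(dim)                      # initial parameter values (main node)
    errors = [np.zeros(dim) for _ in range(N)]  # per-sub-node error accumulation

    def first_gradient(w, idx):
        # Least-squares gradient on one sub-node's sample set.
        Xi, yi = X[idx], y[idx]
        return 2.0 * Xi.T @ (Xi @ w - yi) / len(idx)

    def iterative_quantize(grad, err, decay):
        # 1-bit sign quantization with decayed error feedback; an assumed
        # instance of the patent's "iterative quantization processing".
        corrected = grad + err                  # apply the round t-1 error
        scale = np.mean(np.abs(corrected))      # per-tensor scale (assumption)
        q = scale * np.sign(corrected)          # quantized second gradient
        new_err = decay * (corrected - q)       # error carried to the next round
        return q, new_err

    for t in range(200):
        q_grads = []
        for i in range(N):                      # each sub-node in turn
            g = first_gradient(params, sample_sets[i])
            q, errors[i] = iterative_quantize(g, errors[i], decay)
            q_grads.append(q)                   # "transmitted" to the main node
        params -= lr * np.mean(q_grads, axis=0) # main node updates the parameters

    print("parameter error:", np.linalg.norm(params - w_true))

In this sketch only the quantized gradients cross the (simulated) network, while each sub-node keeps its own error accumulator locally, which is what makes the scheme cheap to communicate.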



Abstract

The invention discloses a model parameter training method and device, a server and a storage medium, which belong to the field of information technology. The method comprises the following steps: an initial parameter value and a sample set of a model parameter of a target model are acquired; a first gradient of the model parameter is calculated according to the initial parameter value and the sample set; iterative quantization processing is carried out on the first gradient to acquire a quantized second gradient, wherein the iterative quantization processing in the t-th iteration round is carried out based on an error accumulation value corresponding to the (t-1)-th iteration round, and the error accumulation value is a cumulative quantization error calculated based on a preset time attenuation coefficient; and the quantized second gradient is transmitted to a main computing node, where it instructs the main computing node to update the initial parameter value according to the quantized second gradient to acquire an updated parameter value. According to the embodiments of the invention, a quantization error correction method is used to quantize and compress the first gradient of the model parameter, which reduces the communication cost and network overhead of gradient transmission.
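
Read as a per-round recurrence, the iterative quantization step can be sketched as follows; the excerpt does not state the exact formula, so the placement of the time attenuation coefficient \lambda is an assumption:

    g'_t = g_t + e_{t-1}, \qquad \hat{g}_t = Q(g'_t), \qquad e_t = \lambda \left( g'_t - \hat{g}_t \right), \qquad e_0 = 0

where g_t is the first gradient in round t, Q(\cdot) is the quantizer, \hat{g}_t is the quantized second gradient sent to the main computing node, e_t is the error accumulation value, and \lambda \in (0, 1] is the preset time attenuation coefficient.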

Description

Technical field

[0001] The present application relates to the field of information technology, and in particular to a model parameter training method and device, a server and a storage medium.

Background technique

[0002] DistBelief is an artificial intelligence deep learning framework that can be used to train large-scale neural network models, and it has been widely applied in fields such as text processing and image recognition. DistBelief provides a distributed training mode based on Stochastic Gradient Descent (SGD): it defines a main computing node and N sub-computing nodes, where each sub-computing node trains a replica of the model and the main computing node shares the model parameters among the N sub-computing nodes.

[0003] Before training starts, the main computing node sends the initial parameter values of the model parameters to each sub-computing node and divides the training data set into multiple sample sets, which are distributed to the N sub-computi...
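
The motivation behind the invention is the cost of shipping full-precision gradients every round in this setup. A back-of-the-envelope Python sketch (the parameter count and node count are illustrative assumptions, not figures from the patent) compares the per-round gradient traffic of float32 transmission against 1-bit quantized transmission:

    # Illustrative per-round gradient traffic (all numbers are assumptions).
    P = 10_000_000                 # model parameters
    N = 8                          # sub-computing nodes
    full_precision = N * P * 4     # bytes per round, float32 gradients
    one_bit = N * P / 8            # bytes per round, 1-bit quantized gradients
    print(f"float32: {full_precision / 1e6:.0f} MB/round")
    print(f"1-bit:   {one_bit / 1e6:.0f} MB/round "
          f"({full_precision / one_bit:.0f}x reduction)")

Ignoring the small overhead of transmitting per-tensor scales, 1-bit quantization cuts the gradient traffic by roughly 32x, which is the communication saving the abstract refers to.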


Application Information

IPC(8): G06N3/04
CPC: G06N3/045
Inventors: 吴家祥 (Wu Jiaxiang), 黄维东 (Huang Weidong), 黄俊洲 (Huang Junzhou)
Owner: TENCENT TECH (SHENZHEN) CO LTD