Data backup method, device, equipment and computer readable storage medium

A data backup and computer program technology, applied in computing, electrical digital data processing, and error detection by redundancy in operation. It addresses stalled training tasks, low model training efficiency, and loss of training results.

Inactive Publication Date: 2019-11-01
GUANGDONG INSPUR BIG DATA RES CO LTD

AI Technical Summary

Problems solved by technology

[0003] When a GPU device is lost and the GPU driver becomes unavailable, any running training task stalls and is interrupted. The user must restart the host, reinstall the driver, and so on before the model can be retrained. Several days of results are lost, which wastes a great deal of time and lowers model training efficiency.


Examples

Embodiment 1

[0047] See Figure 1, which is an implementation flowchart of a data backup method in an embodiment of the present invention. The method may include the following steps:

[0048] S101: Obtain device state information of a target GPU device currently used for model training.

[0049] While the target GPU device is being used for deep learning model training, the host can monitor the GPU device to obtain the device state information of the target GPU device currently used for model training. The device state information may include the temperature, memory, power consumption, and utilization of the target GPU device.
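As an illustration of step S101, the following is a minimal sketch of how a host might read these four quantities, assuming the `nvidia-smi` command-line tool is installed. The `GpuState` class and the function names are hypothetical and do not come from the patent; the patent does not say how the host performs the monitoring.

```python
import subprocess
from dataclasses import dataclass

# Fields requested from nvidia-smi, in the order they are parsed below.
QUERY_FIELDS = "temperature.gpu,memory.used,memory.total,power.draw,utilization.gpu"


@dataclass
class GpuState:
    """Hypothetical container for the device state information of one GPU."""
    temperature_c: float
    memory_used_mib: float
    memory_total_mib: float
    power_draw_w: float
    utilization_pct: float


def parse_state_line(line: str) -> GpuState:
    """Parse one CSV row produced with --format=csv,noheader,nounits.

    Raises ValueError if a field is missing or not numeric
    (some GPUs report "[N/A]" for power draw).
    """
    values = [float(field.strip()) for field in line.split(",")]
    if len(values) != 5:
        raise ValueError(f"expected 5 fields, got {len(values)}")
    return GpuState(*values)


def read_gpu_state(index: int = 0) -> GpuState:
    """Query a single GPU by index through nvidia-smi."""
    result = subprocess.run(
        [
            "nvidia-smi",
            f"--id={index}",
            f"--query-gpu={QUERY_FIELDS}",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_state_line(result.stdout.strip().splitlines()[0])
```

Parsing is kept separate from the subprocess call so that it can be exercised on a machine without a GPU. A monitoring loop would call `read_gpu_state` at a fixed interval for each device in use.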

[0050] The target GPU device is any GPU device in the process of model training.

[0051] S102: Determine, according to the device state information, whether the target GPU device meets the warning condition. If yes, perform step S103; if not, take no action.

[0052] The warning condition for warning the targe...

Embodiment 2

[0061] See Figure 2, which is another implementation flowchart of the data backup method in an embodiment of the present invention. The method may include the following steps:

[0062] S201: Obtain information about each warning parameter of a target GPU device.

[0063] While the target GPU device is being used for model training, its early warning parameter information can be obtained, such as the temperature, memory, power consumption, and utilization of the target GPU device.

[0064] S202: Calculate an early warning value according to each early warning parameter information and corresponding preset weights.

[0065] The calculation formula for the warning value of the target GPU device can be preset. Following the above example, when the warning parameter information includes temperature, memory, power consumption, and utilization rate, you can normalize the warning parameter info...
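The paragraph above is cut off before it states the formula, so the following is only one plausible reading of step S202: each parameter is normalized to the range 0 to 1 and the results are combined as a weighted average. The operating ranges, the weights, and the function names are all assumptions made for illustration, not values from the patent.

```python
from typing import Dict, Tuple


def normalize(value: float, low: float, high: float) -> float:
    """Scale value into [0, 1] against an assumed operating range, clamping outliers."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def warning_value(
    readings: Dict[str, float],
    ranges: Dict[str, Tuple[float, float]],
    weights: Dict[str, float],
) -> float:
    """Weighted average of normalized parameters (assumed form of the formula)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    score = sum(
        weights[name] * normalize(readings[name], *ranges[name])
        for name in weights
    )
    return score / total_weight


# Hypothetical ranges and weights, chosen only to make the example concrete.
RANGES = {
    "temperature": (30.0, 90.0),   # degrees Celsius
    "memory": (0.0, 16384.0),      # MiB
    "power": (50.0, 250.0),        # watts
    "utilization": (0.0, 100.0),   # percent
}
WEIGHTS = {"temperature": 0.4, "memory": 0.2, "power": 0.2, "utilization": 0.2}
```

With this form the result always lies between 0 and 1, so it could be compared against a single preset threshold. Whether the patent uses such a threshold is not visible in the truncated text.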

Abstract

The invention discloses a data backup method. The method comprises the following steps: obtaining device state information of a target GPU device currently used for model training; determining, according to the device state information, whether the target GPU device meets an early warning condition; if so, saving the model training data; and, when a driver failure of the target GPU device is detected, sending the model training data to a standby GPU device of the target GPU device in the GPU cluster. Applying the technical scheme provided by the embodiment of the invention saves a great deal of time and improves training efficiency. The invention further discloses a data backup device, equipment, and a storage medium, which have corresponding technical effects.
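The sequence of steps in the abstract can be sketched as a single monitoring pass. This is a schematic only: the four callbacks are hypothetical placeholders for the state check, the driver check, the save operation, and the transfer to the standby device, none of which the abstract specifies in detail.

```python
from typing import Callable, Optional


def backup_step(
    meets_warning: Callable[[], bool],
    driver_failed: Callable[[], bool],
    save_training_data: Callable[[], bytes],
    send_to_standby: Callable[[bytes], None],
    saved: Optional[bytes] = None,
) -> Optional[bytes]:
    """Run one monitoring pass and return the most recent saved snapshot.

    All four callbacks are assumptions made for this sketch.
    """
    # Save the model training data once the early warning condition is met.
    if meets_warning():
        saved = save_training_data()
    # On driver failure, hand the last saved data to the standby GPU device.
    if saved is not None and driver_failed():
        send_to_standby(saved)
    return saved
```

A caller would invoke `backup_step` repeatedly, passing the returned snapshot back in as `saved`, so that data stored during an earlier warning is still available if the driver fails later.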

Description

Technical field

[0001] The present invention relates to the field of computer application technology, and in particular to a data backup method, device, equipment, and computer-readable storage medium.

Background

[0002] When a GPU device on a host is used for model training, different models or different model iteration parameters can raise the temperature of the GPU device and overload its utilization; a high ambient temperature in the machine room where the host is located, or the placement conditions of the device, can likewise cause the GPU device to be lost. A GPU driver is installed on the host, and detailed information about the GPU driver can be viewed through nvidia-smi. The GPU device (/dev/nvidia0) also needs to be mounted in the model training task. When the GPU device is lost during model training, the GPU driver can no longer be used. For example: originally there were eight GPU devices on the host, but one or more devices were...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F11/14
CPC: G06F11/1446
Inventor: 姬贵阳
Owner: GUANGDONG INSPUR BIG DATA RES CO LTD