Data providing method and system for model training

A model training and data set technology, applied in the computer field, that addresses the low data reading efficiency of existing file systems, whose designs do not consider the unique characteristics of cluster deep learning jobs, and thereby improves data reading efficiency.

Pending Publication Date: 2021-05-04
北京聚云科技有限公司

AI Technical Summary

Problems solved by technology

[0003] Existing file systems for machine learning training are mainly parallel file systems and network file systems, such as CephFS, BeeGFS, and GPFS. Their designs do not consider the unique characteristics of cluster deep learning jobs, so data reading efficiency is low.

Embodiment Construction

[0026] Embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0027] It should be clear that the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0028] Machine learning requires, on the one hand, a computer with powerful computing capability for model training and, on the other hand, enough data samples for the computer to learn from. Because model training involves huge amounts of data and heavy computing tasks, it is often necessary to rely on computer clusters for data reading and computing. How to support such a large machine learning training cluster, especially a deep learning one, and quickly read models and training samples has become an u...

Abstract

An embodiment of the invention discloses a data providing method and system for model training. It relates to the technical field of computers and can effectively improve the data acquisition efficiency of a model training task. The method comprises: receiving a data request of a model training task, wherein the data request carries a data set identifier and a file identifier of a target file required by the model training task; locating, in a hierarchical directory and according to the data set identifier and the file identifier, the storage position in an object storage server of the target object corresponding to the target file; outputting the target object from the object storage server according to the storage position; and converting the target object into a corresponding file in a preset file system, wherein the preset file system is the file system on which the model training task is based. The method and system can be applied to machine learning.
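
The four steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names `DataRequest`, `InMemoryObjectStore`, and `DataProvider`, the nested-dictionary directory, the `dataset_id/file_id` key scheme, and the use of a local temporary directory as the "preset file system" are all assumptions made for illustration only.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class DataRequest:
    """Hypothetical request issued by a model training task."""
    dataset_id: str  # data set identifier carried by the request
    file_id: str     # identifier of the target file (here, a relative path)


class InMemoryObjectStore:
    """Hypothetical stand-in for an object storage server."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


class DataProvider:
    def __init__(self, store, mount_root):
        self.store = store
        self.mount_root = mount_root  # assumed root of the preset file system
        # Assumed hierarchical directory: dataset_id -> file_id -> object key.
        self.directory = {}

    def register(self, dataset_id, file_id, data):
        """Upload an object and record its storage position in the directory."""
        key = f"{dataset_id}/{file_id}"
        self.store.put(key, data)
        self.directory.setdefault(dataset_id, {})[file_id] = key

    def locate(self, request):
        """Step 2: resolve the storage position level by level."""
        return self.directory[request.dataset_id][request.file_id]

    def provide(self, request):
        """Steps 1-4: receive the request, locate, fetch, and convert to a file."""
        key = self.locate(request)
        data = self.store.get(key)  # step 3: output the target object
        path = os.path.join(self.mount_root, request.dataset_id, request.file_id)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:  # step 4: materialize it as a regular file
            f.write(data)
        return path


def demo():
    """Serve one file to a pretend training task and return its contents."""
    with tempfile.TemporaryDirectory() as root:
        provider = DataProvider(InMemoryObjectStore(), root)
        provider.register("imagenet-mini", "train/0001.bin", b"sample-bytes")
        path = provider.provide(DataRequest("imagenet-mini", "train/0001.bin"))
        with open(path, "rb") as f:
            return f.read()
```

In this sketch the training task only ever sees an ordinary file path, while the lookup and the object fetch stay behind `provide`. That separation is the point being illustrated; a real system would replace the in-memory store with an actual object storage client.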

Description

Technical Field

[0001] The present invention relates to the field of computer technology, in particular to a method and system for providing data for model training.

Background Technique

[0002] In recent years, artificial intelligence technology has been more and more widely used in industry and life. As an important branch of artificial intelligence, machine learning can obtain ideal mathematical models through training with large amounts of data. Due to the huge amount of data required for model training and heavy computing tasks, in many cases it is necessary to rely on computer clusters for data reading and computing. How to support such a huge deep learning training cluster for fast reading of models and training samples has become an urgent problem in this field.

[0003] Existing file systems for machine learning training mainly include parallel file systems and network file systems, such as CephFS, BeeGFS, GPFS, etc., but the design of these file systems does not...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/13, G06F16/172, G06K9/62, G06N20/00
CPC: G06F16/13, G06F16/172, G06N20/00, G06F18/214
Inventor: 余虹建, 李锦丰, 朱军, 李秋庆
Owner: 北京聚云科技有限公司