Artificial intelligence training method, system and device for massive small files, and storage medium

Keywords: massive small files, artificial intelligence technology. Applied in the field of artificial intelligence training on massive numbers of small files, the invention addresses problems such as uncontrollable cache-eviction granularity and poor data swap-in/swap-out performance, with the effects of avoiding overfitting and improving bandwidth utilization.

Active Publication Date: 2021-03-09
SUZHOU LANGCHAO INTELLIGENT TECH CO LTD
Cites: 4; Cited by: 1

AI Technical Summary

Problems solved by technology

Although data is pulled from remote central storage into a local cache, reading it still requires multiple RPC calls, so training on massive small files performs worse than with a direct local-disk cache. In addition, the granularity of cache eviction is uncontrollable: when cache space is insufficient, every cache read and write triggers swapping, and swap-in/swap-out performance is poor.
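To make the eviction-granularity problem concrete, the following is a minimal sketch of a local cache that fetches and evicts whole data blocks instead of individual files. It is an illustration under stated assumptions, not the patent's implementation: the class name `BlockCache`, the `fetch_block` callback, capacity counted in blocks, and the LRU eviction policy are all hypothetical choices, since the source does not specify them.

```python
from collections import OrderedDict
from typing import Callable, Dict


class BlockCache:
    """Hypothetical local cache that stores and evicts whole data blocks.

    Assumptions: a block maps file names to file bytes, capacity is counted
    in blocks, and eviction follows LRU order (the source names no policy).
    """

    def __init__(self, capacity_blocks: int,
                 fetch_block: Callable[[int], Dict[str, bytes]]):
        if capacity_blocks < 1:
            raise ValueError("capacity_blocks must be >= 1")
        self.capacity = capacity_blocks
        self.fetch_block = fetch_block  # one remote call per block, not per file
        self.blocks: "OrderedDict[int, Dict[str, bytes]]" = OrderedDict()
        self.remote_calls = 0

    def read_file(self, block_id: int, name: str) -> bytes:
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark block as most recently used
        else:
            self.remote_calls += 1
            self.blocks[block_id] = self.fetch_block(block_id)
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used block
        return self.blocks[block_id][name]


if __name__ == "__main__":
    def fake_fetch(block_id: int) -> Dict[str, bytes]:
        # Hypothetical remote fetch: each block holds four tiny files.
        return {f"f{block_id}_{i}": bytes([i]) for i in range(4)}

    cache = BlockCache(capacity_blocks=2, fetch_block=fake_fetch)
    for bid in (0, 1):
        for i in range(4):
            cache.read_file(bid, f"f{bid}_{i}")
    print(cache.remote_calls)  # 8 file reads, 2 remote block fetches
```

Because the unit of eviction is a block, the amount of data swapped in or out per miss is fixed and predictable, and one remote fetch serves every file in that block.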

Embodiment Construction

[0020] In order to make the object, technical solution and advantages of the present invention clearer, the embodiments of the present invention will be further described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

[0021] It should be noted that all uses of "first" and "second" in the embodiments of the present invention serve to distinguish two entities with the same name, or two parameters that are not the same. "First" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention; this will not be repeated in the subsequent embodiments.

[0022] Based on the above purpose, a first aspect of the embodiments of the present invention proposes an embodiment of an artificial intelligence training method for massive small files. Figure 1 shows a schematic diagram of an embodim...

Abstract

The invention discloses an artificial intelligence training method, system and device for massive small files, and a storage medium. The method comprises: in response to the start of an artificial intelligence training task, obtaining a data set from remote central storage and combining the small files in the data set into data blocks according to a block structure definition; in response to the start of training or an epoch update, generating a training task data set list based on a synchronized shuffle mechanism applied both between data blocks and within data blocks; obtaining the file list information of the data blocks according to the training task data set list; and obtaining file data according to that file list information, caching the file data locally at a granularity of one or more data blocks, and performing the artificial intelligence training task. The method addresses the low I/O bandwidth utilization that occurs when massive small files are read during training, alleviates the mismatch between the I/O read rate and the GPU computing rate, increases the utilization of computing resources, and accelerates the entire training process for massive small files.
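The two data-preparation steps in the abstract (packing small files into data blocks, then building a per-epoch list with a shuffle between and within blocks) can be sketched roughly as follows. Everything here is an assumption for illustration: the `Block` layout (a concatenated payload plus a `(name, offset, length)` index), the `max_block_bytes` size limit, the function names `pack_blocks` and `epoch_file_list`, and the reading of "synchronized" as every training worker deriving the same order from a shared seed and the epoch number.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Block:
    """Hypothetical block layout: concatenated payload plus a file index."""
    block_id: int
    payload: bytearray = field(default_factory=bytearray)
    index: List[Tuple[str, int, int]] = field(default_factory=list)  # (name, offset, length)

    def read(self, name: str) -> bytes:
        for n, off, length in self.index:
            if n == name:
                return bytes(self.payload[off:off + length])
        raise KeyError(name)


def pack_blocks(files: Dict[str, bytes], max_block_bytes: int) -> List[Block]:
    """Greedily pack files (in sorted name order) into blocks of at most
    max_block_bytes; a file larger than the limit gets a block of its own."""
    blocks: List[Block] = []
    current = Block(block_id=0)
    for name in sorted(files):
        data = files[name]
        if current.index and len(current.payload) + len(data) > max_block_bytes:
            blocks.append(current)
            current = Block(block_id=len(blocks))
        current.index.append((name, len(current.payload), len(data)))
        current.payload.extend(data)
    if current.index:
        blocks.append(current)
    return blocks


def epoch_file_list(blocks: List[Block], seed: int, epoch: int) -> List[Tuple[int, str]]:
    """Two-level shuffle: block order first, then file order inside each block.
    Any worker using the same (seed, epoch) obtains the same list."""
    rng = random.Random(seed * 1_000_003 + epoch)
    order = list(range(len(blocks)))
    rng.shuffle(order)  # inter-block shuffle
    result: List[Tuple[int, str]] = []
    for b in order:
        names = [n for n, _, _ in blocks[b].index]
        rng.shuffle(names)  # intra-block shuffle
        result.extend((blocks[b].block_id, n) for n in names)
    return result
```

In this sketch the files of each block stay contiguous in the epoch list, so a block-granularity cache needs roughly one fetch per block per epoch, while reshuffling both levels every epoch still varies the sample order, which is the usual rationale for shuffling as a guard against overfitting.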

Description

Technical field

[0001] The present invention relates to the field of AI training, and more specifically to a method, system, computer device and readable medium for artificial intelligence training on massive numbers of small files.

Background

[0002] AI (artificial intelligence) training on large-scale collections of massive numbers of small files usually has the following characteristics: 1. Large-scale data sets are usually placed on external storage media (systems), such as NFS, BeeGFS, or cloud storage; 2. In traditional file systems, the metadata of a large number of small files (including access time, permissions, modification time, etc.) usually resides on disk. To read a file, its metadata must first be loaded from disk into memory, the file's location on disk is then obtained from that metadata, and only then can the file's data be read from disk, so overall read performance is poor; 3. The reading of s...

Application Information

IPC (8): G06K9/62; G06N20/20
CPC: G06N20/20; G06F18/214
Inventor: 刘慧兴 (Liu Huixing)
Owner: SUZHOU LANGCHAO INTELLIGENT TECH CO LTD