Mass small file distributed caching method oriented to AI (Artificial Intelligence) training

A distributed caching technology for massive small files, applied to file systems, file access structures, storage systems, and similar fields. It addresses problems such as slowed data access, long waits, and cache misses, with the effects of improving the data access rate, increasing the cache hit rate, and solving the random-access problem.

Pending Publication Date: 2022-01-07
HANGZHOU DIANZI UNIV
Cites: 0 | Cited by: 2

AI Technical Summary

Problems solved by technology

The prior scheme merges massive small files into data blocks (chunks) for storage and uses intra-group shuffle in place of fully random access for AI training. However, when an AI task accesses a data block for the first time, a cache miss still forces a long wait for disk I/O, which limits the data access rate.


Embodiment Construction

[0018] Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

[0019] The invention includes the following steps:

[0020] Step 1: Create Local Cache and Alluxio cache.

[0021] Create a key-value store on the client as the Local Cache, and create the Alluxio cache on the distributed storage devices.

[0022] Alluxio cache: Alluxio supports block-level in-memory caching and mainly stores the merged data chunks. Whenever an application accesses a chunk that is not in the Alluxio cache, the chunk is fetched from the underlying storage and loaded into the Alluxio cache for subsequent accesses.

[0023] Local Cache: the Local Cache is a key-value store on the client that mainly stores the small files parsed out of chunks. Whenever a chunk is fetched from the Alluxio cache, the small files in that chunk are parsed and stored in the Local Cache.
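
As a reading aid, the two-tier lookup of Step 1 can be sketched as follows. This is a minimal illustration only: the get_chunk interface and the parse_chunk helper are hypothetical stand-ins, not actual Alluxio client APIs.

    # Illustrative sketch of the Step 1 two-tier read path; the Alluxio
    # client interface shown here is a hypothetical stand-in.
    class TwoTierCache:
        def __init__(self, alluxio_store, parse_chunk):
            self.local_cache = {}           # client key-value store: file name -> bytes
            self.alluxio = alluxio_store    # distributed cache of merged chunks
            self.parse_chunk = parse_chunk  # splits a chunk back into small files

        def read(self, file_name, chunk_id):
            # Local Cache hit: serve the small file directly on the client.
            if file_name in self.local_cache:
                return self.local_cache[file_name]
            # Miss: fetch the whole chunk from the Alluxio tier (which loads it
            # from the underlying storage on its own miss), then parse every
            # small file in the chunk into the Local Cache, so subsequent reads
            # of neighboring files become local hits.
            chunk = self.alluxio.get_chunk(chunk_id)
            for name, data in self.parse_chunk(chunk).items():
                self.local_cache[name] = data
            return self.local_cache[file_name]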

[0024] Step 2: In the data storage stage of AI training, small files in the data set are merged into chunks according to a rule fitting the Batch Size c...
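
The merge rule itself is truncated above, but the abstract states that small files are combined into chunks by a rule fitting the Batch Size. Below is a minimal sketch under one reading of that rule, where each chunk holds a whole number of batches; the batches_per_chunk parameter is an assumed illustration, not taken from the application.

    # Sketch of a Batch-Size-fitting merge: each chunk packs a whole number of
    # batches' worth of small files, so a chunk always serves complete batches.
    # `batches_per_chunk` is an assumed tuning parameter, not from the patent.
    def merge_into_chunks(file_paths, batch_size, batches_per_chunk=4):
        files_per_chunk = batch_size * batches_per_chunk
        chunks = []
        for start in range(0, len(file_paths), files_per_chunk):
            group = file_paths[start:start + files_per_chunk]
            # Store a chunk as one blob plus an index of (name, offset, length)
            # entries, so individual small files can be parsed back out later.
            blob, index, offset = bytearray(), [], 0
            for path in group:
                with open(path, "rb") as f:
                    data = f.read()
                index.append((path, offset, len(data)))
                blob.extend(data)
                offset += len(data)
            chunks.append((bytes(blob), index))
        return chunks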


Abstract

The invention discloses an AI (artificial intelligence) training-oriented distributed caching method for massive small files, which realizes high-performance distributed caching of massive small files. The method comprises the following steps: first, combining small files into chunks according to a rule that fits the Batch Size characteristics of AI training; second, analyzing the cache state of the chunks and carrying out a double-layer shuffle operation on the small-file sequence; and finally, during data reading in AI training, adopting Local Cache short-circuit reads for repeated I/O and starting asynchronous grouped pre-reading at the moment of a Local Cache short-circuit read. By using the cache efficiently, the method solves the random-read problem of massive small files in AI training; in AI-training scenarios, it remarkably improves the data access rate and the cache hit rate and shortens the iteration time of AI training.
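
The double-layer shuffle named in the abstract can be illustrated with a short sketch: the order of chunks is shuffled, then the small-file order within each chunk is shuffled, so training sees a randomized sequence while each chunk is still consumed as one sequential unit. How the method uses the analyzed cache state to steer this shuffle is not visible on this page, so the sketch ignores cache state.

    import random

    # Illustrative double-layer shuffle: layer 1 shuffles across chunks,
    # layer 2 shuffles the small files inside each chunk. The cache-state
    # analysis mentioned in the abstract is omitted in this sketch.
    def double_layer_shuffle(chunks, seed=None):
        rng = random.Random(seed)
        chunk_order = list(chunks)
        rng.shuffle(chunk_order)       # layer 1: randomize chunk order
        sequence = []
        for chunk_files in chunk_order:
            group = list(chunk_files)
            rng.shuffle(group)         # layer 2: randomize order within a chunk
            sequence.extend(group)
        return sequence

Because all files of a chunk are consumed before the next chunk is touched, each chunk needs to be fetched from the distributed cache only once per epoch, which is what lets this scheme replace fully random access.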

Description

Technical field

[0001] The present invention is directed to smart-city scenes such as face recognition, video search, and intelligent storage, and designs a distributed caching method for massive small files oriented to AI training, so as to realize high-performance distributed caching of massive small files.

Background technique

[0002] In recent years, with the rapid development of the global economy, society, science, and technology, the large-scale application of AI technology in the security field has strongly boosted the development of safe and smart cities. At the same time, smart-city scenes such as face recognition and cross-border pedestrian tracking pose challenges for AI technology. The number of files required by an AI training task is usually on the order of millions or even tens of millions; for example, the Google OpenImages dataset contains 9,000,000 images, and the Tencent ML-Images dataset contains nearly 17,690,000 images. Usually requires massive scale of te...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F16/172; G06F16/13; G06F12/0871; G06F12/0862
CPC: G06F16/172; G06F16/13; G06F12/0871; G06F12/0862; G06F2212/1044; G06F2212/1021; G06F2212/154
Inventor: 路锦, 曾艳, 赵乃良, 张纪林, 袁俊峰, 万健, 张雪容, 沈鸿辉
Owner: HANGZHOU DIANZI UNIV