A method, system, device and medium for accelerating file reading based on Kubernetes
By employing a file-based cleanup strategy and mounting object storage buckets, the problem of frequent copying and clearing of training node datasets was solved, achieving more efficient file management and training efficiency while reducing system complexity.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHONGQING CHANGAN TECH CO LTD
- Filing Date
- 2022-09-29
- Publication Date
- 2026-06-23
AI Technical Summary
In existing technologies, the data clearing strategy for training nodes is based on the dataset, which leads to frequent copying and clearing of the dataset, affecting training efficiency. Furthermore, the storage switching logic is complex and cannot meet the requirements of model training for data reading efficiency.
A file-based cleanup strategy is adopted, which uses a Bloom filter to detect the cached file system, cleans up low-referenced or long-unused files, utilizes the cached file system and POD containers for dataset management, reduces the granularity of file copying and cleaning, and improves file utilization efficiency by combining object storage bucket mounting and directory soft links.
A more fine-grained file caching strategy was implemented, avoiding frequent clearing and copying of the dataset, improving training efficiency and reducing system complexity.
[Figure CN115563065B_ABST: abstract drawing]
Abstract
Description
Technical Field
[0001] This invention belongs to the field of artificial intelligence technology, specifically relating to a method, system, device, and medium for accelerating file reading based on Kubernetes.
Background Technology
[0002] With the widespread adoption of smart devices, the generation and collection of massive amounts of data have laid the foundation for the continuous advancement of artificial intelligence and machine learning. Storing this massive amount of data presents a significant challenge, and the trade-off between storage cost and retrieval efficiency is a central concern. High-speed storage devices often have limited capacity, while relatively low-speed devices offer larger capacity but access efficiency that often falls short of user expectations. Meeting the data retrieval efficiency requirements of model training within limited high-speed storage space is the primary problem to be solved.
[0003] The current technology primarily involves copying the dataset selected for the training task from a low-speed storage device to a high-speed storage device on the training node to accelerate training. After the training task ends, the dataset is not immediately deleted from the training node storage. If the dataset is not used again within the next preset time period, it will be cleared from the training node storage. If the dataset size exceeds the physical limit of the training node, a distributed file system is used, mounted to the training node via Kubernetes Volumes for training purposes. Similarly, if the dataset is not used again within the next preset period, it will be removed from the distributed file system. The main problem is that data clearing on the training node is done on a dataset-by-dataset basis, and data is cleared if it is not used within the next period. This leads to frequent copying and clearing of datasets, impacting training efficiency. Furthermore, there is the issue of switching between the training node storage and the distributed file system based on the dataset size and the remaining storage space on the training node, making the logic relatively complex.
Summary of the Invention
[0004] In view of the shortcomings of the prior art described above, the present invention provides a method, system, device and medium for accelerating file reading based on Kubernetes to solve the above technical problems.
[0005] This invention provides a method for accelerating file reading based on Kubernetes, comprising: acquiring a training task; adding the dataset of the training task to a cache file system; increasing the reference count of the files of the dataset in the cache file system and updating their most recent access time; mounting the cache file system to a POD container, reading the dataset by creating a script in the POD container, and performing task training on the read dataset; reducing the reference count of the files in the trained dataset; and determining the reference count of the files in the trained dataset, and cleaning up the corresponding files when the reference count is lower than a preset threshold or the idle time is greater than a preset time threshold.
[0006] According to a specific embodiment of the present invention, the step of adding the training task dataset to the cache file system includes: detecting whether the cache file system caches the training task dataset using a Bloom filter; if the cache file system does not cache the training task dataset, then copying the dataset from the low-speed storage file system to the cache file system.
[0007] According to a specific embodiment of the present invention, the step of copying the dataset from the low-speed storage file system to the high-speed cache file system if the high-speed cache file system does not cache the dataset of the training task includes: detecting the remaining space of the high-speed cache file system; if the remaining space is less than the size of the dataset, then cleaning up the space of the high-speed cache file system until the remaining space of the high-speed cache file system is greater than or equal to the size of the dataset; if the remaining space after cleaning is still not sufficient to meet the size requirement of the dataset, then transferring the training task to a waiting sequence until the remaining space of the high-speed cache file system meets the size requirement of the dataset; if the remaining space is greater than or equal to the size of the dataset, then copying the dataset from the low-speed storage file system to the high-speed cache file system.
[0008] According to a specific embodiment of the present invention, the step of copying the dataset from the low-speed storage file system to the high-speed cache file system if the remaining space is greater than or equal to the space size of the dataset includes: traversing the files in the high-speed cache file system according to the files of the dataset to detect the difference files of the dataset that are not cached by the high-speed cache file system; and copying the difference files of the dataset to the high-speed cache file system to form the dataset in the high-speed cache file system.
[0009] According to a specific embodiment of the present invention, the step of cleaning up the space of the cache file system if the remaining space is less than the space size of the dataset includes: traversing the reference count and idle time of all files in the cache file system; filtering out files whose reference count is lower than a preset threshold or whose idle time is greater than a preset time threshold, and cleaning them up.
[0010] According to a specific embodiment of the present invention, the step of cleaning up the space of the cache file system if the remaining space is less than the space size of the dataset, until the remaining space of the cache file system is greater than or equal to the space size of the dataset, includes: if the remaining space of the cache file system is still less than the space size of the dataset within a preset time, suspending the training task to the waiting sequence; and determining whether to continue task training according to the waiting sequence and the remaining space.
[0011] According to a specific embodiment of the present invention, the steps of mounting the cache file system to a POD container, reading the dataset by establishing a script in the POD container, and training the read dataset include: mounting the cache file system to a specified training directory via the s3fs-fuse driver; creating a POD container and mounting the specified training directory to the POD container via HostPath or PVC; creating a directory symbolic link in the POD container according to the logical structure of the dataset to read the dataset, and destroying the directory symbolic link after training is completed; wherein, the script is the directory symbolic link.
[0012] A Kubernetes-based system for accelerating file reading includes: an information acquisition module for acquiring training tasks; an information detection module for adding the dataset of the training tasks to a cache file system; a first information processing module for increasing the reference count of the files of the dataset in the cache file system and updating their most recent access time; an information training module for mounting the cache file system to a POD container, reading the dataset by creating a script in the POD container, and performing task training on the read dataset; a second information processing module for reducing the reference count of the files in the trained dataset; and a third information processing module for determining the reference count of the files in the trained dataset and cleaning up the corresponding files when the reference count is lower than a preset threshold or the idle time is greater than a preset time threshold.
[0013] A Kubernetes-based device for accelerating file reading includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of any of the methods described above.
[0014] A computer-readable medium having instructions stored thereon, the instructions being loaded by a processor and executed as described in any of the preceding claims.
[0015] The technical advantage of this invention lies in its implementation of a finer-grained file caching strategy based on file dimensions, avoiding repeated file clearing and copying at the dataset level and improving efficiency. Simultaneously, the mounting of object storage buckets and the creation of symbolic links enhance file utilization efficiency and reduce system complexity.
[0016] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit this application.
Attached Figure Description
[0017] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application. It is obvious that the drawings described below are merely some embodiments of this application, and those skilled in the art can obtain other drawings based on these drawings without any inventive effort. In the drawings:
[0018] Figure 1 is a flowchart of a method for accelerating file reading based on Kubernetes, as provided in a specific embodiment of the present invention;
[0019] Figure 2 is a structural block diagram of a method for accelerating file reading based on Kubernetes, as provided in a specific embodiment of the present invention;
[0020] Figure 3 is a flowchart of a Kubernetes-based system for accelerating file reading, as provided in a specific embodiment of the present invention;
[0021] Figure 4 is a structural block diagram of a Kubernetes-based device for accelerating file reading, as provided in a specific embodiment of the present invention.
Detailed Implementation
[0022] The embodiments of the present invention will be described below with reference to the accompanying drawings and preferred embodiments. Those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other different specific embodiments, and various details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of the present invention. It should be understood that the preferred embodiments are only for illustrating the present invention and not for limiting the scope of protection of the present invention.
[0023] It should be noted that the illustrations provided in the following embodiments are only schematic representations of the basic concept of the present invention. Therefore, the drawings only show the components related to the present invention and are not drawn according to the actual number, shape and size of the components in the actual implementation. In the actual implementation, the form, quantity and proportion of each component can be arbitrarily changed, and the layout of the components may also be more complex.
[0024] In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the invention. However, it will be apparent to those skilled in the art that embodiments of the invention may be practiced without these specific details. In other embodiments, well-known structures and devices are shown in block diagram form rather than in detail to avoid obscuring embodiments of the invention.
[0025] First, it should be noted that this application uses Kubernetes to build a POD container, develops training tasks and scheduling system modules in a microservice manner, and utilizes RESTful interfaces and message queues for communication between various system modules. A high-speed cache file system is built using solid-state storage and object storage, while a low-speed storage file system is composed of distributed file systems such as HDFS, NAS, GlusterFS, and FastDFS. The Infiniband protocol is used to improve file transfer speed.
[0026] A cache file system (CacheFS) is a common non-volatile caching mechanism. CacheFS improves the performance of certain file systems by utilizing small, fast local disks. For example, CacheFS can be used to improve the performance of NFS environments.
[0027] CacheFS is a common non-volatile high-speed caching mechanism. For network file systems like NFS (Network File System) and AFS (Andrew File System), the impact of the network presents challenges to the real-time performance of data access and storage, especially in early 100Mb / s network environments. To address the issue of response time, a local caching scheme called CacheFS was developed to provide local caching for distributed file systems.
[0028] As part of Linux kernel 2.6.30, CacheFS began supporting NFS, AFS, and several other file systems. CacheFS acts as the caching backend for FS-Cache, handling the actual data storage and retrieval, and utilizing block device partitions. However, CacheFS cannot be used on just any file system; the file system must be written to work with FS-Cache.
[0029] When using CacheFS to improve NFS environment performance, CacheFS works differently on different versions of NFS. For example, if the client and backend filesystem are running NFS version 2 or 3, files are cached in the foreground filesystem for client access. However, if both the client and server are running NFS version 4, the behavior is as follows: when a client initially requests access to a file in the CacheFS filesystem, the request bypasses the foreground (i.e., cached) filesystem and accesses the backend filesystem directly. With NFS version 4, files are no longer cached in the foreground filesystem. The backend filesystem provides all file access. Furthermore, since no files are cached in the foreground filesystem, CacheFS-specific mount options (which are designed to affect the foreground filesystem) are ignored. CacheFS-specific mount options do not apply to the backend filesystem.
[0030] Kubernetes is abbreviated as K8s, where the number 8 stands for the eight characters "ubernete" between the "K" and the "s". It is an open-source system used to manage containerized applications across multiple hosts in a cloud platform. Kubernetes aims to make deploying containerized applications simple and powerful, providing mechanisms for application deployment, planning, updating, and maintenance.
[0031] Traditional application deployment methods involve installing applications via plugins or scripts. The drawback of this approach is that the application's operation, configuration, management, and entire lifecycle become tied to the current operating system. This hinders application upgrades, updates, and rollbacks. While some functionality can be achieved by creating virtual machines, virtual machines are very resource-intensive and detrimental to portability.
[0032] The new approach is achieved through container deployment. Each container is isolated from the others, has its own file system, and processes within containers do not interfere with each other, allowing for the differentiation of computing resources. Compared to virtual machines, containers can be deployed quickly. Because containers are decoupled from the underlying infrastructure and machine file system, they can be migrated between different clouds and different operating system versions.
[0033] Containers consume fewer resources and deploy quickly. Each application can be packaged into a container image, and the one-to-one relationship between each application and container gives containers a significant advantage. Using containers, container images can be created for applications during the build or release phase. Because each application does not need to be combined with other application stacks or depend on the production environment infrastructure, this provides a consistent environment from development to testing and production. Similarly, containers are lighter and more "transparent" than virtual machines, making them easier to monitor and manage.
[0034] A Pod is the smallest resource management component in Kubernetes, and also the smallest resource object for running containerized applications. A Pod represents a single process running in a cluster. Most other components in Kubernetes are built around and support the Pod, extending its functionality.
[0035] Pod containers fall into two categories: autonomous Pods and self-healing Pods. Autonomous Pods are Pods that cannot self-heal. Once created (whether directly by you or by another controller), they are scheduled onto a node in the Kubernetes cluster. The Pod remains on that node until its process terminates, it is deleted, it is evicted due to lack of resources, or the node fails. Pods do not self-heal. If the node on which the Pod runs fails, or if the scheduler itself fails, the Pod will be deleted. Similarly, if the node hosting the Pod lacks resources or the Pod is under maintenance, it will be evicted.
[0036] Pod Management: Kubernetes uses a higher-level abstraction layer called the controller to manage Pod instances. A controller can create and manage multiple Pods, providing replication management, rolling upgrades, and cluster-level self-healing capabilities. For example, if a node fails, the controller can automatically reschedule the Pods on that node to other healthy nodes. While you can use Pods directly, in Kubernetes, it's typically the controller that manages them.
[0037] Example 1
[0038] As shown in Figures 1 and 2, a method for accelerating file reading based on Kubernetes includes:
[0039] Step S10: Obtain the training task;
[0040] Step S20: Add the dataset of the training task to the cache file system. Specific steps include:
[0041] Step S21: Use a Bloom filter to detect whether the dataset of the training task is cached in the cache file system, wherein the Bloom filter is generated using the unique ID of each file in the dataset.
[0042] The Bloom filter, proposed by Bloom in 1970, is essentially a long binary vector and a series of random mapping functions. A Bloom filter can be used to check whether an element is in a set. Its advantages are space efficiency and query time far better than those of general-purpose algorithms; the trade-off is that it can report false positives, although it never reports false negatives.
[0043] Preferably, in this embodiment of the application, a Bloom filter is used to check the contents of the cache file system.
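The membership check of step S21 can be illustrated with a minimal Python sketch. It is not the implementation of this application: the class name `BloomFilter`, the helper `uncached_candidates`, the bit-vector size, the number of hash functions, and the use of salted SHA-256 as the mapping functions are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Bit-vector Bloom filter keyed by file IDs (illustrative sketch only)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, file_id: str):
        # Derive k bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{file_id}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, file_id: str) -> None:
        """Record that the file with this ID is present in the cache."""
        for pos in self._positions(file_id):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, file_id: str) -> bool:
        # False means "definitely not cached"; True means "possibly cached".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(file_id)
        )


def uncached_candidates(dataset_file_ids, cache_filter):
    """Return the file IDs that are certainly absent from the cache."""
    return [fid for fid in dataset_file_ids if not cache_filter.might_contain(fid)]
```

A plain Bloom filter does not support deletion, so in a sketch like this the filter would have to be rebuilt (or replaced by a counting variant) after files are cleaned from the cache.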
[0044] Step S22: If the cache file system does not cache the dataset of the training task, then copy the dataset from the low-speed storage file system to the cache file system.
[0045] Before copying the dataset to the cache file system, the remaining space in the cache file system is checked. The specific steps are as follows:
[0046] Step S221: Detect the remaining space of the cache file system;
[0047] Step S222: If the remaining space is less than the size of the dataset, the space of the cache file system is cleaned up until the remaining space of the cache file system is greater than or equal to the size of the dataset.
[0048] The specific steps for cleaning up the cache file system space are as follows:
[0049] Step S2221: Traverse the reference counts and idle times of all files in the cache file system;
[0050] Step S2222: Filter out files whose reference count is lower than a preset threshold or whose idle time is greater than a preset time threshold, and clean them up.
[0051] Step S223: If the remaining space after cleaning is still insufficient to meet the size requirement of the dataset, the training task is transferred to a waiting sequence until the remaining space of the cache file system meets the size requirement of the dataset.
[0052] If, within a preset time, the remaining space of the cache file system is still less than the size of the dataset, the training task is suspended to the waiting sequence. Based on the waiting sequence and the remaining space, it is determined whether to continue training the task. If the cache file system cannot clear enough space to copy the dataset within a preset time, the training task is interrupted, and an alarm signal is sent to remind staff to manually clear the cache file system.
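Steps S221 to S223 can be sketched in Python as follows. This is a hedged illustration, not the implementation of this application: the `CachedFile` record, the function names, the default thresholds, and the choice to evict the longest-idle candidates first are assumptions. The function only decides which files to keep; actual deletion and the waiting sequence are left to the caller.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    path: str
    size: int           # bytes
    ref_count: int      # number of training tasks currently using the file
    last_access: float  # seconds since the epoch


def select_evictable(files, now, min_refs=1, max_idle_seconds=7 * 24 * 3600):
    """S2221-S2222: files with a low reference count or a long idle time."""
    return [
        f for f in files
        if f.ref_count < min_refs or (now - f.last_access) > max_idle_seconds
    ]


def reserve_space(files, capacity, needed, now):
    """S221-S223: try to free `needed` bytes; return (ok, files_kept).

    ok is False when cleanup cannot free enough space, in which case the
    caller would move the training task to the waiting sequence.
    """
    free = capacity - sum(f.size for f in files)
    kept = list(files)
    # Evict the longest-idle candidates first, and stop as soon as there is room.
    for victim in sorted(select_evictable(files, now), key=lambda f: f.last_access):
        if free >= needed:
            break
        kept.remove(victim)
        free += victim.size
    return free >= needed, kept
```

Stopping as soon as enough space is available keeps still-useful files in the cache, which matches the goal of cleaning only until the remaining space is sufficient.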
[0053] Step S224: If the remaining space is greater than or equal to the space size of the dataset, then copy the dataset from the low-speed storage file system to the high-speed cache file system.
[0054] The specific copying steps are as follows:
[0055] Step S2241: Traverse the files in the cache file system according to the files of the dataset to detect the difference files of the dataset that are not cached by the cache file system;
[0056] Step S2242: Copy the difference file of the dataset to the cache file system to form the dataset in the cache file system.
[0057] It should be noted that in this application, the cleanup strategy adopts the least recently used principle and whether a file is idle as the eviction criterion. The copying mechanism operates on a per-file basis, which, compared to the existing technology based on a dataset, enables a more granular file caching strategy and minimizes copying overhead.
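The per-file copy of steps S2241 and S2242 can be sketched as below. The function name `copy_difference_files` and the representation of a dataset as a list of relative paths are assumptions made for illustration; the sketch tests existence directly on the file system rather than through a Bloom filter.

```python
import shutil
from pathlib import Path


def copy_difference_files(dataset_files, slow_root, cache_root):
    """Copy only the files missing from the cache; return the paths copied.

    dataset_files: relative paths of the dataset's files (hypothetical format).
    """
    slow_root, cache_root = Path(slow_root), Path(cache_root)
    copied = []
    for rel in dataset_files:
        target = cache_root / rel
        if target.exists():
            continue  # already cached, possibly on behalf of another dataset
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(slow_root / rel, target)
        copied.append(rel)
    return copied
```

Because only the difference files are transferred, a dataset that shares most of its files with one already in the cache costs only the copy of the files that differ.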
[0058] Step S30: Increase the reference count of the files of the dataset in the cache file system and update their most recent access time, so that the cleanup strategy does not delete files that are in use and thereby crash or interrupt the training task.
[0059] Step S40: Mount the cache file system to the POD container, create a script in the POD container to read the dataset, and perform task training on the read dataset.
[0060] The specific steps are as follows:
[0061] Step S41: After the dataset is copied to the cache file system, the cache file system is mounted to the specified training directory through the s3fs-fuse driver.
[0062] S3FS (s3fs-fuse) is an open-source, FUSE-based file system that exposes buckets in object storage as a file system, allowing Linux to mount S3 buckets on the local file system. S3FS preserves the original format of objects.
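As a rough illustration of step S41, the following Python sketch assembles an s3fs command line without executing it. The bucket name, mount point, and credentials file path are hypothetical. The `passwd_file`, `url`, and `use_path_request_style` options are the ones documented by s3fs-fuse for mounting S3-compatible object storage; whether the latter two are needed depends on the object store actually deployed, which this application does not specify.

```python
def build_s3fs_command(bucket, mount_point, passwd_file, endpoint_url=None):
    """Build the argument list for mounting an object storage bucket via s3fs."""
    cmd = ["s3fs", bucket, mount_point, "-o", f"passwd_file={passwd_file}"]
    if endpoint_url:
        # Assumed to be needed only for non-AWS, S3-compatible object storage.
        cmd += ["-o", f"url={endpoint_url}", "-o", "use_path_request_style"]
    return cmd


if __name__ == "__main__":
    # Hypothetical names; a real deployment would pass the list to
    # subprocess.run(cmd, check=True) on the training node.
    print(" ".join(build_s3fs_command("train-cache", "/mnt/train", "/etc/passwd-s3fs")))
```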
[0063] Step S42: Create a POD container and mount the specified training directory to the POD container via HostPath or PVC;
[0064] HostPath mounts an actual directory on the Node host to the Pod for container use. This design ensures that even if the Pod is destroyed, the data can still exist on the Node host.
[0065] Step S43: Create a directory soft link in the POD container according to the logical structure of the dataset to read the dataset, and destroy the directory soft link after training is completed.
[0066] By creating symbolic links in the POD container, the dataset is directly read from the cached file system, thus avoiding copying files from the dataset into the POD container. These symbolic links are located in an emptyDir directory of the POD; when the POD container is destroyed, the symbolic links are also destroyed, without affecting the dataset files in the cached file system.
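The symbolic-link mechanism of step S43 can be sketched in Python as follows. The function names and the `layout` mapping (logical directory name to a directory inside the mounted cache) are assumptions for illustration; in the described system the link directory would be the emptyDir volume of the POD.

```python
import os
from pathlib import Path


def link_dataset(cache_mount, link_root, layout):
    """Create one directory symlink per logical dataset directory.

    layout maps a logical name to a directory under cache_mount (hypothetical).
    """
    link_root = Path(link_root)
    link_root.mkdir(parents=True, exist_ok=True)
    links = []
    for logical_name, cached_dir in layout.items():
        link = link_root / logical_name
        os.symlink(Path(cache_mount) / cached_dir, link, target_is_directory=True)
        links.append(link)
    return links


def unlink_dataset(links):
    """Destroy the links after training; the cached files are untouched."""
    for link in links:
        os.unlink(link)  # removes the link itself, never its target
```

Reading through the links touches the cached files in place, so no dataset file is copied into the container, and removing a link leaves the file in the cache for other tasks.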
[0067] After the training dataset is copied to the cached file system, a POD container for training is created using Kubernetes. GPU, CPU, and memory resources are mounted into the POD container, and preset environment variables are configured. After training, the POD container is destroyed, and the CPU, GPU, and memory resources are released for other training tasks. The training output is managed centrally in a preset output directory, and relevant metrics during training are collected, such as training duration, data size, clock frequency, and GPU memory size, to monitor and measure the efficiency and results of the training task. The method of mounting object storage buckets and creating symbolic links improves file utilization efficiency and reduces system complexity.
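A training POD of the kind described in steps S42 and S43 could be declared roughly as in the sketch below, which builds a Pod manifest as a Python dictionary. The container name, image, mount paths, environment variable, and resource amounts are hypothetical, and the `nvidia.com/gpu` resource name assumes that an NVIDIA device plugin is installed in the cluster; the application itself does not specify these values.

```python
import json


def build_training_pod(name, image, host_dir, gpus=1):
    """Return a Pod manifest mounting the cache via hostPath plus an emptyDir."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "trainer",
                "image": image,
                # Preset environment variable (hypothetical name and value).
                "env": [{"name": "DATA_DIR", "value": "/workspace/data"}],
                "resources": {
                    "limits": {"nvidia.com/gpu": gpus, "cpu": "8", "memory": "32Gi"},
                },
                "volumeMounts": [
                    {"name": "cache", "mountPath": "/cache", "readOnly": True},
                    {"name": "links", "mountPath": "/workspace/data"},
                ],
            }],
            "volumes": [
                # Training directory on the node where the cache is mounted.
                {"name": "cache", "hostPath": {"path": host_dir, "type": "Directory"}},
                # Holds the directory symbolic links; removed with the POD.
                {"name": "links", "emptyDir": {}},
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_training_pod("train-job-1", "trainer:latest", "/mnt/train"), indent=2))
```

The resulting dictionary can be serialized to JSON or YAML and submitted to the cluster; a PVC-based variant would replace the `hostPath` entry with a `persistentVolumeClaim` entry.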
[0068] Step S50: Reduce the number of references to the files in the dataset after training is complete.
[0069] Step S60: Determine the number of references to files in the dataset after training is completed, and clean up the corresponding files when the number of references is lower than a preset threshold or the idle time is greater than a preset time threshold.
[0070] Once the training task is complete, the corresponding dataset in the cache file system should be cleaned up promptly to avoid needlessly occupying cache file system space. In addition, the cache file system periodically cleans up files based on their reference count and idle time.
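The reference-count lifecycle of steps S30, S50, and S60 can be sketched as a small in-memory tracker. The class name `FileRefTracker`, its method names, and the default thresholds are assumptions for illustration; a real deployment would persist this bookkeeping and delete the returned files from the cache file system.

```python
import time


class FileRefTracker:
    """Tracks per-file reference counts and last access times (sketch only)."""

    def __init__(self):
        self._refs = {}
        self._last_access = {}

    def acquire(self, paths, now=None):
        """S30: a training task starts using these files."""
        now = time.time() if now is None else now
        for p in paths:
            self._refs[p] = self._refs.get(p, 0) + 1
            self._last_access[p] = now

    def release(self, paths):
        """S50: a training task has finished with these files."""
        for p in paths:
            if p in self._refs:
                self._refs[p] = max(0, self._refs[p] - 1)

    def sweep(self, now=None, min_refs=1, max_idle_seconds=3600.0):
        """S60: return and forget the files that meet the cleanup criterion."""
        now = time.time() if now is None else now
        victims = [
            p for p, n in self._refs.items()
            if n < min_refs or (now - self._last_access[p]) > max_idle_seconds
        ]
        for p in victims:
            del self._refs[p]
            del self._last_access[p]
        return sorted(victims)
```

The sketch applies the criterion as stated (low reference count or long idle time), so the idle threshold should be chosen longer than the longest expected training run if files in use must never be swept.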
[0071] It should be noted that the steps of the various methods described above are only for clarity. In practice, they can be combined into one step or some steps can be split into multiple steps. As long as they contain the same logical relationship, they are all within the scope of protection of this patent. Adding insignificant modifications or introducing insignificant designs to the algorithm or process, but without changing the core design of the algorithm and process, are also within the scope of protection of this patent.
[0072] Example 2
[0073] As shown in Figure 3, embodiments of this application also provide a system for accelerating file reading based on Kubernetes, including:
[0074] The information acquisition module is used to acquire training tasks;
[0075] An information detection module is used to add the dataset of the training task to the cache file system;
[0076] The first information processing module is used to increase the reference count of the files of the dataset in the cache file system and update their most recent access time;
[0077] The information training module is used to mount the cache file system to the POD container, read the dataset by creating a script in the POD container, and perform task training on the read dataset.
[0078] The second information processing module is used to reduce the number of references to files in the dataset after training is completed;
[0079] The third information processing module is used to determine the number of references to files in the dataset after training is completed, and to clean up the corresponding files when the number of references is lower than a preset threshold or the idle time is greater than a preset time threshold.
[0080] It should be noted that the Kubernetes-based accelerated file reading system provided in Embodiment 2 above and the Kubernetes-based accelerated file reading method provided in Embodiment 1 above belong to the same concept. The specific methods by which each module and unit performs operations have been described in detail in the method embodiments and will not be repeated here. In practical applications, the Kubernetes-based accelerated file reading system provided in the above embodiments can be assigned to different functional modules as needed to complete all or part of the functions described above, and this is not a limitation here.
[0081] Example 3
[0082] As shown in Figure 4, embodiments of this application also provide a Kubernetes-based device for accelerating file reading, including a memory 2, a processor 1, and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of any of the methods described above.
[0083] The memory includes at least one type of readable storage medium, such as flash memory, portable hard drive, multimedia card, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory can be an internal storage unit of an electronic device, such as a portable hard drive. In other embodiments, the memory can be an external storage device of the electronic device, such as a plug-in portable hard drive, Smart Media Card (SMC), Secure Digital (SD) card, Flash Card, etc. Furthermore, the memory can include both internal and external storage units of the electronic device. The memory can be used not only to store application software and various types of data installed on the electronic device, but also to temporarily store data that has been output or will be output.
[0084] In some embodiments, a processor can be composed of integrated circuits, such as a single packaged integrated circuit or multiple integrated circuits packaged with the same or different functions. This includes combinations of one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and various control chips. The processor is the control unit of the electronic device, connecting various components of the device via various interfaces and lines. It executes programs or modules stored in the memory and calls data stored in the memory to perform various functions and process data within the electronic device.
[0085] The processor executes the operating system of the electronic device and various installed applications. The processor executes the applications to implement the steps in the above embodiments of the method for accelerating file reading based on Kubernetes.
[0086] For example, the computer program may be divided into one or more modules, which are stored in the memory and executed by the processor to complete the present invention. The one or more modules may be a series of computer program instruction segments capable of performing a specific function, which describe the execution process of the computer program in the electronic device.
[0087] The integrated unit implemented as a software functional module described above can be stored in a computer-readable storage medium. This software functional module, stored in a storage medium, includes several instructions to cause a computer device (which may be a personal computer, computer equipment, or network device, etc.) or processor to execute some functions of the method for accelerating file reading based on Kubernetes of the various embodiments of the present invention.
[0088] In summary, the technical advantages of this invention lie in its implementation of a more granular file caching strategy through a file-level cleanup approach, avoiding repeated file clearing and copying at the dataset level and thus improving efficiency. Simultaneously, the mounting of object storage buckets and the creation of symbolic links enhance file utilization efficiency and reduce system complexity.
[0089] The above embodiments are merely illustrative of the principles and effects of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or alter the above embodiments without departing from the spirit and scope of the present invention. Therefore, all equivalent modifications or alterations made by those skilled in the art without departing from the spirit and technical concept disclosed in the present invention should still be covered by the claims of the present invention.
Claims
1. A method for accelerating file reading based on Kubernetes, characterized in that it comprises: obtaining a training task; adding the dataset of the training task to a cache file system, including: detecting, based on the identifier corresponding to each file in the dataset, whether each file has already been cached in the cache file system, and if not, copying only the uncached difference files from the low-speed storage file system to the cache file system; incrementing the reference count of each file added to the cache file system, and updating the most recent usage time of each file; mounting the cache file system into a POD container, and creating a corresponding directory symbolic link for the dataset in the POD container, so that the dataset is read through the directory symbolic link and task training is performed on the read dataset; decrementing the reference count of the files in the dataset after training is complete; and traversing each file in the cache file system, determining its reference count, and cleaning up the file when its reference count is lower than a preset threshold or its idle time is greater than a preset time threshold.
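The bookkeeping in claim 1 (difference detection, reference counting, and usage timestamps) can be illustrated with a minimal Python sketch. It is not the patented implementation: `CacheIndex`, `CacheEntry`, and their method names are hypothetical, the index is held in memory, and the actual file copy and the lookup structure used for detection are left out.

```python
import time
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    ref_count: int = 0       # number of training tasks currently using the file
    last_used: float = 0.0   # timestamp of the most recent use


@dataclass
class CacheIndex:
    """Hypothetical in-memory index of the cache file system, keyed by file identifier."""
    entries: dict = field(default_factory=dict)

    def missing(self, file_ids):
        """Return the uncached difference files of a dataset."""
        return [f for f in file_ids if f not in self.entries]

    def acquire(self, file_ids, now=None):
        """Record that a task uses these files (copying them is out of scope here)."""
        now = time.time() if now is None else now
        for f in file_ids:
            entry = self.entries.setdefault(f, CacheEntry())
            entry.ref_count += 1
            entry.last_used = now

    def release(self, file_ids):
        """Decrement the reference counts once training has finished."""
        for f in file_ids:
            self.entries[f].ref_count -= 1
```

A scheduler would call `missing` before copying, `acquire` once the dataset is in the cache, and `release` when the training task ends, so that only files no task refers to become candidates for cleanup.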
2. The method for accelerating file reading based on Kubernetes according to claim 1, characterized in that the step of copying the uncached difference files from the low-speed storage file system to the cache file system includes: detecting the remaining space of the cache file system; if the remaining space is less than the space required by the uncached difference files, cleaning up the space of the cache file system until the remaining space of the cache file system is greater than or equal to the space required by the uncached difference files; if the remaining space after cleaning is still insufficient for the uncached difference files, moving the training task to the waiting sequence until the remaining space of the cache file system meets the space requirement of the uncached difference files; and if the remaining space is greater than or equal to the space required by the uncached difference files, copying the uncached difference files from the low-speed storage file system to the cache file system.
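The space check in claim 2 can be sketched as a small admission function. This is an illustrative assumption, not the claimed implementation: `admit_task`, the `cleanup` callback (which is assumed to return the number of bytes it freed), and the use of a `deque` as the waiting sequence are all hypothetical.

```python
from collections import deque


def admit_task(task, needed_bytes, free_bytes, cleanup, waiting):
    """Decide whether a task's difference files can be copied now.

    cleanup: hypothetical callable that frees cache space and returns bytes freed.
    waiting: deque holding tasks parked until enough space is available.
    Returns True when copying may proceed, False when the task was parked.
    """
    if free_bytes < needed_bytes:
        free_bytes += cleanup()      # try to reclaim space first
    if free_bytes < needed_bytes:
        waiting.append(task)         # still insufficient: park the task
        return False
    return True
```

For example, `admit_task("job-1", 100, 50, lambda: 60, deque())` returns `True` because cleanup frees enough space, whereas a cleanup that frees only 10 bytes leaves the task in the waiting sequence.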
3. The method for accelerating file reading based on Kubernetes according to claim 2, characterized in that, if the remaining space is less than the space required by the uncached difference files, the step of cleaning up the space of the cache file system includes: traversing the reference counts and idle times of all files in the cache file system; and selecting and cleaning up the files whose reference count is lower than a preset threshold or whose idle time is greater than a preset time threshold.
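The selection rule in claim 3 amounts to a filter over per-file metadata. The following is a minimal sketch under assumed names: `select_for_cleanup`, the tuple layout `(ref_count, last_used)`, and both threshold parameters are hypothetical.

```python
def select_for_cleanup(files, now, min_refs, max_idle_seconds):
    """Return identifiers of files that qualify for cleanup.

    files: mapping of file identifier -> (ref_count, last_used_timestamp).
    A file qualifies when its reference count is below min_refs
    or it has been idle for longer than max_idle_seconds.
    """
    return [
        fid
        for fid, (refs, last_used) in files.items()
        if refs < min_refs or (now - last_used) > max_idle_seconds
    ]
```

Note that the rule as claimed joins the two conditions with "or", so a file that is still referenced but has been idle too long is also selected; a production system might additionally protect files that a running task holds, which is a design choice outside what the claim states.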
4. The method for accelerating file reading based on Kubernetes according to claim 2, characterized in that, if the remaining space is less than the space required by the uncached difference files, the step of cleaning up the space of the cache file system until the remaining space of the cache file system is greater than or equal to the space required by the uncached difference files includes: if, within a preset time, the remaining space of the cache file system is still less than the space required by the uncached difference files, suspending the training task and placing it in the waiting sequence; and determining, based on the waiting sequence and the remaining space, whether to continue task training.
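Claim 4 does not specify how the waiting sequence and the remaining space are combined to decide whether training continues. One possible reading, sketched below, resumes tasks strictly in arrival order for as long as the head of the sequence fits; the function name `resume_waiting`, the `(task, needed_bytes)` tuples, and the first-in-first-out policy are assumptions.

```python
from collections import deque


def resume_waiting(waiting, free_bytes):
    """Resume parked tasks in arrival order while space allows.

    waiting: deque of (task, needed_bytes) tuples, oldest first.
    Stops at the first task that does not fit, so later tasks never overtake it.
    Returns the list of tasks that may continue.
    """
    resumed = []
    while waiting and waiting[0][1] <= free_bytes:
        task, needed = waiting.popleft()
        free_bytes -= needed         # reserve space for the resumed task
        resumed.append(task)
    return resumed
```

Stopping at the first task that does not fit avoids starving large tasks; a policy that lets smaller tasks skip ahead would use the cache more fully at the cost of fairness.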
5. The method for accelerating file reading based on Kubernetes according to claim 1, characterized in that the step of mounting the cache file system into the POD container, creating a corresponding directory symbolic link for the dataset in the POD container, reading the dataset through the directory symbolic link, and performing task training on the read dataset includes: mounting the cache file system to a specified training directory using the s3fs-fuse driver; creating a POD container and mounting the specified training directory into the POD container via HostPath or PVC; and creating, based on the logical structure of the dataset, a directory symbolic link in the POD container for reading the dataset, and destroying the directory symbolic link after training is completed.
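The mounting and linking steps of claim 5 can be sketched in Python as follows. The sketch assumes the bucket has already been mounted on the host with s3fs-fuse and covers only the HostPath variant; the helper names (`pod_manifest`, `link_dataset`, `unlink_dataset`), the container name, and all paths are hypothetical, while the manifest fields themselves are standard Kubernetes Pod fields.

```python
import os


def pod_manifest(name, image, host_dir, mount_path):
    """Build a Pod manifest that exposes the host training directory via hostPath."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "trainer",  # hypothetical container name
                "image": image,
                "volumeMounts": [{"name": "cache", "mountPath": mount_path}],
            }],
            "volumes": [{
                "name": "cache",
                "hostPath": {"path": host_dir, "type": "Directory"},
            }],
        },
    }


def link_dataset(cache_root, dataset_dir, relative_dirs):
    """Create one directory symbolic link per dataset subdirectory."""
    links = []
    for rel in relative_dirs:
        link = os.path.join(dataset_dir, rel)
        os.makedirs(os.path.dirname(link), exist_ok=True)
        os.symlink(os.path.join(cache_root, rel), link, target_is_directory=True)
        links.append(link)
    return links


def unlink_dataset(links):
    """Destroy the links after training; the cached files themselves are untouched."""
    for link in links:
        os.unlink(link)
```

Because only the links are removed after training, the cached files remain available to later tasks, which is what lets the cleanup policy operate per file rather than per dataset.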
6. A system for accelerating file reading based on Kubernetes, characterized in that it comprises: an information acquisition module, used to obtain a training task; an information detection module, used to add the dataset of the training task to a cache file system, including: detecting, based on the identifier corresponding to each file in the dataset, whether each file has already been cached in the cache file system, and if not, copying only the uncached difference files from the low-speed storage file system to the cache file system; a first information processing module, used to increment the reference count of each file added to the cache file system and to update the most recent usage time of each file; an information training module, used to mount the cache file system into a POD container, create a corresponding directory symbolic link for the dataset in the POD container, read the dataset through the directory symbolic link, and perform task training on the read dataset; a second information processing module, used to decrement the reference count of the files in the dataset after training is completed; and a third information processing module, used to traverse each file in the cache file system, determine its reference count, and clean up the file when its reference count is lower than a preset threshold or its idle time is greater than a preset time threshold.
7. A device for accelerating file reading based on Kubernetes, characterized in that it includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.
8. A computer-readable medium, characterized in that it stores instructions which, when loaded and executed by a processor, perform the method according to any one of claims 1 to 5.