Storage system and configuration method of storage cluster

Multiple storage clusters are used to classify and store different types of cluster data, and each cluster is given a quota management module and an automatic governance module. This addresses data disorder and resource waste in storage cluster management, improving storage space utilization and saving cost.

CN116009784B · Active · Publication Date: 2026-06-16 · UNIV OF SCI & TECH OF CHINA +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: UNIV OF SCI & TECH OF CHINA
Filing Date: 2022-12-29
Publication Date: 2026-06-16

AI Technical Summary

Technical Problem

Storage clusters are currently managed by the users themselves, which leads to data corruption, data loss, and contention for storage resources. The result is wasted and poorly used storage space, lower storage cluster utilization, and higher storage costs.

Method used

Multiple storage clusters are used to classify and store different types of cluster data. A quota management module manages storage space, and an automatic governance module governs data according to preset policies through operations such as transfer, cleanup, and packaging. A disaster recovery backup module further improves data security.

Benefits of technology

By classifying storage and implementing automatic governance, data clutter can be effectively prevented, data read and write efficiency can be improved, storage costs can be saved, and the utilization and performance of the storage cluster can be increased.

Generated by Eureka AI based on patent content.

Figure CN116009784B_ABST

Abstract

The application relates to a storage system and a configuration method for a storage cluster. The storage system comprises a plurality of storage clusters, different storage clusters being used to store different types of cluster data. Each storage cluster comprises a quota management module and an automatic governance module: the quota management module performs quota management on the storage space of the storage cluster, and the automatic governance module automatically governs the data in the storage cluster according to a preset data governance strategy. Storing different types of cluster data in different storage clusters realizes classified storage of cluster data, which effectively prevents data confusion and improves data read/write efficiency. Quota management provides users with appropriate storage space and saves storage cost, while automatic data governance releases storage space to a certain extent. Together these raise the utilization rate of the storage cluster and improve its data storage, read/write, and access performance.

Description

Technical Field

[0001] This application relates to the field of storage management technology, and specifically to a storage system and a configuration method for a storage cluster.

Background Technology

[0002] As deep learning technology becomes more widely used, the amount of data accumulated by deep learning applications is also increasing, and the storage of cluster data is receiving more and more attention.

[0003] In existing storage clusters, cluster data is mostly managed by the users themselves. Because users' management expertise varies, self-managed storage clusters inevitably suffer data corruption, data loss, and resource contention, which leads to wasted and inefficiently used storage space. Improving the utilization rate of storage clusters and reducing storage costs are therefore pressing problems.

Summary of the Invention

[0004] In view of this, this application provides a storage system and a configuration method for a storage cluster, which can improve the utilization of the storage cluster and reduce storage costs.

[0005] To achieve the above objectives, this application adopts the following technical solution:

[0006] The first aspect of this application provides a storage system, including: multiple storage clusters, wherein different storage clusters are used to store different types of cluster data;

[0007] The storage cluster includes a quota management module and an automatic governance module;

[0008] The quota management module is used to manage the storage space quota of the storage cluster;

[0009] The automatic governance module is used to automatically govern the data in the storage cluster according to a preset data governance strategy.

[0010] Optionally, the plurality of storage clusters include an object storage cluster, a file storage cluster, and a block storage cluster.

[0011] Optionally, the quota management module includes a storage space quota unit for limiting the amount of storage space that a user requests from the storage cluster.

[0012] Optionally, the quota management module includes a file quantity quota unit, used to limit the number of storage files that a user may request from the storage cluster.

[0013] Optionally, the quota management module includes an alarm unit;

[0014] The alarm unit is used to issue an alarm when the storage cluster meets preset conditions; the preset conditions include: the actual storage capacity of the storage cluster reaches the preset storage capacity.

[0015] Optionally, the automatic governance module includes a transfer and recovery unit, used to transfer cluster data in the storage cluster that meets preset transfer conditions; the preset transfer conditions include: the current storage path is a first storage path, and the time for which the data has not been accessed reaches a first preset time.

[0016] The transfer and recovery unit is further configured to transfer the transferred data back to the first storage path when it detects that the transferred data is accessed.

[0017] Optionally, the automatic governance module includes a cleanup unit;

[0018] The cleanup unit is used to clean up cluster data in the storage cluster that meets cleanup conditions; the cleanup conditions include: the current storage path is a second storage path, and the unused time reaches a second preset time.

[0019] Optionally, the automatic governance module includes a packaging unit;

[0020] The packaging unit is used to package cluster files in the storage cluster that meet the packaging conditions; the packaging conditions include: the current storage path is the third storage path, the file size is less than a preset size, and the number of files smaller than the preset size reaches a preset number.

[0021] The packaging unit is also used, when it detects that the accessed data has been packaged, to determine the corresponding package file and unpack it to restore the data.

[0022] Optionally, the storage cluster may also include a disaster recovery backup module;

[0023] The disaster recovery backup module is used to back up and store specific cluster data in the storage cluster, including cluster data whose importance is higher than a set importance level.

[0024] A second aspect of this application provides a method for configuring a storage cluster, comprising:

[0025] Obtain user information; the user information includes user identification information and user data storage requirement information, and the user data storage requirement information includes the data type of the data that the user wants to store;

[0026] Determine whether the storage system described in the first aspect of this application contains an available target storage space configured for the user corresponding to the user information, the target storage space being a storage cluster space used to store data of that data type;

[0027] If such an available target storage space exists, determine from the storage system the storage cluster space corresponding to the user's data storage requirement information, and associate that storage cluster space with the user identification information.
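The configuration steps in [0025] to [0027] can be illustrated with a minimal in-memory sketch. It assumes that a storage cluster space counts as "available" when its data type matches the user's requested data type and it is not yet associated with any user; the class and function names (`UserInfo`, `ClusterSpace`, `StorageSystem`, `configure_storage`) are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UserInfo:
    user_id: str    # user identification information (one per user)
    data_type: str  # data type taken from the user data storage requirement information


@dataclass
class ClusterSpace:
    cluster_name: str
    data_type: str               # type of cluster data this space is meant to hold
    owner: Optional[str] = None  # associated user id; None means still available


@dataclass
class StorageSystem:
    spaces: List[ClusterSpace] = field(default_factory=list)

    def find_available(self, info: UserInfo) -> Optional[ClusterSpace]:
        # Assumed notion of "available": matches the data type and is not yet associated
        for space in self.spaces:
            if space.data_type == info.data_type and space.owner is None:
                return space
        return None


def configure_storage(system: StorageSystem, info: UserInfo) -> Optional[ClusterSpace]:
    """Look up a target space for the user's data type and associate it with the user."""
    space = system.find_available(info)
    if space is None:
        return None  # what happens without an available space is not covered here
    space.owner = info.user_id  # associate the space with the user identification
    return space
```

Returning `None` when no space is available is only a placeholder: the text above does not specify that branch.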

[0028] The technical solution provided in this application may include the following beneficial effects:

[0029] In this application, the storage system includes multiple storage clusters, each used to store different types of cluster data. Each storage cluster includes a quota management module and an automatic governance module. The quota management module manages the storage space quotas within the storage clusters, while the automatic governance module automatically governs the data in the storage clusters according to a preset data governance strategy. This storage system can categorize and store different types of cluster data across different storage clusters, effectively preventing data clutter and improving data read / write efficiency. Furthermore, quota management provides users with suitable storage space, saving storage costs. Automatic data governance releases storage space to some extent, increasing the utilization rate of the storage clusters and ultimately improving the data storage, read / write, and access performance of the storage clusters.

Attached Figure Description

[0030] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0031] Figure 1 is a schematic structural diagram of a storage system according to an embodiment of this application.

[0032] Figure 2 is a schematic structural diagram of a storage cluster according to an embodiment of this application.

[0033] Figure 3 is a flowchart of a storage cluster configuration method according to an embodiment of this application.

Detailed Implementation

[0034] To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions of this application will be described in detail below. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. Based on the embodiments in this application, all other implementation methods obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0035] As deep learning technology becomes increasingly widespread, the amount of data accumulated in these applications is also growing exponentially. This directly leads to a dramatic increase in the complexity of data storage and management for deep learning clusters. Particularly in the field of computer vision, KB-level image files further strain the cluster's metadata indexing, exacerbating the difficulty of data management. Deep learning clusters often generate petabytes of data. Improper management of this massive amount of data can directly impact the file read / write performance of the deep learning training cluster, leading to a decline in model training performance. Therefore, the storage and management of cluster data has become a major concern.

[0036] In existing deep learning clusters, cluster data storage is typically managed by the users themselves. For example, the storage system allows users to apply for two types of storage space resources: low-performance storage and high-performance storage. After obtaining storage space, users are responsible for managing the data within that space and paying according to the size and duration of the requested storage, regardless of storage performance. Taking supercomputer clusters as another example, they mostly use high-performance distributed storage. However, this high-performance distributed storage only provides users with a certain amount of storage space and charges according to the allocated space size and duration. Data management within this storage space still needs to be done by the user; no data management functionality is provided at the cluster level. Therefore, depending on the user's management skills, self-management may lead to data redundancy, corruption, loss, and contention for storage resources, resulting in wasted and inefficient use of storage space.

[0037] Therefore, embodiments of this application provide a storage system that may include multiple storage clusters, with different storage clusters used to store different types of cluster data. For example, as shown in Figure 1, n storage clusters constitute the storage system, where n is an integer greater than or equal to 2.

[0038] It is important to note that different storage clusters can refer to storage clusters of different types or with different performance. In implementation, the corresponding type of storage cluster can be configured according to the type of cluster data, or the corresponding type of storage cluster can be configured according to the storage performance.

[0039] Multiple storage clusters can include clusters of object storage, file storage, and block storage. During implementation, the storage clusters in the storage system can be configured according to actual needs; no restrictions are imposed here.

[0040] Taking a deep learning training cluster as an example, the cluster data that users might use can first be categorized. Deep learning training cluster data generally falls into the following categories: raw data, training data (processed from raw data for model training), output data generated during model training, and the code and environment data that the model depends on. Corresponding storage clusters can then be built for the storage needs of each type of cluster data, so that each type is stored in its own cluster. For example, training data is typically used for distributed computing on the training cluster and must be read with high concurrency during training, so a file storage cluster can be configured to store it. Output data contains the final trained model data, which is read-only and accessed infrequently, so an object storage cluster can be configured to store the output data.

[0041] When building a storage cluster, you can directly use commercial or open-source storage solutions.

[0042] As shown in Figure 2, the storage cluster may include a quota management module 201 and an automatic governance module 202. The quota management module 201 is used to manage the storage space quota of the storage cluster. The automatic governance module 202 is used to automatically govern the data in the storage cluster according to a preset data governance strategy.

[0043] In implementation, storage quota management for the storage cluster can be implemented across two dimensions: storage space and the number of stored files. Data governance strategies can be configured according to actual needs and are not limited here. For example, it could involve automatically transferring cold data and automatically cleaning up expired data in the cluster, or packaging small files, etc., thereby releasing storage space in a timely manner and preventing excessive occupation of storage space by invalid data. Quota management provides users with the most cost-effective storage space to meet their storage needs, ensuring that user requirements are met while avoiding excessive waste of storage space and effectively saving storage costs. Automatic governance provides users with certain management functions to prevent the accumulation of expired data, thereby reducing the possibility of data corruption and loss, laying the foundation for the rational use of storage space.

[0044] Once multiple storage clusters are set up, the resulting storage system can be mounted onto all compute nodes of the deep learning cluster for users to choose and use. Users can be individual users or user groups.

[0045] In this embodiment, the storage system includes multiple storage clusters, each used to store different types of cluster data. Each storage cluster includes a quota management module and an automatic governance module. The quota management module manages the storage space quotas of the storage clusters, while the automatic governance module automatically governs the data in the storage clusters according to a preset data governance strategy. This storage system can categorize and store different types of cluster data through different storage clusters, effectively preventing data clutter and improving data read / write efficiency. Furthermore, quota management provides users with suitable storage space, saving storage costs. Automatic data governance can free up storage space to some extent, increasing the utilization rate of the storage clusters and thus improving the performance of data storage, read / write, and access.

[0046] In some embodiments, the quota management module may include a storage space quota unit for limiting the amount of storage space a user can request from the storage cluster.

[0047] For example, if a storage cluster is a file storage cluster and a user requests 500TB of storage for that file storage cluster, then the user will receive a storage space quota of 500TB for that file storage cluster. Storage space quota units can achieve quota management based on space size, providing users with the most cost-effective storage space, satisfying their storage needs while avoiding wasted space due to excessive storage requirements.

[0048] In some embodiments, the quota management module may include a file quantity quota unit for limiting the number of storage files that a user may request from the storage cluster.

[0049] For example, if a storage cluster is a file storage cluster and a user requests storage for 1000 files in that cluster, the user receives a quota of 1000 files in that cluster. In other words, the user can store at most 1000 files in that file storage cluster. The file quantity quota unit enables quota management based on the number of files, providing users with the most cost-effective file storage space while meeting their storage needs and ensuring the rational use of storage space.
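The two quota dimensions in [0046] to [0049] can be sketched as a per-user record that checks both limits before data is stored. This is a minimal illustration, assuming quotas are enforced by refusing writes that would exceed them; `UserQuota`, `can_store`, and `store` are hypothetical names, and production storage systems normally enforce quotas inside the file system or object store rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class UserQuota:
    space_quota_bytes: int  # storage space the user requested (e.g. 500 TB)
    file_quota: int         # number of files the user requested (e.g. 1000)
    used_bytes: int = 0
    used_files: int = 0

    def can_store(self, size_bytes: int, n_files: int = 1) -> bool:
        # Both the storage space quota and the file quantity quota must still hold
        return (self.used_bytes + size_bytes <= self.space_quota_bytes
                and self.used_files + n_files <= self.file_quota)

    def store(self, size_bytes: int, n_files: int = 1) -> None:
        if not self.can_store(size_bytes, n_files):
            raise PermissionError("quota exceeded")
        self.used_bytes += size_bytes
        self.used_files += n_files


TB = 10**12
quota = UserQuota(space_quota_bytes=500 * TB, file_quota=1000)
```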

[0050] To prevent data loss and storage resource contention due to insufficient storage capacity in the storage cluster, some embodiments may include an alarm unit in the quota management module. The alarm unit can issue an alarm when the storage cluster meets preset conditions.

[0051] Preset conditions may include: the actual storage capacity of the storage cluster reaches the preset storage capacity.

[0052] The preset storage capacity can be determined based on the quota in the storage space quota unit and / or the file quantity quota unit.

[0053] For example, suppose the quota management module includes a file quantity quota unit and an alarm unit. The actual storage capacity is then the actual number of stored files, and the preset storage capacity is the preset storage file quota. The preset storage file quota can be set according to actual needs, but it must be less than or equal to the number of storage files the user requested from the storage cluster in the file quantity quota unit. When the actual number of stored files reaches the preset storage file quota, the alarm unit issues an alarm.

[0054] If the quota management module includes a storage space quota unit and an alarm unit, the actual storage capacity is the actual occupied space, and the preset storage capacity is the preset storage space quota. The preset storage space quota can also be set according to actual needs, but it must be less than or equal to the amount of storage space the user requested from the storage cluster in the storage space quota unit. When the actual occupied space reaches the preset storage space quota, the alarm unit issues an alarm.

[0055] It should be noted that the quota management module described above may include only a storage space quota unit, only a file quantity quota unit, or both. This application is not limited to these; in other embodiments, the quota management module may also include other related quota units. Similarly, the quota management module may include a storage space quota unit and an alarm unit, a file quantity quota unit and an alarm unit, or a storage space quota unit, a file quantity quota unit, and an alarm unit.

[0056] Taking the quota management module, which includes a storage space quota unit, a file quantity quota unit, and an alarm unit, as an example, the storage space requested by a user from the storage cluster in the storage space quota unit can be a preset storage space quota. Similarly, the number of files requested by a user from the storage cluster in the file quantity quota unit can be a preset storage file quota. Therefore, the preset storage capacity includes both the preset storage space quota and the preset storage file quota. Correspondingly, the actual storage capacity includes the actual occupied space size and the actual number of stored files. When the actual occupied space size in the storage cluster reaches the preset storage space quota, or the actual number of stored files reaches the preset storage file quota, the alarm unit issues an alarm.
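The alarm condition in [0053] to [0056] can be written as a small check: each preset threshold must not exceed the corresponding requested quota, and an alarm fires when either threshold is reached. This is a minimal sketch; the `AlarmUnit` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlarmUnit:
    preset_space_quota: int  # alarm threshold for occupied bytes
    preset_file_quota: int   # alarm threshold for number of stored files
    requested_space: int     # space requested in the storage space quota unit
    requested_files: int     # files requested in the file quantity quota unit

    def __post_init__(self) -> None:
        # Each threshold must be less than or equal to the requested quota
        if self.preset_space_quota > self.requested_space:
            raise ValueError("preset storage space quota exceeds requested space")
        if self.preset_file_quota > self.requested_files:
            raise ValueError("preset storage file quota exceeds requested files")

    def should_alarm(self, occupied_bytes: int, stored_files: int) -> bool:
        # Alarm when EITHER the space threshold or the file threshold is reached
        return (occupied_bytes >= self.preset_space_quota
                or stored_files >= self.preset_file_quota)
```

Setting the thresholds below the requested quotas, for example at 80 to 90 percent, gives the recipient time to react before the quota itself is exhausted; the exact margin is a deployment choice.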

[0057] During implementation, the alarm unit can be pre-set with a preset storage capacity and alarm recipient information. When the actual storage capacity of the storage cluster reaches the preset capacity, an alarm is triggered. The alarm unit can then send an alarm message in a specified format to the recipient via a designated channel, allowing the recipient to respond promptly. The designated channel can be WeChat, SMS, or email, among others.
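The alarm delivery in [0057] can be sketched as a lookup from a configured channel name to a sender function. The sender functions are placeholders: a real deployment would call a WeChat, SMS, or email gateway, and no such API is assumed here. `AlarmConfig`, `check_and_notify`, and the message format are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A sender takes (recipient, message); real implementations would call a gateway
Sender = Callable[[str, str], None]


@dataclass
class AlarmConfig:
    preset_capacity: int  # threshold, e.g. occupied bytes or file count
    recipient: str        # alarm recipient information
    channel: str          # designated channel, e.g. "wechat", "sms", "email"


def check_and_notify(cfg: AlarmConfig, actual_capacity: int, cluster: str,
                     senders: Dict[str, Sender]) -> bool:
    """Send an alarm in a fixed format when actual capacity reaches the preset capacity."""
    if actual_capacity < cfg.preset_capacity:
        return False
    message = f"[ALARM] cluster={cluster} usage={actual_capacity} threshold={cfg.preset_capacity}"
    senders[cfg.channel](cfg.recipient, message)
    return True
```

For example, `senders = {"email": send_email}` routes alarms to a hypothetical `send_email(recipient, message)` helper.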

[0058] As mentioned earlier, data in deep learning training clusters is typically categorized into several types, including raw data, training data, output data, and code and environment data. In applications, the storage time requirements for each type of data are usually different. For example, raw data is non-renewable and needs to be stored long-term. However, raw data has a low direct access rate; generally, after being processed once, it won't be used again for several months or even longer. If this type of data is stored in a high-performance storage system for a long time, it will occupy the storage system's space and file quota, thereby reducing the performance of the storage system.

[0059] Therefore, in some embodiments, an automatic transfer function can be configured for data that needs to be stored for a long time but is accessed infrequently. That is, the automatic governance module may include a transfer and recovery unit, used to transfer cluster data in the storage cluster that meets preset transfer conditions; the preset transfer conditions include: the current storage path is a first storage path, and the time for which the data has not been accessed reaches a first preset time. The transfer and recovery unit is also used to transfer the transferred data back to the first storage path when it detects that the transferred data is accessed.

[0060] During implementation, the first storage path and the first preset duration are first set for the transfer and recovery unit. These settings can be adjusted according to actual needs and are not limited here. When cluster data stored in the first storage path has not been accessed for the first preset duration, automatic transfer is triggered. Correspondingly, a transfer target path, the destination to which the cluster data is moved, also needs to be set. For example, if the first storage path is set to storage directory DA and the first preset duration is 3 months, the transfer and recovery unit periodically checks the last access time of the cluster data under DA. If the last access time of cluster data under DA is more than 3 months before the current time, that cluster data is automatically transferred to the preset transfer target path, implementing the automatic transfer function for cluster data.
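The periodic check in [0060] can be sketched with the Python standard library: walk the first storage path, compare each file's last access time with the current time, and move cold files to the transfer target path while preserving their relative layout. This assumes that a file's `st_atime` reflects its last access and approximates "3 months" as 90 days; `find_cold_files` and `transfer` are hypothetical names.

```python
import os
import shutil
import time
from typing import List, Optional

THREE_MONTHS = 90 * 24 * 3600  # assumption: "3 months" approximated as 90 days


def find_cold_files(first_path: str, idle_seconds: float,
                    now: Optional[float] = None) -> List[str]:
    """Return files under first_path whose last access is at least idle_seconds old."""
    now = time.time() if now is None else now
    cold = []
    for dirpath, _dirs, files in os.walk(first_path):
        for name in files:
            full = os.path.join(dirpath, name)
            if now - os.stat(full).st_atime >= idle_seconds:
                cold.append(full)
    return cold


def transfer(first_path: str, target_path: str, idle_seconds: float,
             now: Optional[float] = None) -> List[str]:
    """Move cold files to target_path, keeping paths relative to first_path."""
    moved = []
    # The scan completes before any file is moved, so the walk is not disturbed
    for src in find_cold_files(first_path, idle_seconds, now):
        dst = os.path.join(target_path, os.path.relpath(src, first_path))
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.move(src, dst)
        moved.append(dst)
    return moved
```

On Linux, file systems mounted with `noatime` never update access times and the common default `relatime` updates them only under certain conditions, so a real deployment may need another source of access information.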

[0061] To let users keep using transferred data without being aware of the transfer, the transfer and recovery unit records, during automatic transfer, the mapping between the original path of the transferred cluster data (the first storage path) and the transfer target path in a data association record table. When a user accesses the cluster data again, its location (the transfer target path) is first retrieved from the data association record table, and the cluster data is then copied from that location back to its original path (the first storage path), so that the user can access the data through the original path. This copy-back process is defined as the cluster data recovery process. After the data is copied back, its access time is updated to the current time, so automatic transfer is triggered again only when this batch of cluster data has once more gone unaccessed for the first preset duration.
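The data association record table in [0061] can be sketched as a dictionary from original path to transfer target path, consulted when a read is intercepted. How reads are intercepted in practice (for example through a file-system hook) is outside this sketch, which simply exposes an `open_for_read` method. Deleting the copy at the target path after recovery is an assumption not stated in the text, and `TransferRecoveryUnit` is a hypothetical name.

```python
import os
import shutil
import time
from typing import BinaryIO, Dict


class TransferRecoveryUnit:
    """Hypothetical sketch of transfer and recovery backed by a record table."""

    def __init__(self, target_root: str) -> None:
        self.target_root = target_root
        self.records: Dict[str, str] = {}  # original (first storage path) -> target path
        self._seq = 0                      # avoids name clashes inside target_root
        os.makedirs(target_root, exist_ok=True)

    def transfer(self, original: str) -> str:
        self._seq += 1
        target = os.path.join(self.target_root, f"{self._seq}_{os.path.basename(original)}")
        shutil.move(original, target)
        self.records[original] = target  # record the mapping for later recovery
        return target

    def open_for_read(self, original: str) -> BinaryIO:
        """Intercepted read: recover transferred data to its original path first."""
        if not os.path.exists(original) and original in self.records:
            target = self.records.pop(original)
            shutil.copy2(target, original)  # copy back to the first storage path
            os.remove(target)               # assumption: drop the copy at the target path
            now = time.time()
            os.utime(original, (now, now))  # reset access time so the idle clock restarts
        return open(original, "rb")
```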

[0062] In some embodiments, the automatic governance module may include a cleanup unit. The cleanup unit is used to clean up cluster data in the storage cluster that meets cleanup conditions; the cleanup conditions may include: the current storage path is a second storage path, and the unused time has reached a second preset time.

[0063] The second storage path can be used to store temporary data. In this way, the cleanup unit can automatically clean up the temporary data on a regular basis, thereby freeing up storage space and reducing the pressure of the number of stored files.

[0064] During implementation, a second storage path and a second preset duration need to be set for the cleanup unit. These settings can be customized based on actual needs and are not limited here. For example, the second preset duration can be set to 2 months. If cluster data under the second storage path remains unused for more than 2 months, it will be automatically cleaned up. Specifically, the cleanup unit will collect the most recent usage time information of the cluster data under the second storage path and calculate the unused time of the cluster data under that second storage path. The unused time of the cluster data under the second storage path = current time - most recent usage time of the cluster data under the second storage path. When the unused time of the cluster data under the second storage path reaches the second preset duration, the cluster data will be automatically cleaned up.
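The cleanup rule in [0064] reduces to the formula unused time = current time - most recent usage time, compared against the second preset duration. The sketch below applies it to a mapping from path to most recent usage time (a Unix timestamp). How that usage information is collected is left open, "2 months" is approximated as 60 days, and the names are hypothetical.

```python
from typing import Dict, List

TWO_MONTHS = 60 * 24 * 3600  # assumption: "2 months" approximated as 60 days


def unused_time(now: float, last_used: float) -> float:
    # Unused time = current time - most recent usage time
    return now - last_used


def select_for_cleanup(last_used: Dict[str, float], now: float,
                       second_preset: float) -> List[str]:
    """Paths whose unused time has reached the second preset duration."""
    return sorted(p for p, t in last_used.items()
                  if unused_time(now, t) >= second_preset)
```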

[0065] To enhance data security and reduce the risk of data cleanup, the cleanup unit can also be configured with a recycle bin function. When the cluster data under the second storage path has not been used for a second preset period, a data cleanup operation is triggered, and the cluster data under that second storage path is moved to the recycle bin. The recycle bin can be set with a preset retention time; when the cluster data in the recycle bin has been retained for the preset retention time, the cluster data in the recycle bin will be completely deleted.
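The recycle bin in [0065] can be sketched as a directory into which cleaned-up data is moved, tagged with the time it was discarded, and a purge pass that permanently deletes entries older than the retention time. Encoding the discard time in the file name is an assumption made to keep the sketch self-contained. Two items with the same name discarded in the same second would collide, which a real implementation would need to handle. `RecycleBin` is a hypothetical name.

```python
import os
import shutil
import time
from typing import List, Optional


class RecycleBin:
    def __init__(self, bin_dir: str, retention_seconds: float) -> None:
        self.bin_dir = bin_dir
        self.retention = retention_seconds
        os.makedirs(bin_dir, exist_ok=True)

    def discard(self, path: str, now: Optional[float] = None) -> str:
        """Move data into the recycle bin instead of deleting it outright."""
        now = time.time() if now is None else now
        # Prefix the name with the discard time so purge() can compute its age
        dest = os.path.join(self.bin_dir, f"{int(now)}__{os.path.basename(path)}")
        shutil.move(path, dest)
        return dest

    def purge(self, now: Optional[float] = None) -> List[str]:
        """Permanently delete entries that have reached the retention time."""
        now = time.time() if now is None else now
        removed = []
        for name in os.listdir(self.bin_dir):
            stamp, _, _rest = name.partition("__")
            if stamp.isdigit() and now - int(stamp) >= self.retention:
                full = os.path.join(self.bin_dir, name)
                if os.path.isdir(full):
                    shutil.rmtree(full)
                else:
                    os.remove(full)
                removed.append(name)
        return removed
```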

[0066] In some embodiments, the automatic governance module may include a packaging unit. The packaging unit is used to package cluster files in the storage cluster that meet the packaging conditions. The packaging conditions include: the current storage path is a third storage path, the file size is less than a preset size, and the number of files smaller than the preset size reaches a preset number. The packaging unit is also used, when it detects that the accessed data has been packaged, to determine the corresponding package file and unpack it to restore the data.

[0067] The packaging unit can automatically package files in the cluster data that meet the packaging conditions, thereby reducing the pressure on the storage system due to the large number of files.

[0068] So that users can benefit from the packaging function without noticing it, the packaging unit also needs an automatic unpacking function.

[0069] During implementation, the first step is to set the third storage path, preset size, and preset quantity within the packaging unit, and then configure packaging and unpacking tools for the packaging unit. Setting the third storage path indicates that cluster data under that path participates in automatic packaging. The third storage path, preset size, and preset quantity can be set according to actual needs and are not limited here. Packaging conditions can include three factors: path, individual file size, and the number of files under that path. For example, setting the preset size to 2MB and the preset quantity to 5000 means that when the size of an individual file under the third storage path is less than 2MB, and the number of files with a size less than 2MB reaches 5000, the automatic packaging operation is triggered. Cluster files that meet the packaging conditions will be packaged into one large file to reduce the pressure on the storage system due to the massive number of files and free up storage space.
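The packaging condition in [0069] can be sketched as counting the regular files directly under the third storage path that are smaller than the preset size. Treating "2MB" as 2 × 1024 × 1024 bytes and ignoring subdirectories are assumptions; `small_files` and `should_package` are hypothetical names.

```python
import os
from typing import List

PRESET_SIZE = 2 * 1024 * 1024  # 2 MB, assumed to mean binary megabytes
PRESET_COUNT = 5000


def small_files(third_path: str, preset_size: int = PRESET_SIZE) -> List[str]:
    """Regular files directly under third_path that are smaller than preset_size."""
    result = []
    for name in os.listdir(third_path):
        full = os.path.join(third_path, name)
        if os.path.isfile(full) and os.path.getsize(full) < preset_size:
            result.append(full)
    return result


def should_package(third_path: str, preset_size: int = PRESET_SIZE,
                   preset_count: int = PRESET_COUNT) -> bool:
    # Triggered when the number of small files reaches the preset quantity
    return len(small_files(third_path, preset_size)) >= preset_count
```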

[0070] In practical applications, the system can periodically check whether the cluster files stored by users in the third storage path meet the packaging conditions. Once the packaging conditions are met, the packaging stage begins. During the packaging stage, a pre-configured packaging tool can be used to package the cluster files in the third storage path. The packaged files can still be stored in the current path, and the original files in the current path before packaging are deleted.
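The packaging stage in [0070] can be sketched with Python's standard `tarfile` module as the packaging tool. The text does not name a tool, so tar is an assumption, as is the archive name. The archive is uncompressed because the goal is to reduce the number of files rather than their size. The originals are deleted only after the archive has been written and closed.

```python
import os
import tarfile
from typing import List


def package(third_path: str, files: List[str], archive_name: str = "packed.tar") -> str:
    """Pack the given files into one archive in the same directory, then delete originals."""
    archive = os.path.join(third_path, archive_name)
    with tarfile.open(archive, "w") as tar:
        for f in files:
            tar.add(f, arcname=os.path.basename(f))  # store names relative to third_path
    # Remove originals only after the archive is complete
    for f in files:
        os.remove(f)
    return archive
```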

[0071] When a user accesses the original file (before packaging) in the third storage path again, this access operation can be intercepted, triggering an unpacking operation. An unpacking tool can then automatically unpack the packaged cluster files. The unpacked cluster files are stored directly in the current path, and the original packaged file is deleted. This achieves automatic packaging and unpacking without the user's awareness.
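The unpacking step in [0071] needs a way to find the package that holds a requested original file. The sketch below assumes a hypothetical index from original path to archive, built by reading each archive's member names. An intercepted access then extracts the archive into the current path and deletes it. It uses `tarfile` as the unpacking tool (an assumption) and applies the `"data"` extraction filter when the running Python provides it, which guards against unsafe member paths.

```python
import os
import tarfile
from typing import BinaryIO, Dict


class PackagingIndex:
    """Hypothetical index mapping each original file path to the archive containing it."""

    def __init__(self) -> None:
        self.member_to_archive: Dict[str, str] = {}

    def register(self, archive: str) -> None:
        base = os.path.dirname(archive)
        with tarfile.open(archive) as tar:
            for name in tar.getnames():
                self.member_to_archive[os.path.join(base, name)] = archive

    def open_original(self, path: str) -> BinaryIO:
        """Intercepted access: if path was packaged, unpack its archive first."""
        archive = self.member_to_archive.get(path)
        if archive is not None and not os.path.exists(path):
            base = os.path.dirname(archive)
            with tarfile.open(archive) as tar:
                if hasattr(tarfile, "data_filter"):   # extraction filters are available
                    tar.extractall(base, filter="data")
                else:
                    tar.extractall(base)
            os.remove(archive)  # delete the package once its files are restored
            # Forget every member of the archive that was just unpacked
            self.member_to_archive = {p: a for p, a in self.member_to_archive.items()
                                      if a != archive}
        return open(path, "rb")
```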

[0072] Because packaging and unpacking take time, some embodiments can add data access frequency to the packaging conditions. For example, automatic packaging can be configured for data with low access frequency, which is stored in the third storage path, but not for data with high access frequency, which is stored in a path other than the third storage path. In other words, the packaging conditions can include: the current storage path is the third storage path, the file size is less than a preset size, the number of files smaller than the preset size reaches a preset number, and the access frequency is below a preset access frequency.

[0073] In some embodiments, the storage cluster may further include a disaster recovery backup module; the disaster recovery backup module is used to back up and store specific cluster data in the storage cluster, the specific cluster data including cluster data with a higher importance than a set importance level.

[0074] Disaster recovery backup addresses the loss of or damage to user cluster data caused by irreparable failures in the storage system. During implementation, a fourth storage path and a disaster recovery backup target path can be configured. The fourth storage path is used to store cluster data whose importance is higher than the set importance level; the importance level can be set according to actual needs and is not limited here. Specific cluster data can thus be stored in the fourth storage path, and the data copying tool in the disaster recovery backup module backs it up to the disaster recovery backup target path, preventing losses if that data is lost from the storage system. In addition, because a full disaster recovery backup of all cluster data would increase storage costs, disaster recovery backup can be configured only for cluster data with high security requirements, while data with low security requirements is left without a backup. For example, raw data is non-reproducible and has high security requirements, so a disaster recovery backup module can be configured for the storage cluster that stores it. For storage clusters that store processed data that is cheap to regenerate, the disaster recovery backup module can be omitted.

[0075] In addition, to keep backups timely, a preset time interval can be configured according to actual needs. For example, with an interval of 100 ms, the disaster recovery backup module backs up the cluster data under the fourth storage path every 100 ms, using the data copy tool to copy it to the disaster recovery backup target path.
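
A minimal sketch of the interval-driven backup in [0075], assuming the backup itself is supplied as a callable (the function name and the use of a `threading.Event` for shutdown are illustrative choices, not taken from the source):

```python
import threading
from typing import Callable


def run_periodic_backup(backup: Callable[[], None], interval_s: float,
                        stop: threading.Event) -> int:
    """Call `backup` repeatedly, waiting `interval_s` after each run, until
    `stop` is set. Returns the number of completed backup runs."""
    runs = 0
    while not stop.is_set():
        backup()
        runs += 1
        # Event.wait returns True as soon as `stop` is set, so shutdown is not
        # delayed by a full interval.
        if stop.wait(interval_s):
            break
    return runs
```

This is a fixed-delay schedule: the next run starts one interval after the previous one finishes, so a backup that takes longer than the interval (plausible at 100 ms) never overlaps with the next one. A fixed-rate schedule would need an explicit policy for overlapping runs.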

[0076] Because data copying consumes considerable storage system bandwidth, a compromise can be made between the real-time performance of disaster recovery and bandwidth consumption: the data copy tool can be configured to support incremental copying, so that each backup copies only the data added or changed since the previous backup instead of the full data set.
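
The incremental copying in [0076] can be illustrated with a file-level "quick check" of the kind rsync uses by default: a file is copied only if it is missing at the target or its size or modification time differs. This is a minimal sketch under that assumption, not the data copy tool described in the source; it copies changed files whole (no block-level deltas) and does not propagate deletions.

```python
import os
import shutil
from pathlib import Path


def incremental_copy(src: Path, dst: Path) -> list[Path]:
    """Copy files under `src` to `dst` only if they are new or changed.
    Returns the relative paths that were copied."""
    copied: list[Path] = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            s = Path(root) / name
            rel = s.relative_to(src)
            d = dst / rel
            st = s.stat()
            if d.exists():
                dt = d.stat()
                # Quick check: same size and (nearly) same mtime means unchanged.
                if dt.st_size == st.st_size and abs(dt.st_mtime - st.st_mtime) < 1.0:
                    continue
            d.parent.mkdir(parents=True, exist_ok=True)
            # copy2 also copies the mtime, so the next run sees the file as unchanged.
            shutil.copy2(s, d)
            copied.append(rel)
    return copied
```

The one-second mtime tolerance accommodates target filesystems with coarse timestamps, at the cost of missing a same-size edit made within that second; a checksum comparison would close that gap but reads every file, which is the bandwidth cost the paragraph aims to avoid.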

[0077] In practice, the user groups and storage clusters in the storage system whose data requires disaster recovery backup can be determined in advance. Alternatively, users can specify this option when requesting storage space. Another approach is to set up a fourth storage path within the storage cluster and provide disaster recovery backup only for cluster data within that path; users then store the data requiring disaster recovery backup in the fourth storage path. After the path to be backed up is determined, the disaster recovery backup target path must also be defined. Generally, these two paths should not belong to the same storage system.
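
The path configuration in [0077] can be sketched as a small validated record. The `system_id` field and the class names are hypothetical, and the sketch treats "generally not in the same storage system" as a hard rule, which a real deployment might relax.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class StoragePath:
    system_id: str       # hypothetical ID of the storage system hosting the path
    path: PurePosixPath


@dataclass(frozen=True)
class DrConfig:
    source: StoragePath  # the fourth storage path whose data is backed up
    target: StoragePath  # the disaster recovery backup target path

    def __post_init__(self) -> None:
        # A backup in the same storage system could be lost in the same
        # irreparable failure, so reject that configuration up front.
        if self.source.system_id == self.target.system_id:
            raise ValueError("DR target must be in a different storage system")

    def covers(self, system_id: str, path: PurePosixPath) -> bool:
        """True if a file lies under the fourth storage path and is therefore backed up."""
        return (system_id == self.source.system_id
                and path.is_relative_to(self.source.path))
```

Only data under the source path is covered, which matches the approach of offering disaster recovery backup solely for the fourth storage path.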

[0078] Embodiments of this application also provide a method for configuring a storage cluster. As shown in Figure 3, the method includes at least the following steps:

[0079] S301. Obtain user information. The user information includes user identification information and user data storage requirement information, and the user data storage requirement information includes the data type of the data the user wants to store.

[0080] The user identification information can be a user code or other information representing the user's identity; each user has exactly one piece of user identification information.

[0081] S302. Determine whether the storage system described in any of the above embodiments contains an available target storage space configured for the user corresponding to the user information. The target storage space is a storage cluster space used to store data of that data type.

[0082] During implementation, users can provide their information to the storage system when requesting storage space, so that the storage system can determine the storage clusters they need.

[0083] Specifically, the user data storage requirement information may also include quota management information and automatic governance information. Correspondingly, determining whether the storage system contains an available target storage space configured for the user corresponding to the user information can include: detecting, based on the data type of the data the user intends to store, the quota management information, and the automatic governance information, whether the storage system contains a corresponding available target storage space.

[0084] If the storage system contains a corresponding available target storage space, it is determined that an available target storage space has been configured for the user corresponding to the user information, and step S303 can be executed; otherwise, it is determined that no available target storage space has been configured for that user.
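
The detection in [0083] and [0084] amounts to matching a request against the available cluster spaces. The sketch below assumes hypothetical record layouts (`StorageRequirement`, `ClusterSpace`), expresses quota information as a byte quota plus a file-count quota (mirroring the two quota units in claim 1), and models automatic governance information as a set of requested feature names; none of these encodings come from the source.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class StorageRequirement:
    data_type: str                # e.g. "object", "file", "block"
    quota_bytes: int              # requested storage space quota
    quota_files: int              # requested file-count quota
    governance: frozenset[str]    # hypothetical feature names, e.g. {"transfer", "cleanup", "pack"}


@dataclass
class ClusterSpace:
    cluster: str
    data_type: str
    free_bytes: int
    free_files: int
    governance: frozenset[str]    # governance features this cluster supports


def find_target_space(req: StorageRequirement,
                      spaces: Iterable[ClusterSpace]) -> Optional[ClusterSpace]:
    """Return the first space matching data type, quota and governance needs,
    or None if no available target storage space exists."""
    for s in spaces:
        if (s.data_type == req.data_type
                and s.free_bytes >= req.quota_bytes
                and s.free_files >= req.quota_files
                and req.governance <= s.governance):  # all requested features supported
            return s
    return None
```

A `None` result corresponds to the "no available target storage space" branch; any non-`None` result lets S303 proceed. First-fit is the simplest selection rule; a best-fit rule that picks the space with the least surplus capacity would be an equally valid choice.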

[0085] S303. Determine, from the storage system, the storage cluster space corresponding to the user data storage requirement information, and associate that storage cluster space with the user identification information.
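
The association in S303 can be sketched as a mapping from user identification information to cluster space identifiers. The in-memory `UserSpaceRegistry` and the string space IDs are hypothetical stand-ins for whatever metadata store a real storage system would use.

```python
from typing import Optional


class UserSpaceRegistry:
    """Hypothetical in-memory record of which cluster spaces belong to which user."""

    def __init__(self) -> None:
        self._spaces: dict[str, set[str]] = {}

    def associate(self, user_id: str, space_id: str) -> None:
        # Using a set makes repeated association of the same space harmless.
        self._spaces.setdefault(user_id, set()).add(space_id)

    def spaces_of(self, user_id: str) -> set[str]:
        return set(self._spaces.get(user_id, ()))


def configure(registry: UserSpaceRegistry, user_id: str,
              target_space_id: Optional[str]) -> bool:
    """S303: associate the chosen space with the user, if S302 found one."""
    if target_space_id is None:
        return False  # no available target storage space; nothing to associate
    registry.associate(user_id, target_space_id)
    return True
```

Because each user has exactly one piece of identification information, the user ID can serve directly as the registry key.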

[0086] In this embodiment, user information is obtained first; it includes user identification information and user data storage requirement information, the latter including the data type the user wishes to store. It is then determined whether the storage system described in any of the above embodiments contains an available target storage space configured for the user corresponding to the user information, the target storage space being a storage cluster space used to store data of that data type. If such a space exists, the storage cluster space corresponding to the user data storage requirement information is determined from the storage system and associated with the user identification information. In this way, users obtain storage clusters configured according to their needs, which effectively prevents data disorder and improves data read/write efficiency. Furthermore, the configured storage cluster provides quota management and automatic governance of its cluster data: quota management gives users a suitable amount of storage space and saves storage costs, while automatic data governance releases storage space to a certain extent, improving the utilization rate of the storage cluster and thus its data storage, read/write, and access performance.

[0087] Embodiments of this application also provide a storage cluster configuration apparatus, which may include: an acquisition module for acquiring user information, the user information including user identification information and user data storage requirement information, and the user data storage requirement information including the data type of the data the user wishes to store; a determination module for determining whether the storage system described in any of the above embodiments contains an available target storage space configured for the user corresponding to the user information, the target storage space being a storage cluster space for storing data of that data type; and a determination and association module for determining, from the storage system, the storage cluster space corresponding to the user data storage requirement information and associating that storage cluster space with the user identification information.

[0088] It should be understood that, for the specific implementation of the storage cluster configuration apparatus provided in the embodiments of this application, reference can be made to the specific implementation of the storage cluster configuration method described in the corresponding embodiments above; details are not repeated here.

[0089] Embodiments of this application also provide an electronic device for executing the above-described storage cluster configuration method. The electronic device includes a processing component, which further includes one or more processors, and memory resources represented by memory for storing instructions executable by the processing component, such as application programs. The application programs stored in the memory may include one or more modules, each corresponding to a set of instructions. Furthermore, the processing component is configured to execute instructions to perform the storage cluster configuration method described in any of the above embodiments.

[0090] The electronic device may also include a power supply component configured to perform power management of the electronic device, a wired or wireless network interface configured to connect the electronic device to a network, and an input/output (I/O) interface. The electronic device may operate based on an operating system stored in memory, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or similar.

[0091] Embodiments of this application also provide a non-transitory computer-readable storage medium. When the instructions in the storage medium are executed by the processor of the electronic device, the electronic device is enabled to perform any of the storage cluster configuration methods described in the above embodiments.

[0092] All of the above-mentioned optional technical solutions can be combined in any way to form the optional embodiments of this application, and will not be described in detail here.

[0093] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0094] Those skilled in the art will clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference can be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.

[0095] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of units is only a logical functional division, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or in other forms.

[0096] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0097] In addition, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0098] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0099] It should be noted that in the description of this application, the terms "first," "second," "third," etc., are used for descriptive purposes only and should not be construed as indicating or implying relative importance. Furthermore, in the description of this application, unless otherwise stated, "a plurality of" means two or more.

[0100] The above description is merely a preferred embodiment of this application and is not intended to limit this application. Any modifications or equivalent substitutions made within the spirit and principles of this application should be included within the protection scope of this application.

Claims

1. A storage system, characterized by comprising: a plurality of storage clusters, wherein different storage clusters are used to store different types of cluster data, and the plurality of storage clusters include an object storage cluster, a file storage cluster, and a block storage cluster. The storage cluster includes a quota management module and an automatic governance module. The quota management module is used to manage the storage space of the storage cluster and includes a storage space quota unit and a file quantity quota unit; the storage space quota unit is used to limit the size of the storage space requested by the user from the storage cluster, and the file quantity quota unit is used to limit the number of storage files requested by the user from the storage cluster. The automatic governance module is used to automatically govern the data in the storage cluster according to a preset data governance strategy and includes a transfer and recovery unit, a cleanup unit, and a packaging unit. The transfer and recovery unit is used to transfer cluster data in the storage cluster that meets preset transfer conditions, the preset transfer conditions including: the current storage path is the first storage path, and the time for which the data has not been accessed reaches a first preset time; the transfer and recovery unit is further used to transfer the transferred data back to the first storage path when it detects that the transferred data has been accessed. The cleanup unit is used to clean up cluster data in the storage cluster that meets cleanup conditions, the cleanup conditions including: the current storage path is the second storage path, and the time for which the data has not been used reaches a second preset time. The packaging unit is used to package cluster files in the storage cluster that meet packaging conditions, the packaging conditions including: the current storage path is the third storage path, the file size is less than a preset size, and the number of files smaller than the preset size reaches a preset number; the packaging unit is further used, when it detects that the accessed data is a packaged file, to determine the corresponding packaged file and unpack and restore it.

2. The storage system according to claim 1, characterized in that the quota management module includes an alarm unit; the alarm unit is used to issue an alarm notification when the storage cluster meets a preset condition, the preset condition including: the actually used storage capacity of the storage cluster reaches a preset storage capacity.

3. The storage system according to claim 1, characterized in that the storage cluster further includes a disaster recovery backup module; the disaster recovery backup module is used to back up and store specific cluster data in the storage cluster, the specific cluster data including cluster data whose importance is higher than a set importance level.

4. A method for configuring a storage cluster, characterized by comprising: obtaining user information, the user information including user identification information and user data storage requirement information, and the user data storage requirement information including the data type of the data that the user wants to store; determining whether the storage system according to any one of claims 1-3 contains an available target storage space configured for the user corresponding to the user information, wherein the target storage space is a storage cluster space used to store data of the data type; and, if there is an available target storage space configured for the user corresponding to the user information, determining, from the storage system, the storage cluster space corresponding to the user data storage requirement information, and associating that storage cluster space with the user identification information.