Database index generation method and apparatus, data processing method and apparatus

By building an aggregated index in the distributed file system, the problems of accuracy and real-time performance in quota calculation are solved, enabling efficient and accurate user usage statistics and meeting the quota function requirements under large-scale data.

CN119645931B · Active · Publication Date: 2026-06-19 · BEIJING BAIDU NETCOM SCI & TECH CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD
Filing Date: 2024-11-14
Publication Date: 2026-06-19

Application Information

Patent Timeline

14 Nov 2024: Application
19 Jun 2026: Publication

CN119645931B

IPC: G06F16/13; G06F16/16; G06F16/182

AI Tagging

Application Domain

File access structures; File/folder operations

Technology Topics

Distributed File System; Database index




AI Technical Summary

Technical Problem

In distributed file systems, traditional quota calculation methods cannot accurately count user usage, resulting in problems such as low accuracy, mediocre performance, insufficient scalability, and poor timeliness.

Method used

By constructing an aggregated index, the items to be aggregated related to file quotas are recorded, the item aggregation function is determined, an aggregated index is created in the metadata database, and statistics are performed on the items to be aggregated according to the item aggregation function. The statistical data is then saved to realize quota calculation.

Benefits of technology

It improves the real-time performance and accuracy of usage calculation in distributed file systems, ensures the real-time performance and eventual consistency of quota calculation, and avoids single point bottlenecks and data consistency issues.

✦ Generated by Eureka AI based on patent content.


Abstract

This disclosure provides a database index generation method and apparatus, a data processing method and apparatus, relating to the field of computer technology, specifically to distributed systems, storage management, and other technical fields. The specific implementation scheme is as follows: based on the metadata of the metadata database in the distributed file system, determine the items to be aggregated related to file quotas; based on the items to be aggregated, determine the item aggregation function; create an aggregation index in the metadata database; perform statistics on the items to be aggregated according to the item aggregation function to obtain statistical data; and store the statistical data as aggregated data in the aggregation index.


Description

Technical Field

[0001] This disclosure relates to the field of computer technology, specifically to the technical fields of distributed systems and storage management, and in particular to a database index generation method and apparatus, a data processing method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product.

Background Technology

[0002] The quota function in the file system is used to count user usage. When a user's usage exceeds the quota size, user access needs to be restricted to prevent over-issuance. Therefore, the quota function plays a billing role in multi-tenant file systems.

[0003] For single-machine file systems, since all data in the directory tree is centralized, user usage can be easily tracked. However, with the ever-increasing scale of user data, a single file system often requires hundreds of petabytes (PB) of storage space, which traditional single-machine architectures often cannot support. Therefore, file systems based on distributed architectures, i.e., distributed file systems, have emerged. In distributed file systems, because directory tree data is distributed across multiple machines to achieve data sharding and load balancing, calculating quotas under the distributed architecture is quite difficult, making it impossible to accurately calculate user usage.

Summary of the Invention

[0004] This disclosure provides a database index generation method and apparatus, a data processing method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product.

[0005] According to the first aspect, a database index generation method is provided, which includes: determining items to be aggregated related to file quotas based on metadata in a metadata database of a distributed file system; determining an aggregation function for the items to be aggregated based on the items to be aggregated; creating an aggregation index in the metadata database; performing statistics on the items to be aggregated according to the aggregation function to obtain statistical data; and storing the statistical data as aggregated data in the aggregation index.

[0006] According to the second aspect, a data processing method is provided, the method comprising: sending a request to a metadata database of a distributed file system to obtain aggregated data, the aggregated data being data created by using a database index generation method as described in any implementation of the first aspect; receiving the aggregated data sent by the metadata database; and performing quota detection on the aggregated data to obtain a quota detection result.

[0007] According to a third aspect, a database index generation apparatus is provided, comprising: an item determination unit configured to determine items to be aggregated related to file quotas based on metadata in a metadata database of a distributed file system; a function determination unit configured to determine an item aggregation function based on the items to be aggregated; a creation unit configured to create an aggregated index in the metadata database; an obtaining unit configured to perform statistics on the items to be aggregated according to the item aggregation function to obtain statistical data; and a saving unit configured to save the statistical data as aggregated data in the aggregated index.

[0008] According to a fourth aspect, a data processing apparatus is provided, comprising: a sending unit configured to send a request to a metadata database of a distributed file system to obtain aggregated data, the aggregated data being data created by a database index generation apparatus as described in any implementation of the third aspect; a receiving unit configured to receive the aggregated data sent by the metadata database; and a detection unit configured to perform quota detection on the aggregated data to obtain a quota detection result.

[0009] According to a fifth aspect, an electronic device is provided, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform a method as described in any implementation of the first or second aspect.

[0010] According to a sixth aspect, a non-transitory computer-readable storage medium is provided that stores computer instructions for causing a computer to perform the method described in any implementation of the first or second aspect.

[0011] According to a seventh aspect, a computer program product is provided, including a computer program that, when executed by a processor, implements the method as described in either the first or second aspect.

[0012] The database index generation method and apparatus provided in this disclosure first determine the items to be aggregated related to file quotas based on the metadata of the metadata database in the distributed file system; second, determine the item aggregation function based on the items to be aggregated; third, create an aggregation index in the metadata database; fourth, perform statistics on the items to be aggregated according to the item aggregation function to obtain statistical data; and finally, store the statistical data as aggregated data in the aggregation index. Thus, by leveraging the aggregation index mechanism of the distributed file system's metadata database, the items to be aggregated in the distributed file system can be aggregated. This allows for the effective calculation of quota-related usage without disrupting user input / output paths. Furthermore, the aggregation of the items to be aggregated ensures that the usage is real-time and eventually consistent, improving the real-time performance and accuracy of usage calculation in the distributed file system.

[0013] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.

Attached Figure Description

[0014] The accompanying drawings are provided to better understand this solution and do not constitute a limitation of this disclosure. Wherein:

[0015] Figure 1 is a flowchart of an embodiment of the database index generation method according to the present disclosure;

[0016] Figure 2 is a schematic structural diagram of the distributed file system of the present disclosure;

[0017] Figure 3 is a schematic structural diagram of an implementation of the aggregated index in the metadata database of the present disclosure;

[0018] Figure 4 is a flowchart of an embodiment of the data processing method according to the present disclosure;

[0019] Figure 5 is a schematic diagram of the process of detecting aggregated data in the present disclosure;

[0020] Figure 6 is a schematic structural diagram of an embodiment of the database index generation apparatus according to the present disclosure;

[0021] Figure 7 is a schematic structural diagram of an embodiment of the data processing apparatus according to the present disclosure;

[0022] Figure 8 is a block diagram of an electronic device used to implement the database index generation method or data processing method of the embodiments of this disclosure.

Detailed Implementation

[0023] The exemplary embodiments of this disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding, and should be considered merely exemplary. Therefore, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of this disclosure. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description.

[0024] The quota function of a distributed file system is used to count user usage, specifically usage related to the quota. User usage includes two dimensions: the number of files and the file size. Traditionally, quota functions typically use methods based on I / O-related statistics or metadata snapshots to calculate user usage.

[0025] The method based on IO-related statistics involves embedding tracking points along the IO path of user requests to generate IO operation logs. These logs typically record fields such as file system identifier, operation type, file index number, offset, and data length, and are synchronized to a central node. This node is responsible for parsing, processing, and analyzing these IO operation logs, incrementally adjusting usage data in real time to ultimately obtain complete usage data for use by the upper-layer quota module. This method has the following drawbacks:

[0026] 1) Low accuracy: IO operation logs need to be synchronized between the directory tree node and the quota center node. This process is often asynchronous with user IO, which can lead to inconsistencies. For example, when a change is made to the directory tree node, a user IO operation might succeed, but the IO operation log synchronization might fail. If the IO operation is a large file deletion, it will result in significant deviations in user usage.

[0027] 2) Mediocre performance: An IO operation log goes through multiple steps between being generated and taking effect. Some of these steps are costly, including writing the IO operation log to disk and synchronizing it via RPC. These costly operations amplify disk and network overhead and also reduce the timeliness of quotas to some extent.

[0028] 3) Insufficient scalability: Because IO operation logs need to be aggregated from various directory tree nodes to the quota center node, and the quota center node is a single centralized node without scalability, if the file system's metadata operations or file read / write interface throughput is high, a single point of bottleneck may occur, which limits the overall performance of the system.
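
To make drawback 1) concrete, the following minimal sketch simulates a quota center node that learns about usage only by replaying synchronized IO operation logs. The function name `apply_logs` and the `(operation, size)` log layout are hypothetical and chosen only for illustration; real IO operation logs carry more fields, as described in [0025].

```python
# Hypothetical simulation: the quota center sees usage only through synced IO logs.

GIB = 2 ** 30


def apply_logs(logs):
    """Replay (operation, size_in_bytes) logs and return the resulting usage."""
    usage = 0
    for op, size in logs:
        if op == "write":
            usage += size
        elif op == "delete":
            usage -= size
    return usage


# Operations that actually succeeded on the directory tree node.
actual_ops = [("write", 10 * GIB), ("write", 1 * GIB), ("delete", 10 * GIB)]

# Assumption for this example: the log of the large delete failed to synchronize.
synced_ops = actual_ops[:2]

true_usage = apply_logs(actual_ops)
reported_usage = apply_logs(synced_ops)
drift = reported_usage - true_usage
```

Under these assumptions the quota center over-counts by the full size of the deleted file, which is the kind of deviation the asynchronous log path can introduce.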

[0029] The snapshot-based approach utilizes the metadata database's data transfer capabilities to periodically export full metadata snapshots. These snapshots are then parsed, processed, and analyzed to obtain the number of files and file sizes for each file system. These values are then summed to obtain the usage data for each file system. Typically, for scalability reasons, metadata databases have multiple data nodes. After each data node exports its snapshot, the data is aggregated to a central node for subsequent aggregation tasks. This approach has the following drawbacks:

[0030] 1) Poor timeliness. Due to the large amount of data, a single data transfer process typically takes hours. If failed transfers are taken into account, the time consumption is amplified further. User quota functions usually require latency on the order of minutes at most; therefore, the snapshot-based solution cannot meet the timeliness requirements.

[0031] 2) Poor data consistency. Typical file systems try to distribute data evenly across several metadata nodes. In snapshot-based solutions, snapshots from all nodes need to be aggregated. Considering the time-consuming and potentially unsuccessful transfer process, when there are many metadata nodes, the snapshot timestamps of these nodes may differ by several hours. Forcibly merging this data will lead to anomalies or even errors due to inconsistent data views.

[0032] To address the shortcomings of traditional technologies, this disclosure proposes a database index generation method. By constructing an aggregated index to record quota-related items to be aggregated, it provides a reliable implementation for quota calculation in distributed file systems. Figure 1 shows a flow 100 of an embodiment of the database index generation method according to the present disclosure. The database index generation method comprises the following steps:

[0033] Step 101: Based on the metadata of the metadata database in the distributed file system, determine the items to be aggregated related to file quotas.

[0034] In this embodiment, the execution entity on which the database index generation method runs can be the storage engine of the metadata database. This storage engine is also a directory tree node. The metadata database is the database of the distributed file system, and it primarily stores the directory tree and file attributes of the distributed file system. As shown in Figure 2, in the distributed file system, directory tree nodes provide clients with metadata services that carry file semantics. Because the metadata of the distributed file system is delegated to the underlying metadata database, directory tree nodes are stateless services, making them easy to load balance and scale horizontally. Furthermore, the metadata database is also based on a distributed architecture with multiple metadata shards. These shards store the data of a single table across multiple machines and collectively provide data services to the outside world.

[0035] In this embodiment, file quotas include: name quotas and space quotas. The name quota is a hard limit on the number of file and directory names in the current directory tree. If the quota is exceeded, file and directory creation will fail. The space quota is a hard limit on the number of bytes used by files in the tree rooted at this directory. If the quota does not allow writing a complete block, block allocation will fail.
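
The two hard limits described above can be sketched as simple admission checks. This is a minimal illustration, not the disclosed implementation: the `Quota` class, the function names, and the example limits are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    name_limit: int   # hard limit on the number of file and directory names
    space_limit: int  # hard limit on bytes used by files under the directory


def can_create_name(current_names: int, quota: Quota) -> bool:
    """Creation fails once one more name would exceed the name quota."""
    return current_names + 1 <= quota.name_limit


def can_allocate_block(used_bytes: int, block_size: int, quota: Quota) -> bool:
    """Block allocation fails if a complete block no longer fits in the space quota."""
    return used_bytes + block_size <= quota.space_limit


quota = Quota(name_limit=10, space_limit=1024)
```

For example, with this quota a directory tree that already holds 10 names rejects a new file, and a tree using 1000 bytes rejects a 64-byte block because the whole block would not fit.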

[0036] In this embodiment, the item to be aggregated is at least one key field in the directory entries of the metadata database. The items to be aggregated in the metadata database related to file quotas include key fields related to user usage, such as file owner, file size, and number of recorded files. The item to be aggregated may be a key field related to name quotas and / or space quotas.

[0037] In this embodiment, after obtaining the metadata of the distributed file system's metadata database, at least one key field can be selected as an aggregation item from all key fields related to file quotas based on quota calculation requirements. The aggregation item is the field to be aggregated in the aggregated index.

[0038] In the technical solution disclosed herein, the collection, storage, use, processing, transmission, provision, and disclosure of metadata and items to be aggregated are performed under authorization and comply with relevant laws and regulations. User-related information in the metadata and items to be aggregated is obtained with the user's permission.

[0039] Step 102: Determine the item aggregation function based on the items to be aggregated.

[0040] In this embodiment, the item aggregation function is a function that calculates the value of the item to be aggregated. This item aggregation function belongs to the user-defined functions in the aggregated index. Since the metadata aggregated index is a mechanism that automatically maintains user-defined aggregation functions, the item aggregation function is a function that the metadata aggregated index needs to automatically maintain.

[0041] In this embodiment, the item aggregation function can be determined based on the quota calculation requirements of the current item to be aggregated. For example, if the quota requirement of the item to be aggregated is accumulation, then the item aggregation function is a function that accumulates the values of the item to be aggregated.

[0042] Step 103: Create an aggregated index in the metadata database.

[0043] In this embodiment, the aggregated index in the metadata database is a mechanism for automatically maintaining user-defined aggregate functions. It supports grouping and calculating the number of records based on a given key, as well as accumulating the key value. For example, when metadata is inserted, deleted, or updated, the aggregated value can be automatically maintained according to the aggregated index settings and stored in the aggregated index. Its implementation principle is shown in Figure 3.

[0044] In Figure 3, Proxy represents the front-end node of the metadata database, which is primarily responsible for receiving users' structured query statements, parsing, analyzing, and optimizing them, generating and executing physical plans, and ultimately converting them into key-value operations on the BE node. The BE node represents the storage engine node of the metadata database, which is mainly responsible for managing key-value records and implementing operations such as adding, deleting, modifying, and querying key-value records. The Primary tablet represents the primary index, the Secondary Index tablet represents the secondary index, and the Aggro Index tablet represents the aggregated index. CDC, which stands for Change Data Capture, is a component located in the BE. It is used to receive record change messages and perform corresponding processing, such as updating secondary indexes.

[0045] Aggregated indexes are similar to secondary indexes, but differ in that they maintain an aggregate function for a given key. After the BE completes key-value operations, it generates a record change message according to the aggregated index settings and sends it to the CDC. When the CDC processes this message, it applies the changes to the aggregated index, thereby updating the aggregated value.
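
The change-message flow described above can be sketched as follows. This is a minimal, single-process illustration under stated assumptions: the `Record`, `ChangeMessage`, and `AggregatedIndex` names, the choice of file owner as the grouping key, and the before/after message layout are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    owner: str  # grouping key (hypothetical choice)
    size: int   # value to accumulate


@dataclass
class ChangeMessage:
    old: Optional[Record]  # None for an insert
    new: Optional[Record]  # None for a delete


class AggregatedIndex:
    """Keeps a record count and an accumulated size per key."""

    def __init__(self) -> None:
        self.count = defaultdict(int)
        self.total = defaultdict(int)

    def apply(self, msg: ChangeMessage) -> None:
        # Undo the contribution of the old record, then add the new one.
        if msg.old is not None:
            self.count[msg.old.owner] -= 1
            self.total[msg.old.owner] -= msg.old.size
        if msg.new is not None:
            self.count[msg.new.owner] += 1
            self.total[msg.new.owner] += msg.new.size


index = AggregatedIndex()
index.apply(ChangeMessage(None, Record("alice", 100)))                # insert
index.apply(ChangeMessage(None, Record("alice", 50)))                 # insert
index.apply(ChangeMessage(Record("alice", 50), Record("alice", 80)))  # update
index.apply(ChangeMessage(Record("alice", 100), None))                # delete
```

Treating an update as "remove old, add new" keeps the aggregated value correct without rescanning the primary data, which is why the aggregate can be maintained incrementally per change message.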

[0046] In this embodiment, creating an aggregated index primarily involves creating an index data table. This index data table stores data related to quota-related items to be aggregated, i.e., aggregated data. The values in the index data table change as the client accesses the metadata database and the values calculated using the item aggregation function change.

[0047] In this embodiment, an aggregated index is created to pre-calculate and store the results of frequently used statistical queries, so that data can be provided quickly during subsequent statistical queries. On one hand, the ordered nature of the index improves the processing efficiency of statistical queries; on the other hand, the aggregated index updates accordingly as the data changes, ensuring that the data is real-time. The aggregated data in the aggregated index is derived from statistical analysis of all relevant data within the indexed relationship (which can be a single table or the result table of a subquery), ensuring the accuracy of the statistical data.

[0048] Step 104: Perform statistics on the items to be aggregated according to the item aggregation function to obtain statistical data.

[0049] In this embodiment, the statistical data is the data obtained after performing the function operation corresponding to the item aggregation function on the item to be aggregated. The statistical data includes the key representing the item to be aggregated and the actual value of the item to be aggregated after performing the item aggregation function calculation.

[0050] In this embodiment, statistical analysis of the item to be aggregated according to the item aggregation function means using the item aggregation function to calculate the value of the item to be aggregated and obtain the actual value of the item to be aggregated; for example, if the item aggregation function is an accumulation function, then after obtaining the value of the item to be aggregated, the value of the item to be aggregated is accumulated to obtain the actual value of the item to be aggregated.

[0051] Optionally, after the client performs operations such as adding, deleting, modifying, and querying the metadata related to the item to be aggregated in the metadata database, the client can perform statistics on the item to be aggregated according to the item aggregation function to obtain statistical data.

[0052] Step 105: Save the statistical data as aggregated data in the aggregated index.

[0053] In this embodiment, aggregated data is the data stored in the aggregated index. Aggregated indexes generally store data in the form of tables. Aggregated data represents the fields of the table in the aggregated index and the actual values of the fields. The fields are the items to be aggregated, and the actual values of the fields are the actual values of the items to be aggregated.

[0054] In this embodiment, step 105 includes: using the items to be aggregated in the statistical data as fields of the aggregated data, and storing the actual values of the items to be aggregated in the statistical data as the actual values of the corresponding fields in the aggregated data in the aggregated index.
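
Steps 104 and 105 can be sketched together as "compute the actual value with the item aggregation function, then store it under the item's field". The entry layout, the grouping by owner, and the function names below are assumptions made only for this example.

```python
# Hypothetical directory entries; "size" is the item to be aggregated.
entries = [
    {"owner": "alice", "size": 100},
    {"owner": "alice", "size": 50},
    {"owner": "bob", "size": 7},
]


def accumulate(values):
    """Item aggregation function for this example: accumulation."""
    total = 0
    for value in values:
        total += value
    return total


def compute_statistics(rows, item, func):
    """Step 104: apply the item aggregation function per owner."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["owner"], []).append(row[item])
    return {owner: func(values) for owner, values in grouped.items()}


def save_aggregated_data(index_table, item, stats):
    """Step 105: the item becomes the field, the computed value its actual value."""
    for owner, value in stats.items():
        index_table.setdefault(owner, {})[item] = value


index_table = {}
save_aggregated_data(index_table, "size", compute_statistics(entries, "size", accumulate))
```

This batch form is only for readability; the disclosure describes the aggregated value being maintained as metadata changes rather than recomputed from a full scan.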

[0055] The database index generation method provided in this disclosure first determines the items to be aggregated related to file quotas based on the metadata of the metadata database in the distributed file system; second, it determines the item aggregation function based on the items to be aggregated; third, it creates an aggregation index in the metadata database; fourth, it performs statistics on the items to be aggregated according to the item aggregation function to obtain statistical data; and finally, it stores the statistical data as aggregated data in the aggregation index. Thus, by leveraging the aggregation index mechanism of the distributed file system's metadata database, it achieves aggregation of items to be aggregated in the distributed file system, thereby effectively calculating usage without disrupting user input / output paths, and ensuring that usage calculations are real-time and eventually consistent, improving the real-time performance and accuracy of distributed file system usage calculations.

[0056] In some embodiments of this disclosure, the above-mentioned determination of the file quota-related aggregate items based on the metadata of the distributed file system's metadata database includes: obtaining the calculation requirement data of the file quota; obtaining the key fields of the directory entry table and target entry table of the distributed file system's metadata database; and matching the key fields with the calculation requirement data to obtain the file quota-related aggregate items.

[0057] In this optional implementation, the calculation requirement data of the file quota includes name quota calculation data and / or space quota calculation data. The name quota calculation data specifies that the number of file and directory names in the current directory tree of the distributed file system must not exceed a corresponding first quota. The space quota calculation data specifies that the number of bytes used by files in the tree rooted at the current directory of the distributed file system must not exceed a second quota. Both the first and second quotas are values set based on the actual needs of the distributed file system.

[0058] In this optional implementation, the directory entry table of the metadata database is used to record the memory structure of the management files and directories of the distributed file system. Each file and directory has a memory structure in the distributed file system, which records information such as the file name, parent directory, and subdirectories. These memory structures are displayed and processed in the form of tables, and the fields in the tables are the key fields of the memory structures.

[0059] In this optional implementation, key fields can be compared with the calculation requirement data in terms of similarity. Key fields with a similarity greater than a similarity threshold are the items to be aggregated related to file quotas. The similarity threshold can be set based on development requirements, for example, a similarity threshold of 80%.

[0060] Optionally, the above process of matching key fields with calculation requirement data to obtain items to be aggregated related to file quotas includes: vectorizing key fields to obtain field vectors; vectorizing calculation requirement data to obtain requirement vectors; comparing the similarity between field vectors and requirement vectors, and selecting key fields corresponding to field vectors with similarity greater than a similarity threshold as items to be aggregated.
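
The vectorize-and-compare procedure above can be sketched with a bag-of-words vector and cosine similarity. The disclosure does not specify the vectorization method or the similarity measure, so both are assumptions here, as are the field names and requirement strings; only the 80% threshold comes from the text.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Assumed vectorization: token counts after splitting on underscores and spaces."""
    return Counter(text.lower().replace("_", " ").split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def match_items(key_fields, requirements, threshold=0.8):
    """Keep key fields whose similarity to any requirement exceeds the threshold."""
    requirement_vectors = [vectorize(r) for r in requirements]
    selected = []
    for field in key_fields:
        field_vector = vectorize(field)
        if any(cosine(field_vector, rv) > threshold for rv in requirement_vectors):
            selected.append(field)
    return selected


key_fields = ["file_size", "file_owner", "modify_time"]  # hypothetical key fields
requirements = ["file size", "file owner"]               # hypothetical requirement data
items_to_aggregate = match_items(key_fields, requirements)
```

A production system would likely use a richer embedding than token counts; the point of the sketch is only the threshold-based selection.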

[0061] This optional implementation provides a method for determining items to be aggregated, which obtains the calculation requirement data for file quotas; obtains the key fields of the directory entry table and target item table of the distributed file system's metadata database; matches the key fields with the calculation requirement data to obtain items to be aggregated related to file quotas. Thus, by associating the acquisition of items to be aggregated with the calculation requirement data for file quotas, the acquisition of items to be aggregated can be effectively obtained, improving the reliability of obtaining items to be aggregated.

[0062] Optionally, the above-mentioned determination of the items to be aggregated related to file quotas based on the metadata of the distributed file system's metadata database includes: filtering all quota-related key fields in the metadata of the distributed file system's metadata database to obtain the items to be aggregated.

[0063] In some embodiments of this disclosure, determining the item aggregation function based on the item to be aggregated includes: selecting functions from a pre-set set of functions based on the item to be aggregated to obtain a set of selected functions; and obtaining the item aggregation function based on the file quota calculation requirement data and the set of selected functions.

[0064] In this optional implementation, the function set includes multiple functions, each with a corresponding processing function, such as summation, finding the maximum, or finding the sum within a set time period.

[0065] In this optional implementation, the above-mentioned selection of functions from a pre-set set of functions based on the items to be aggregated, to obtain the selected function set, includes: selecting functions from the function set that can process the items to be aggregated, to obtain the selected functions; determining key operation information based on file quota calculation data, determining functions related to the key operation information from the selected functions, to obtain the item aggregation function.
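
The two-stage selection above (first keep functions that can process the item, then pick the one matching the key operation) can be sketched as follows. The `FUNCTION_SET` registry, its `accepts` type tags, and the operation names are hypothetical; the `count` entry is added for illustration alongside the summation and maximum functions mentioned in the text.

```python
# Hypothetical pre-set function set: each entry lists the value types it can process.
FUNCTION_SET = {
    "sum": {"accepts": {"int"}, "fn": sum},
    "max": {"accepts": {"int"}, "fn": max},
    "count": {"accepts": {"int", "str"}, "fn": len},
}


def select_functions(item_type: str) -> dict:
    """Stage 1: keep only functions able to process the item to be aggregated."""
    return {
        name: spec for name, spec in FUNCTION_SET.items() if item_type in spec["accepts"]
    }


def item_aggregation_function(item_type: str, key_operation: str):
    """Stage 2: pick the selected function that matches the key operation."""
    selected = select_functions(item_type)
    if key_operation not in selected:
        raise ValueError(f"no function for {key_operation!r} on {item_type!r}")
    return selected[key_operation]["fn"]


space_fn = item_aggregation_function("int", "sum")   # e.g. accumulate file sizes
name_fn = item_aggregation_function("str", "count")  # e.g. count file names
```

For instance, `space_fn([1, 2, 3])` accumulates sizes, while asking for `"sum"` on a string-typed item raises an error because stage 1 already filtered that function out.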

[0066] The method for obtaining item aggregation functions provided in this embodiment selects functions from a pre-set function set based on the item to be aggregated, thus obtaining a selected function set; and obtains item aggregation functions based on file quota calculation requirement data and the selected function set. Therefore, functions related to file quota calculation requirements are selected from the function set to obtain item aggregation functions, thereby improving the reliability and accuracy of obtaining item aggregation functions.

[0067] In some embodiments of this disclosure, creating the aggregated index includes: creating an index data table; and recording, in the metadata database of the distributed file system, the correspondence between the item aggregation function, the item to be aggregated, and the index data table.

[0068] In this optional implementation, the index data table is a table that records the relationship between the item aggregation function and the item to be aggregated. The fields of the index data table include the item to be aggregated, and the values of each field in the index data table are calculated by the item aggregation function.

[0069] In this optional implementation, recording the correspondence among the item aggregation function, the item to be aggregated, and the index data table in the metadata database of the distributed file system effectively identifies the index data table as the table that records the item to be aggregated and the item aggregation function.

[0070] In this optional implementation, the aggregated index is a newly created index mechanism. Its function is similar to that of a regular index, but it operates differently: it is updated asynchronously, so after the index data table is created, updates to the data in the index data table do not block updates to the main table.

[0071] In this optional implementation, the index data table is a table that records all data of the aggregated index. The contents of this table are updated asynchronously and do not affect the updates of other tables in the metadata database. That is, the index data table is updated asynchronously relative to other tables in the metadata database.

[0072] The optional implementation provides a method for creating aggregated indexes, which involves creating an index data table and recording the correspondence between the item aggregation function, the item to be aggregated, and the index data table in the distributed file system's metadata database. Therefore, recording the correspondence between the index data table, item aggregation functions, and items to be aggregated after creating the index data table improves the accuracy of the data relationships in the created aggregated index.
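As a rough illustration of these two steps, the following sketch uses an in-memory SQLite database as a stand-in for the metadata database. The catalog table `aggregated_index_catalog`, the index table name `ugi_quota_index`, and the helper `create_aggregated_index` are assumptions made for the example; the disclosure does not specify how the correspondence is physically recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the metadata database

# Hypothetical catalog recording which function aggregates which item
# into which index data table.
conn.execute(
    "CREATE TABLE aggregated_index_catalog ("
    " index_table TEXT, item TEXT, agg_function TEXT,"
    " PRIMARY KEY (index_table, item))"
)


def create_aggregated_index(conn, index_table, group_key, items):
    """Create the index data table, then record the correspondence.

    `items` maps each item to be aggregated to the name of its aggregation
    function. Identifiers are assumed to be trusted (they are interpolated
    into the DDL statement).
    """
    columns = ", ".join(f"{item} INTEGER NOT NULL DEFAULT 0" for item in items)
    conn.execute(
        f"CREATE TABLE {index_table} ({group_key} TEXT PRIMARY KEY, {columns})"
    )
    conn.executemany(
        "INSERT INTO aggregated_index_catalog VALUES (?, ?, ?)",
        [(index_table, item, func) for item, func in items.items()],
    )
    conn.commit()


create_aggregated_index(
    conn, "ugi_quota_index", "ugi", {"aggr_num": "sum", "size": "sum"}
)
catalog_rows = conn.execute(
    "SELECT item, agg_function FROM aggregated_index_catalog ORDER BY item"
).fetchall()
```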

[0073] In some optional implementations of this disclosure, the creation of the aggregated index further includes: after creating the index data table, determining the data length of the statistical data; and allocating space for the index data table according to the data length.

[0074] In this optional implementation, the data length of the statistical data can be calculated using a length calculation algorithm, or the data length of the statistical data can be determined by directly counting the number of bytes in the statistical data.

[0075] In this optional implementation, statistical data is stored in the data structure of the aggregated index. That is, for each index key value in the statistical data, additional space is allocated in the data structure of the aggregated index to store the statistical data.

[0076] This optional implementation provides the following steps for creating an aggregated index: creating an index data table; determining the data length of the statistical data; allocating space for the index data table according to the data length; and recording, in the metadata database of the distributed file system, the correspondence among the item aggregation function, the item to be aggregated, and the index data table.

[0077] This optional implementation provides a method for creating aggregated indexes that determines the data length of statistical data and allocates space to the index data table according to the data length. This ensures that statistical data is recorded effectively and improves the accuracy of aggregated index creation.
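A small sketch can make the length-then-allocate idea concrete. It assumes, purely for illustration, that each statistical record consists of two unsigned 64-bit integers (file count and total size); the record layout, `RECORD_FORMAT`, and the helper names are hypothetical.

```python
import struct

# Assumed layout of one statistical record: two unsigned 64-bit integers
# (file count, total size in bytes), little-endian, no padding.
RECORD_FORMAT = "<QQ"


def statistical_data_length() -> int:
    """Data length of one statistical record, in bytes."""
    return struct.calcsize(RECORD_FORMAT)


def allocate_index_space(num_keys: int) -> bytearray:
    """Reserve zero-filled space for one statistical record per index key."""
    return bytearray(num_keys * statistical_data_length())


def write_record(buf: bytearray, slot: int, file_count: int, total_size: int) -> None:
    offset = slot * statistical_data_length()
    struct.pack_into(RECORD_FORMAT, buf, offset, file_count, total_size)


def read_record(buf: bytearray, slot: int) -> tuple:
    offset = slot * statistical_data_length()
    return struct.unpack_from(RECORD_FORMAT, buf, offset)


space = allocate_index_space(num_keys=4)
write_record(space, slot=2, file_count=3, total_size=4096)
```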

[0078] In some optional implementations of this disclosure, the above-mentioned items to be aggregated include: the number of files and file sizes under different file owners in the directory entry table of the metadata database; the item aggregation function includes: a summation function; the statistical data obtained by performing statistics on the items to be aggregated according to the item aggregation function includes: summing the number of files under different file owners using the summation function to obtain the file count; summing the file sizes under different file owners using the summation function to obtain the capacity value; and using the file count and capacity value as statistical data.

[0079] In this optional implementation, to store the directory tree data of the distributed file system, the metadata database needs to store a directory entry table. The key fields of the directory entry table are: parent_inode, name, inode, type, ugi, and size. Among them, parent_inode and name represent the primary key, inode represents the inode, type represents the file type, ugi represents the file owner, and size represents the file size.

[0080] To configure the UGI quota function using the aggregated index mechanism of the metadata database, an index data table needs to be set up in the aggregated index. The index data table has the format: ugi, aggr_num, size. Here, ugi is the primary key of the index data table, and aggr_num and size are its non-key fields. After the records are grouped by ugi, aggr_num records the number of files and size records the total file size, which yields the usage statistics for the UGI quota function.

[0081] It is important to note that aggregated indexes are a general-purpose capability that can be used to perform aggregate calculations on any combination of keys in a database. The UGI quota function, however, is one specific kind of aggregation. To implement it using aggregated indexes, configuration is required that specifies which keys should be aggregated and how the aggregation function is defined. The UGI quota function is essentially user-level usage statistics. Each user has a capacity and a usage limit; when usage exceeds the capacity, write operations need to be restricted to prevent system resources from being exhausted.

[0082] The optional implementation provides a method for obtaining statistical data by summing the number of files owned by different file owners using a summation function to obtain the file count; and by summing the file sizes owned by different file owners using a summation function to obtain the capacity value. By using the file count and capacity value as statistical data, a reliable implementation method for calculating file usage is provided, thereby improving the reliability of the obtained statistical data.
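The per-owner summation can be sketched in a few lines of Python. The `DirEntry` fields mirror the key fields of the directory entry table named in the text (parent_inode, name, inode, type, ugi, size); the sample rows and the function name `aggregate_by_ugi` are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class DirEntry(NamedTuple):
    """One row of the directory entry table, using the key fields from the text."""
    parent_inode: int
    name: str
    inode: int
    type: str
    ugi: str
    size: int


def aggregate_by_ugi(entries: Iterable[DirEntry]) -> Dict[str, dict]:
    """Group by file owner; sum the file count (aggr_num) and the file sizes (size)."""
    index = defaultdict(lambda: {"aggr_num": 0, "size": 0})
    for entry in entries:
        row = index[entry.ugi]
        row["aggr_num"] += 1        # each entry contributes one file
        row["size"] += entry.size   # capacity value is the sum of sizes
    return dict(index)


# Invented sample data.
entries = [
    DirEntry(1, "a.txt", 101, "file", "alice", 100),
    DirEntry(1, "b.txt", 102, "file", "alice", 250),
    DirEntry(2, "c.log", 103, "file", "bob", 40),
]
usage = aggregate_by_ugi(entries)
```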

[0083] In some optional implementations of this disclosure, the above-mentioned items to be aggregated also include: the value obtained by taking the index node of the directory entry table of the metadata database modulo a set value; and performing statistics on the items to be aggregated according to the item aggregation function to obtain the statistical data includes: controlling the modulo value to be a positive integer before calculating the statistical data.

[0084] In this optional implementation, the index data table has the following format: ugi, inode_mod, aggr_num, size. Here, ugi and inode_mod together form the primary key of the index data table, and aggr_num and size are its non-key fields. The inode is the index node in the directory entry table, and inode_mod is the value obtained by taking that index node modulo N, i.e., inode_mod = inode % N, where N is, for example, 1000. The result is controlled to be a positive integer, and the number of index nodes may be greater than N.

[0085] In this optional implementation, the purpose of adding inode_mod alongside ugi in the primary key of the index data table is to avoid the hotspot that would arise if every change to any record under the same ugi updated the same aggregated index record.

[0086] In this optional implementation, the set value is a pre-set integer, for example 1000. The items to be aggregated also include the value obtained by taking the index node of the directory entry table of the metadata database modulo the set value, with the modulo value controlled to be a positive integer. This distributes the update pressure for one file owner across as many records as the set value (for example, 1000 records), thereby avoiding database hotspots caused by frequent updates to the same record.

[0087] In the method for obtaining statistical data provided by this optional implementation, the items to be aggregated also include the value obtained by taking the index node of the directory entry table of the metadata database modulo the set value. Controlling the modulo value to be a positive integer before calculating the statistical data ensures that access pressure on the metadata database is distributed across as many records as the set value, avoids the metadata database hotspot caused by updating the same record, and improves the reliability of distributed file system access.
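The effect of the composite key (ugi, inode_mod) can be sketched as follows. This is an illustration under assumptions: the in-memory dictionary stands in for the index data table, the helper names are hypothetical, and the sketch interprets "controlled to be a positive integer" as keeping the modulo result in the range 0 to N-1 even for a negative input.

```python
from collections import defaultdict

N = 1000  # the set value used as an example in the text


def inode_mod(inode: int, n: int = N) -> int:
    """Modulo value kept in [0, n). The double-modulo idiom is the portable
    form; Python's % is already non-negative for a positive n."""
    return ((inode % n) + n) % n


def apply_change(index, ugi: str, inode: int, delta_num: int, delta_size: int) -> None:
    """Update only the record keyed by (ugi, inode_mod), not one record per ugi."""
    row = index[(ugi, inode_mod(inode))]
    row["aggr_num"] += delta_num
    row["size"] += delta_size


def usage_for(index, ugi: str) -> tuple:
    """Total usage of one owner: sum over at most N partial records."""
    rows = [row for (owner, _), row in index.items() if owner == ugi]
    return sum(r["aggr_num"] for r in rows), sum(r["size"] for r in rows)


index = defaultdict(lambda: {"aggr_num": 0, "size": 0})
apply_change(index, "alice", 1001, 1, 100)  # lands on ("alice", 1)
apply_change(index, "alice", 2001, 1, 50)   # also ("alice", 1)
apply_change(index, "alice", 2002, 1, 25)   # lands on ("alice", 2)
```

The trade-off in this design is that writes are spread over up to N records per owner, while a read of an owner's total usage has to sum those partial records.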

[0088] To enable the use of quotas, this disclosure also provides a data processing method. Specifically, Figure 4 shows a flow 400 of an embodiment of the data processing method according to this disclosure. The data processing method includes the following steps:

[0089] Step 401: Send a request to the metadata database of the distributed file system to retrieve aggregated data.

[0090] In this embodiment, the aggregated data is data created using the database index generation method described in the above embodiment. After the aggregated index is created, all aggregated data of the aggregated index is stored in the corresponding area of the aggregated index, such as the index data table. By accessing this area, such as accessing the index data table, the aggregated data can be obtained.

[0091] In this embodiment, the request to obtain aggregated data refers to the request to obtain aggregated data from the aggregated index. The aggregated data in the aggregated index is data that includes keys and values, where the key is the item to be aggregated and the value is the actual value corresponding to the item to be aggregated.

[0092] In this embodiment, the quota server in the metadata database is used to manage the aggregated index. Specifically, the quota server manages multiple quota table shards. The service capabilities provided by the quota server are: second-level updates; support for billions of files in a single cluster; eventual consistency; and no data loss.

[0093] As shown in Figure 5, the directory tree server is a component in the metadata database that provides directory tree services. Metadata related to the directory tree is stored in directory entry tables, and multiple directory entry tables are stored in different directory entry table shards. Clients access the directory tree server's metadata interface to modify data in the directory entry table shards at intervals. Modifications to the data in the directory entry table shards are synchronized to the quota table shards of the quota server via the CDC mechanism. The quota table shards are used to store quota tables. The execution entity on which the data processing method runs (such as the client in Figure 5) obtains the aggregated data in the current quota table shard by accessing the quota server. The aggregated data is the quota information and usage status of the distributed file system. The quota server directly accesses the quota table shard and provides services to the execution entity.
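The asynchronous propagation from directory entry table shards to quota table shards can be approximated with a queue and a background worker. This is only a sketch under assumptions: `change_log` stands in for the CDC stream, the two dictionaries stand in for the table shards, and every name is hypothetical. The point it illustrates is that the main-table write returns once the change is enqueued, while the quota table is updated later.

```python
import queue
import threading

change_log = queue.Queue()  # stand-in for the CDC stream
quota_table = {}            # stand-in for a quota table shard: ugi -> [aggr_num, size]


def write_dir_entry(dir_table: dict, key: tuple, ugi: str, size: int) -> None:
    """Main-table write: returns as soon as the change message is enqueued."""
    dir_table[key] = (ugi, size)
    change_log.put((ugi, 1, size))


def cdc_worker() -> None:
    """Consume change messages and apply them to the quota table."""
    while True:
        change = change_log.get()
        if change is None:  # sentinel: stop the worker
            change_log.task_done()
            break
        ugi, delta_num, delta_size = change
        row = quota_table.setdefault(ugi, [0, 0])
        row[0] += delta_num
        row[1] += delta_size
        change_log.task_done()


worker = threading.Thread(target=cdc_worker, daemon=True)
worker.start()

dir_table = {}
write_dir_entry(dir_table, (1, "a.txt"), "alice", 100)
write_dir_entry(dir_table, (1, "b.txt"), "alice", 250)

change_log.join()     # demo only: wait until every change has been applied
change_log.put(None)
worker.join()
```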

[0094] Step 402: Receive aggregated data sent by the metadata database.

[0095] In this embodiment, when the quota server receives a request to obtain aggregated data, it obtains the aggregated data from the quota table shards and sends the aggregated data to the execution entity on which the data processing method runs.

[0096] Step 403: Perform quota detection on the aggregated data to obtain the quota detection results.

[0097] In this embodiment, the detection of aggregated data includes: detecting whether the aggregated data includes the item to be aggregated and the actual value of the item to be aggregated; in response to detecting that the aggregated data includes the item to be aggregated and the actual value of the item to be aggregated, determining that the format of the aggregated data is qualified; detecting whether the actual value of the item to be aggregated is greater than a preset quota value; if the actual value of the item to be aggregated is greater than the preset quota value, generating a quota detection result for the item to be aggregated exceeding the quota.
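The two checks in this step, format first and quota second, might look like the following sketch. The dictionary shape of the aggregated data and the result fields `format_ok` and `exceeded` are assumptions chosen for the example.

```python
def detect_quota(aggregated: dict, item: str, preset_quota: int) -> dict:
    """Check the format first, then compare the actual value with the preset quota."""
    value = aggregated.get(item)
    if not isinstance(value, int):
        # The item or its actual value is missing: the format is not qualified.
        return {"format_ok": False, "exceeded": None}
    return {"format_ok": True, "exceeded": value > preset_quota}


over = detect_quota({"aggr_num": 51, "size": 4096}, "aggr_num", 50)
under = detect_quota({"aggr_num": 50}, "aggr_num", 50)
malformed = detect_quota({"size": 4096}, "aggr_num", 50)
```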

[0098] The data processing method provided in this embodiment has advantages such as accuracy, real-time performance, and efficiency, meeting users' needs for quota functionality under massive data volumes. In its specific implementation, since metadata is persisted using a distributed database, the quota function is implemented using the database's aggregated index mechanism. When a user's request operation is completed, the aggregated index asynchronously maintains usage information, which ensures not only that the user's I/O path is unaffected but also second-level real-time performance and eventual consistency.

[0099] The data processing method provided in this embodiment sends a request to the metadata database of the distributed file system to obtain aggregated data; receives the aggregated data sent by the metadata database; performs quota detection on the aggregated data, and obtains the quota detection result. This provides a reliable implementation method for the quota function of the distributed file system and ensures the reliability of the usage calculation of the distributed file system.

[0100] In some optional implementations of this disclosure, the above-mentioned quota detection of aggregated data to obtain quota detection results includes: obtaining a preset quota value; detecting whether the aggregated data is greater than the preset quota value, and obtaining quota detection results including aggregated data being greater than the preset quota value, aggregated data being less than the preset quota value, or aggregated data being equal to the preset quota value.

[0101] In this optional implementation, the preset quota value is a quota limit pre-set for the distributed file system. For example, if the preset quota value is 50 files, then the preset quota value is the quota for the number of files.

[0102] This optional implementation provides a method for obtaining quota detection results, which involves acquiring a preset quota value; detecting whether the aggregated data is greater than the preset quota value; and obtaining quota detection results including whether the aggregated data is greater than the preset quota value, less than the preset quota value, or equal to the preset quota value. By acquiring the preset quota value, the limit of the aggregated data can be flexibly set, thus improving the flexibility of quota detection.
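A minimal sketch of the three-way detection result follows. The enum `QuotaResult` and the function name are hypothetical; the preset quota of 50 files is the example value from the text.

```python
from enum import Enum


class QuotaResult(Enum):
    GREATER = "aggregated data greater than preset quota"
    EQUAL = "aggregated data equal to preset quota"
    LESS = "aggregated data less than preset quota"


def compare_with_quota(aggregated_value: int, preset_quota: int) -> QuotaResult:
    """Return which of the three detection results applies."""
    if aggregated_value > preset_quota:
        return QuotaResult.GREATER
    if aggregated_value == preset_quota:
        return QuotaResult.EQUAL
    return QuotaResult.LESS


# Preset quota of 50 files, as in the example in the text.
results = [compare_with_quota(count, 50) for count in (49, 50, 51)]
```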

[0103] In some optional implementations of this disclosure, the above data processing method further includes: issuing an alarm message in response to the quota detection result that the aggregated data is greater than a preset quota value.

[0104] In this optional implementation, the form of alarm information and the transmission channel can be varied. For example, a message related to the alarm information can be sent to a pre-set alarm channel, or a message related to the alarm information can be sent to a specific server of the distributed file system so that the distributed file system can determine that the current file usage has exceeded the preset quota value.

[0105] The data processing method provided by this optional implementation can issue an alarm message when the aggregated data detected by the quota detection result is greater than the preset quota value. This can effectively warn of quota overruns and improve the reliability and accuracy of quota settings in the distributed file system.
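Because the text leaves the alarm form and transmission channel open, the sketch below models a channel as any callable that accepts a message. The function `check_and_alarm`, the message wording, and the list used as a channel are all assumptions for illustration.

```python
from typing import Callable, Iterable


def check_and_alarm(
    ugi: str,
    aggregated_value: int,
    preset_quota: int,
    channels: Iterable[Callable[[str], None]],
) -> bool:
    """Send an alarm message to every channel when the quota is exceeded."""
    if aggregated_value <= preset_quota:
        return False
    message = f"quota exceeded for {ugi}: {aggregated_value} > {preset_quota}"
    for send in channels:
        send(message)
    return True


received = []  # stand-in for a pre-set alarm channel
fired = check_and_alarm("alice", 51, 50, [received.append])
quiet = check_and_alarm("bob", 10, 50, [received.append])
```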

[0106] With further reference to Figure 6, as an implementation of the methods shown in the above figures, this disclosure provides an embodiment of a database index generation apparatus. This apparatus embodiment corresponds to the method embodiment shown in Figure 1, and the apparatus can be applied in various electronic devices.

[0107] As shown in Figure 6, the database index generation apparatus 600 provided in this embodiment includes: an item determination unit 601, a function determination unit 602, a creation unit 603, an obtaining unit 604, and a saving unit 605. The item determination unit 601 can be configured to determine items to be aggregated related to file quotas based on metadata in the metadata database of the distributed file system. The function determination unit 602 can be configured to determine an item aggregation function based on the items to be aggregated. The creation unit 603 can be configured to create an aggregated index in the metadata database. The obtaining unit 604 can be configured to perform statistics on the items to be aggregated according to the item aggregation function to obtain statistical data. The saving unit 605 can be configured to save the statistical data as aggregated data in the aggregated index.

[0108] In this embodiment, for the specific processing and technical effects of the item determination unit 601, function determination unit 602, creation unit 603, obtaining unit 604, and saving unit 605 in the database index generation apparatus 600, refer to the descriptions of steps 101, 102, 103, 104, and 105 in the embodiment corresponding to Figure 1, which will not be repeated here.

[0109] In some optional implementations of this embodiment, the item determination unit 601 is configured to: obtain the calculation requirement data of file quota; obtain the key fields of the directory entry table and the target entry table of the metadata database of the distributed file system; and match the key fields with the calculation requirement data to obtain the items to be aggregated related to the file quota.

[0110] In some optional implementations of this embodiment, the function determination unit 602 is configured to: select functions from a pre-set set of functions based on the items to be aggregated, to obtain a set of selected functions; and obtain an item aggregation function based on the file quota calculation requirement data and the set of selected functions.

[0111] In some optional implementations of this embodiment, the creation unit 603 is configured to: create an index data table; and record, in the metadata database of the distributed file system, the correspondence among the item aggregation function, the item to be aggregated, and the index data table.

[0112] In some optional implementations of this embodiment, the creation unit 603 is further configured to: determine the data length of the statistical data; and allocate space for the index data table according to the data length.

[0113] In some optional implementations of this embodiment, the above-mentioned items to be aggregated include: the number of files and file sizes under different file owners in the directory entry table of the metadata database; the item aggregation function includes: a summation function; the above-mentioned obtaining unit 604 is configured to: use the summation function to sum the number of files under different file owners to obtain the number of files; use the summation function to sum the file sizes under different file owners to obtain the capacity value; and use the number of files and the capacity value as statistical data.

[0114] In some optional implementations of this embodiment, the above-mentioned items to be aggregated further include: the modulo value of the index node of the directory entry table of the metadata database modulo the set value; the above-mentioned obtaining unit 604 is configured to: control the modulo value to be a positive integer before calculating the statistical data.

[0115] In the database index generation apparatus provided in the embodiments of this disclosure, first, the item determination unit 601 determines items to be aggregated related to file quotas based on metadata in the metadata database of the distributed file system; second, the function determination unit 602 determines an item aggregation function based on the items to be aggregated; third, the creation unit 603 creates an aggregated index in the metadata database; then, the obtaining unit 604 performs statistics on the items to be aggregated according to the item aggregation function to obtain statistical data; finally, the saving unit 605 saves the statistical data as aggregated data in the aggregated index. Thus, by utilizing the aggregated index mechanism of the metadata database of the distributed file system, the items to be aggregated in the distributed file system can be aggregated, so that usage is calculated effectively without affecting user input/output paths, with real-time performance and eventual consistency, which improves the real-time performance and accuracy of distributed file system usage calculations.

[0116] The collection, storage, use, processing, transmission, provision, and disclosure of user personal information involved in the technical solution disclosed herein comply with the provisions of relevant laws and regulations and do not violate public order and good morals.

[0117] With further reference to Figure 7, as an implementation of the methods shown in the above figures, this disclosure provides an embodiment of a data processing apparatus. This apparatus embodiment corresponds to the method embodiment shown in Figure 4, and the apparatus can be applied in various electronic devices.

[0118] As shown in Figure 7, the data processing apparatus 700 provided in this embodiment includes: a sending unit 701, a receiving unit 702, and a detection unit 703. The sending unit 701 can be configured to send a request to the metadata database of a distributed file system to obtain aggregated data, where the aggregated data is data created by the database index generation apparatus described in the above embodiment. The receiving unit 702 can be configured to receive the aggregated data sent by the metadata database. The detection unit 703 can be configured to perform quota detection on the aggregated data and obtain a quota detection result.

[0119] In this embodiment, for the specific processing of the sending unit 701, receiving unit 702, and detection unit 703 in the data processing apparatus 700 and the resulting technical effects, refer to the descriptions of steps 401, 402, and 403 in the embodiment corresponding to Figure 4, which will not be repeated here.

[0120] In some optional implementations of this embodiment, the detection unit 703 is configured to obtain a preset quota value; detect whether the aggregated data is greater than the preset quota value, and obtain a quota detection result including aggregated data being greater than the preset quota value, aggregated data being less than the preset quota value, or aggregated data being equal to the preset quota value.

[0121] In some optional implementations of this embodiment, the above-mentioned device further includes an alarm unit (not shown in the figure), which is configured to issue an alarm message in response to the quota detection result that the aggregated data is greater than a preset quota value.

[0122] The data processing apparatus provided in the embodiments of this disclosure includes a sending unit 701 sending a request to the metadata database of the distributed file system to obtain aggregated data; a receiving unit 702 receiving the aggregated data sent by the metadata database; and a detection unit 703 performing quota detection on the aggregated data to obtain quota detection results. This provides a reliable implementation method for the quota function of the distributed file system and ensures the reliability of the usage calculation of the distributed file system.

[0123] According to embodiments of this disclosure, this disclosure also provides an electronic device, a readable storage medium, and a computer program product.

[0124] Figure 6 shows a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the present disclosure described and/or claimed herein.

[0125] As shown in Figure 6, device 600 includes a computing unit 601, which can perform various appropriate actions and processes based on a computer program stored in read-only memory (ROM) 602 or a computer program loaded from storage unit 608 into random access memory (RAM) 603. RAM 603 may also store various programs and data required for the operation of device 600. The computing unit 601, ROM 602, and RAM 603 are interconnected via bus 604. Input/output (I/O) interface 605 is also connected to bus 604.

[0126] Multiple components in device 600 are connected to I/O interface 605, including: input unit 606, such as a keyboard or mouse; output unit 607, such as various types of monitors and speakers; storage unit 608, such as a disk or optical disk; and communication unit 609, such as a network card, modem, or wireless transceiver. Communication unit 609 allows device 600 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks.

[0127] The computing unit 601 can be a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the various methods and processes described above, such as the database index generation method or the data processing method. For example, in some embodiments, the database index generation method or data processing method may be implemented as a computer software program tangibly contained in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed on device 600 via ROM 602 and/or communication unit 609. When the computer program is loaded into RAM 603 and executed by the computing unit 601, one or more steps of the database index generation method or data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the database index generation method or the data processing method by any other suitable means (e.g., by means of firmware).

[0128] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.

[0129] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable database indexing apparatus or data processing apparatus, such that when executed by the processor or controller, the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.

[0130] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0131] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0132] The systems and technologies described herein can be implemented in computing systems that include back-end components (e.g., as a data server), or computing systems that include middleware components (e.g., an information server), or computing systems that include front-end components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with embodiments of the systems and technologies described herein), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

[0133] Computer systems can include clients and servers. Clients and servers are generally located far apart and typically interact via communication networks. Client-server relationships are created by computer programs running on the respective computers and having a client-server relationship with each other. Servers can be cloud servers, servers in distributed systems, or servers incorporating blockchain technology.

[0134] It should be understood that the various forms of processes shown above can be used to rearrange, add, or delete steps. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution disclosed in this disclosure can be achieved, and this is not limited herein.

[0135] The specific embodiments described above do not constitute a limitation on the scope of protection of this disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this disclosure should be included within the scope of protection of this disclosure.

Claims

1. A database index generation method, comprising:
determining, based on metadata in a metadata database of a distributed file system, items to be aggregated related to file quotas, wherein the items to be aggregated include the number of files and the file sizes under different file owners in a directory entry table of the metadata database, and a modulo value obtained by taking the index node of the directory entry table modulo a preset integer N, the modulo value being used to distribute the update pressure under the same file owner across N index records;
determining an item aggregation function based on the items to be aggregated, wherein the item aggregation function includes a summation function;
creating an aggregated index in the metadata database, wherein a change data capture (CDC) component receives record change messages generated by a storage engine node (BE) in the metadata database after processing key-value operations, and the CDC component asynchronously updates the aggregated index based on the record change messages, and wherein creating the aggregated index includes creating an index data table and using the modulo value together with the file owner as the composite primary key of the index data table corresponding to the aggregated index;
controlling the modulo value to be a positive integer before calculating statistical data;
performing statistics on the items to be aggregated according to the item aggregation function to obtain the statistical data, wherein the statistical data includes the file count obtained by summing the number of files under different file owners using the summation function, and the capacity value obtained by summing the file sizes under different file owners; and
storing the statistical data as aggregated data in the aggregated index.
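As a minimal, non-limiting sketch of the update path recited in claim 1, the following Python fragment keeps one index record per (modulo value, file owner) pair and applies queued change messages off the write path. The names `RecordChange`, `AggregatedIndex`, and `drain`, the delta fields, and the value N = 4 are assumptions made for illustration and do not come from the claims. The sketch reads "controlling the modulo value to be a positive integer" as keeping the value in the fixed range 0..N-1, which is an interpretation.

```python
from collections import defaultdict
from dataclasses import dataclass
from queue import Queue

N = 4  # preset integer N; the value 4 is an assumption for illustration


@dataclass(frozen=True)
class RecordChange:
    """Hypothetical record change message emitted after a key-value operation."""
    inode: int        # index node of the directory entry
    owner: str        # file owner
    delta_files: int  # +1 for a create, -1 for a delete
    delta_bytes: int  # change in file size


class AggregatedIndex:
    """Index data table keyed by the composite primary key (modulo value, owner)."""

    def __init__(self, n: int = N) -> None:
        self.n = n
        self.rows = defaultdict(lambda: [0, 0])  # key -> [file count, capacity]

    def modulo_value(self, inode: int) -> int:
        # Python's % with a positive divisor never returns a negative value,
        # so the result always lies in 0..n-1.
        return inode % self.n

    def apply(self, change: RecordChange) -> None:
        row = self.rows[(self.modulo_value(change.inode), change.owner)]
        row[0] += change.delta_files
        row[1] += change.delta_bytes


def drain(changes: Queue, index: AggregatedIndex) -> None:
    """Stand-in for the CDC consumer: applies queued changes to the index."""
    while not changes.empty():
        index.apply(changes.get())


changes: Queue = Queue()
for inode, size in [(10, 100), (11, 200), (14, 50)]:
    changes.put(RecordChange(inode, "alice", 1, size))
index = AggregatedIndex()
drain(changes, index)
```

Because the three files land on two different modulo values, updates for the same owner touch two index records rather than one, which is the contention-spreading effect the claim attributes to the modulo value.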

2. The method according to claim 1, wherein determining, based on the metadata in the metadata database of the distributed file system, the items to be aggregated related to file quotas includes: obtaining calculation requirement data for file quotas; obtaining the directory entry table of the metadata database of the distributed file system and the key fields of the directory entry table; and matching the key fields against the calculation requirement data to obtain the items to be aggregated related to file quotas.

3. The method according to claim 1, wherein determining the item aggregation function based on the items to be aggregated includes: selecting functions from a preset function set based on the items to be aggregated, to obtain a selected function set; and obtaining the item aggregation function based on the calculation requirement data for file quotas and the selected function set.

4. The method according to claim 1, wherein creating the aggregated index includes: creating an index data table; and recording, in the metadata database of the distributed file system, the correspondence between the item aggregation function, the items to be aggregated, and the index data table.

5. The method according to claim 4, wherein, after creating the index data table, the method further includes: determining the data length of the statistical data; and allocating space for the index data table according to the data length.
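One way to picture the space allocation in claim 5 is a fixed-width record whose length is known before any data is written. The sketch below is an assumption-laden illustration, not the claimed implementation: the record layout (two signed 64-bit integers for file count and capacity) and the helper names `allocate_table`, `write_record`, and `read_record` are hypothetical.

```python
import struct

# Assumed layout: file count and capacity, each a little-endian signed 64-bit int.
RECORD = struct.Struct("<qq")


def allocate_table(n_records: int) -> bytearray:
    """Reserve space for n_records using the known data length of one record."""
    return bytearray(RECORD.size * n_records)


def write_record(table: bytearray, slot: int, file_count: int, capacity: int) -> None:
    """Store one statistical record in its preallocated slot."""
    RECORD.pack_into(table, slot * RECORD.size, file_count, capacity)


def read_record(table: bytearray, slot: int) -> tuple:
    """Read back the (file count, capacity) pair stored in a slot."""
    return RECORD.unpack_from(table, slot * RECORD.size)


table = allocate_table(4)        # e.g. N = 4 index records for one owner
write_record(table, 2, 3, 350)
```

With this layout each record occupies 16 bytes, so the table size follows directly from the data length and the number of index records.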

6. The method according to any one of claims 1-5, wherein performing statistics on the items to be aggregated according to the item aggregation function to obtain the statistical data includes: summing the number of files under different file owners using the summation function to obtain the file count; summing the file sizes under different file owners using the summation function to obtain the capacity value; and using the file count and the capacity value as the statistical data.
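The summation in claim 6 can be illustrated with a small Python sketch that builds the statistical data from directory entries and then totals one owner's usage. The sample rows, the value N = 4, and the function names `build_statistics` and `owner_totals` are assumptions. Summing an owner's N index records at read time is an inference from the composite key in claim 1 rather than a step the claims spell out.

```python
from collections import defaultdict

N = 4  # preset integer N; the value is an assumption for illustration

# Hypothetical directory entry rows: (index node, file owner, file size in bytes)
DENTRIES = [
    (10, "alice", 100),
    (11, "alice", 200),
    (14, "alice", 50),
    (21, "bob", 4096),
]


def build_statistics(dentries, n=N):
    """Sum file counts and file sizes per (modulo value, owner) key."""
    stats = defaultdict(lambda: [0, 0])  # key -> [file count, capacity]
    for inode, owner, size in dentries:
        row = stats[(inode % n, owner)]
        row[0] += 1
        row[1] += size
    return dict(stats)


def owner_totals(stats, owner, n=N):
    """Total one owner's usage by summing that owner's n index records."""
    records = [stats.get((m, owner), [0, 0]) for m in range(n)]
    file_count = sum(r[0] for r in records)
    capacity = sum(r[1] for r in records)
    return file_count, capacity


stats = build_statistics(DENTRIES)
```

A quota query for an owner then reads at most N small records instead of scanning the directory entry table.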

7. A data processing method, comprising: sending a request to the metadata database of the distributed file system to obtain aggregated data, wherein the aggregated data is data created by the database index generation method according to any one of claims 1-6; receiving the aggregated data sent by the metadata database; and performing quota detection on the aggregated data to obtain a quota detection result.

8. The method according to claim 7, wherein performing quota detection on the aggregated data to obtain the quota detection result includes: obtaining a preset quota value; and detecting whether the aggregated data is greater than the preset quota value, to obtain a quota detection result indicating that the aggregated data is greater than the preset quota value, less than the preset quota value, or equal to the preset quota value.

9. The method according to claim 7, further comprising: issuing an alarm message in response to the quota detection result indicating that the aggregated data is greater than the preset quota value.
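The quota detection and alarm steps of claims 8 and 9 can be sketched as a three-way comparison followed by a conditional alarm. The enum `QuotaResult`, the functions `detect_quota` and `check`, the message text, and the sample values are hypothetical names and data chosen for illustration only.

```python
from enum import Enum
from typing import Callable, List


class QuotaResult(Enum):
    GREATER = "greater than quota"
    EQUAL = "equal to quota"
    LESS = "less than quota"


def detect_quota(aggregated: int, quota: int) -> QuotaResult:
    """Compare aggregated data against the preset quota value."""
    if aggregated > quota:
        return QuotaResult.GREATER
    if aggregated == quota:
        return QuotaResult.EQUAL
    return QuotaResult.LESS


def check(owner: str, aggregated: int, quota: int,
          alarm: Callable[[str], None]) -> QuotaResult:
    """Run quota detection and issue an alarm only when the quota is exceeded."""
    result = detect_quota(aggregated, quota)
    if result is QuotaResult.GREATER:
        alarm(f"quota exceeded for {owner}: {aggregated} > {quota}")
    return result


alarms: List[str] = []
alice_result = check("alice", 350, 300, alarms.append)
bob_result = check("bob", 4096, 4096, alarms.append)
```

Passing the alarm sink as a callable keeps the detection logic independent of how the alarm message is delivered.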

10. A database index generation apparatus, comprising:
an item determination unit configured to determine, based on metadata in a metadata database of a distributed file system, items to be aggregated related to file quotas, wherein the items to be aggregated include the number of files and the file sizes under different file owners in a directory entry table of the metadata database, and a modulo value obtained by taking the index node of the directory entry table modulo a preset integer N, the modulo value being used to distribute the update pressure under the same file owner across N index records;
a function determination unit configured to determine an item aggregation function based on the items to be aggregated, wherein the item aggregation function includes a summation function;
a creation unit configured to create an aggregated index in the metadata database, wherein a change data capture (CDC) component receives record change messages generated by a storage engine node (BE) in the metadata database after processing key-value operations, and the CDC component asynchronously updates the aggregated index based on the record change messages, and wherein creating the aggregated index includes creating an index data table and using the modulo value together with the file owner as the composite primary key of the index data table corresponding to the aggregated index;
an obtaining unit configured to control the modulo value to be a positive integer before calculating statistical data, and to perform statistics on the items to be aggregated according to the item aggregation function to obtain the statistical data, wherein the statistical data includes the file count obtained by summing the number of files under different file owners using the summation function, and the capacity value obtained by summing the file sizes under different file owners; and
a storage unit configured to store the statistical data as aggregated data in the aggregated index.

11. The apparatus according to claim 10, wherein the item determination unit is configured to: obtain calculation requirement data for file quotas; obtain the directory entry table of the metadata database of the distributed file system and the key fields of the directory entry table; and match the key fields against the calculation requirement data to obtain the items to be aggregated related to file quotas.

12. The apparatus according to claim 10, wherein the function determination unit is configured to: select functions from a preset function set based on the items to be aggregated, to obtain a selected function set; and obtain the item aggregation function based on the calculation requirement data for file quotas and the selected function set.

13. The apparatus according to claim 10, wherein the creation unit is configured to: create an index data table; and record, in the metadata database of the distributed file system, the correspondence between the item aggregation function, the items to be aggregated, and the index data table.

14. The apparatus according to claim 13, wherein the creation unit is further configured to: determine the data length of the statistical data; and allocate space for the index data table according to the data length.

15. The apparatus according to any one of claims 11-14, wherein the obtaining unit is configured to: sum the number of files under different file owners using the summation function to obtain the file count; sum the file sizes under different file owners using the summation function to obtain the capacity value; and use the file count and the capacity value as the statistical data.

16. A data processing apparatus, comprising: a sending unit configured to send a request to the metadata database of the distributed file system to obtain aggregated data, wherein the aggregated data is data created by the database index generation apparatus according to any one of claims 10-15; a receiving unit configured to receive the aggregated data sent by the metadata database; and a detection unit configured to perform quota detection on the aggregated data to obtain a quota detection result.

17. The apparatus according to claim 16, wherein the detection unit is configured to: obtain a preset quota value; and detect whether the aggregated data is greater than the preset quota value, to obtain a quota detection result indicating that the aggregated data is greater than the preset quota value, less than the preset quota value, or equal to the preset quota value.

18. The apparatus of claim 17, further comprising: an alarm unit configured to issue an alarm message in response to the quota detection result indicating that the aggregated data is greater than the preset quota value.

19. An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.

20. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1-9.

21. A computer program product comprising a computer program that, when executed by a processor, implements the method of any one of claims 1-9.

Citation Information

Patent Citations

CN115374121A: Database index generation method, machine readable storage medium and computer equipment
CN118733251A: Quota information reporting method and device
