Metadata processing method, apparatus, device, medium, and program product

By establishing a metadata table in the distributed database component and using key-value pairs to maintain file path information, the problem of Hudi placing an excessive burden on the file system is solved, improving file operation efficiency and data processing accuracy.

CN115658683B · Active · Publication Date: 2026-06-26 · AGRICULTURAL BANK OF CHINA

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
AGRICULTURAL BANK OF CHINA
Filing Date
2022-09-27
Publication Date
2026-06-26

Abstract

The application provides a metadata processing method, apparatus, device, medium, and program product. The method comprises the following steps: establishing, in a distributed database component, a metadata table corresponding to a data organization format component of a data lake; determining, according to a data operation involved when the data organization format component of the data lake executes a transaction, the partition where the record on which the data operation is to be performed is located and the file group in the partition; obtaining a target file path from the metadata table according to that partition and that file group; and executing the data operation under the target file path. In this technical solution, the metadata table is maintained in the form of key-value pairs by the distributed database component, which prevents Hudi from occupying a large number of read-write ports of the distributed file system during file read-write operations, reduces the burden on the file system, and prevents failures and continuous retries during read-write operations caused by excessive burden on the file system.

Description

Technical Field

[0001] This application relates to the field of distributed systems technology, and in particular to a metadata processing method, apparatus, device, medium, and program product.

Background Technology

[0002] Apache Hudi (Hudi for short) is a storage format for data lakes that provides the ability to update and delete data on top of the Hadoop file system.

[0003] In existing technologies, Hudi adopts a directory structure of data partitioning, file groups, and file slices, and uses columnar storage files (Parquet) as file slices to store table data. Compared with traditional big data solutions, this organization method can effectively overcome problems such as low data update efficiency, inability to modify table structure in a timely manner, redundancy of historical snapshot data, and high cost of processing small batch incremental data.

[0004] However, since Hudi uses data file backup to save historical data, the data volume of partition folders and data files can be very large. Hudi's operations on folders and files will put a heavy burden on the distributed file system, causing failures and repeated retries during task execution, thus affecting task execution efficiency.

Summary of the Invention

[0005] This application provides a metadata processing method, apparatus, device, medium, and program product to address the problem that existing Hudi operations on folders and files increase the burden on the distributed file system, leading to task execution errors.

[0006] In a first aspect, embodiments of this application provide a metadata processing method, including:

[0007] In the distributed database component, a metadata table corresponding to the data organization format component of the data lake is established, and the metadata table includes the file path information of the metadata;

[0008] Based on the data operation involved when the data organization format component of the data lake executes a transaction, determine the partition where the record on which the data operation is to be performed is located and the file group in the partition;

[0009] Based on the partition where the record on which the data operation is to be performed is located and the file group in the partition, obtain the target file path from the metadata table;

[0010] Perform the data operation at the target file path.

[0011] In one possible design of the first aspect, establishing, in the distributed database component, the metadata table corresponding to the data organization format component of the data lake includes:

[0012] Obtain information about the partition path, file group, and all file slices under the file group of the data organization format component of the data lake. The data organization format component of the data lake includes different partition paths, and different partition paths include different file groups. The information of the file slice includes the file name and file size.

[0013] Use the partition path and file group as keys, and the information of all file slices as values, to construct associated key-value pairs;

[0014] The metadata table is constructed based on the associated key-value pairs, which are used as the file path information.

[0015] In another possible design of the first aspect, after performing the data operation under the target file path, it further includes:

[0016] Based on the data operation performed under the target file path, information to be updated is determined, including at least one of partition update information, file group update information, and file slice update information;

[0017] The file path information in the metadata table is updated based on the information to be updated.

[0018] In another possible design of the first aspect, updating the file path information in the metadata table includes:

[0019] Based on the partition update information, the partition path in the file path information is updated, and / or, based on the file group update information, the file group in the partition path is updated, and / or, based on the file slice update information, the file slice information in the file group is updated.

[0020] In another possible design of the first aspect, after performing the data operation under the target file path, it further includes:

[0021] Obtain the metadata after the transaction is completed, and update the metadata files in the metadata folder based on the metadata after the transaction is completed.

[0022] In another possible design of the first aspect, performing the data operation under the target file path includes:

[0023] Perform at least one of the following operations under the target file path: data query operation, data rollback operation, and data extraction and merging operation.

[0024] In another possible design of the first aspect, the distributed database component is a remote dictionary service, and the method further includes:

[0025] Obtain data files with access volumes exceeding a preset threshold from the data organization format component of the data lake, and cache the data files in the memory of the remote dictionary service.

[0026] In a second aspect, embodiments of this application provide a metadata processing apparatus, including:

[0027] The data table construction module is used to build metadata tables corresponding to the data organization format components of the data lake in the distributed database components. The metadata tables include file path information of metadata.

[0028] The partition group determination module is used to determine, based on the data operation involved when the data organization format component of the data lake executes a transaction, the partition where the record on which the data operation is to be performed is located and the file group in the partition;

[0029] The path acquisition module is used to obtain the target file path from the metadata table based on the partition where the record on which the data operation is to be performed is located and the file group in the partition;

[0030] An operation execution module is used to perform the data operations under the target file path.

[0031] In a third aspect, embodiments of this application provide a computer device, including: a processor, and a memory communicatively connected to the processor; the memory stores computer execution instructions; the processor executes the computer execution instructions stored in the memory to implement the above-described method.

[0032] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing computer instructions that, when executed by a processor, are used to implement the above-described method.

[0033] In a fifth aspect, embodiments of this application provide a computer program product, including computer instructions that, when executed by a processor, implement the above-described method.

[0034] The metadata processing method, apparatus, device, medium, and program product provided in this application use a distributed database component to maintain data file path information in the metadata table as key-value pairs. This prevents Hudi from occupying a large number of read and write ports of the distributed file system during file read and write operations, reduces the burden on the file system, and prevents failures and continuous retries during read and write operations caused by excessive burden on the file system.

Attached Figure Description

[0035] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application;

[0036] Figure 1 is a schematic diagram of Hudi's data storage structure provided in an embodiment of this application;

[0037] Figure 2 is a flowchart of the metadata processing method provided in an embodiment of this application;

[0038] Figure 3 is a schematic diagram of data processing provided in an embodiment of this application;

[0039] Figure 4 is a schematic diagram of the metadata processing apparatus provided in an embodiment of this application;

[0040] Figure 5 is a schematic diagram of the structure of a computer device provided in an embodiment of this application.

[0041] The accompanying drawings illustrate specific embodiments of this application, which will be described in more detail below. These drawings and descriptions are not intended to limit the scope of the concept in any way, but rather to illustrate the concept of this application to those skilled in the art through reference to particular embodiments.

Detailed Implementation

[0042] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0043] First, the terms used in this application are explained:

[0044] A data lake is a centralized repository that allows storage of all structured and unstructured data from multiple sources at any scale. It can store data as is without requiring data structuring and can run different types of analytics to process the data.

[0045] Apache Hudi (Hudi for short) is a data lake storage format that provides the ability to update and delete data on top of the Hadoop file system. It adopts a directory structure of data partitions, filegroups, and file slices, and uses the columnar storage file Parquet as file slices to store table data.

[0046] Partitioning: Hudi uses key values as the basis for partitioning. When a record is written, the specified key value is used as the folder name (if the folder does not exist, it will be created) and the record is saved to the corresponding folder.

[0047] Data files: Hudi uses filegroups and file slices to store historical data. Records in the table correspond to filegroups within a specific partition. Filegroups are distinguished by their partition path and filegroup ID, ensuring that no two filegroups within the same partition have the same filegroup ID. Each filegroup contains file slices with different timestamps, each containing backup data at that specific moment. File slices are primarily distinguished by the timestamp of the transaction they belong to.

[0048] Hudi employs a directory structure of data partitions, filegroups, and file slices, using the columnar storage file Parquet as file slices to store table data. Compared to traditional big data solutions, this organization effectively overcomes problems such as low data update efficiency, inability to modify table structures in a timely manner, redundancy of historical snapshot data, and high costs for processing small batches of incremental data. Because Hudi uses key values as the basis for partitioning, when a record is written, it is saved to the corresponding folder with the specified key value as the folder name (the folder is created if it does not exist). When performing file operations such as reading and writing data, Hudi loads the corresponding data file based on the partition path. This method effectively reduces the number of files traversed during data file operations. However, because the range of partition key values is uncertain, and because Hudi uses data file backups to store historical data, the volume of partition folders and data files can be very large. Hudi's operations on folders and files can therefore place a significant burden on the distributed file system, leading to task failures and continuous retries during execution.

[0049] To address the aforementioned issues, embodiments of this application provide a metadata processing method, apparatus, device, medium, and program product. By leveraging a distributed remote dictionary service to maintain data file path information in the metadata table in a key-value pair manner, Hudi can avoid occupying a large number of read / write ports of the distributed file system during file read / write operations, thereby reducing the burden on the file system and preventing failures and continuous retries during read / write operations due to excessive file system load.

[0050] The technical solution of this application will now be described in detail through specific embodiments. It should be noted that the following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments.

[0051] Figure 1 is a schematic diagram of the data storage structure of Hudi provided in the embodiments of this application. As shown in Figure 1, Hudi employs a directory structure of data partitions, filegroups, and file slices, and uses the columnar storage file Parquet as the file slice to store table data. When a record is written, it is saved to the corresponding folder with a specified key value as the folder name (the folder is created if it does not exist). When performing file operations such as reading and writing data, Hudi loads the data file at the corresponding path based on the partition path.

[0052] Figure 2 is a flowchart illustrating the metadata processing method provided in an embodiment of this application. This method can be applied to a computer device. Taking a computer device as the executing entity as an example, as shown in Figure 2, the method may specifically include the following steps:

[0053] Step S201: Establish a metadata table corresponding to the data organization format component of the data lake in the distributed database component. The metadata table includes file path information for the metadata.

[0054] In this embodiment, the distributed database component can be a remote dictionary service (Redis). Before creating the Hudi metadata table in the distributed database component, the file slice information in the distributed file system can be initialized first.

[0055] The distributed database component Redis can maintain partition and data file path information in Hudi metadata using key-value pairs. Specifically, it can use the partition path + filegroup ID as the key and the "filename-file size" of all file slices under the corresponding filegroup as the value to maintain the file path information in the metadata in the cache.

[0056] For example, the partition paths may include a first partition, a second partition, and a third partition. Each partition includes filegroups (different filegroups are identified by filegroup IDs), and each filegroup contains several file slices. Different file slices have different filenames, and the file sizes of the file slices may also differ. For example, the first partition includes file group id1, file group id2, and file group id3, and file group id1 includes file slice 1, file slice 2, and file slice 3.
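The key-value layout described in [0055] and [0056] can be sketched as follows. This is only an illustration under stated assumptions: a plain Python dictionary stands in for the Redis store, and the `|` separator, the helper names, and the sample file names and sizes are hypothetical and do not come from this application.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the Redis key-value store: key -> list of values.
MetadataTable = Dict[str, List[str]]


def make_key(partition_path: str, file_group_id: str) -> str:
    """Compose the key 'partition path + file group ID' (the '|' separator is assumed)."""
    return f"{partition_path}|{file_group_id}"


def encode_slice(file_name: str, size_bytes: int) -> str:
    """Encode one file slice as 'filename-file size'."""
    return f"{file_name}-{size_bytes}"


def decode_slice(value: str) -> Tuple[str, int]:
    """Split on the last '-' so that file names containing '-' still decode."""
    name, _, size = value.rpartition("-")
    return name, int(size)


# The first partition holds file groups id1, id2 and id3; id1 holds three slices.
table: MetadataTable = {
    make_key("partition1", "id1"): [
        encode_slice("slice1.parquet", 1024),
        encode_slice("slice2.parquet", 2048),
        encode_slice("slice3.parquet", 4096),
    ],
    make_key("partition1", "id2"): [],
    make_key("partition1", "id3"): [],
}
```

Keeping one key per file group means a single lookup returns every slice of that group, which is the access pattern the later steps rely on.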

[0057] Step S202: Based on the data operation involved when the data organization format component of the data lake executes a transaction, determine the partition where the record on which the data operation is to be performed is located and the file group in the partition.

[0058] In this embodiment, when Hudi executes a transaction, after all operations within a transaction are successfully executed, Hudi saves the transaction's metadata as a flat file in the metadata folder. Therefore, after a transaction is successfully executed, the metadata table in the cache of the distributed database Redis can be updated to update file path change information. For example, based on the partition and file group ID where the file slice is located, a key or its value can be created or updated, and the newly added file slice is appended to the existing key value, while the deleted file slice is removed from the key value.

[0059] For example, when Hudi executes an insert transaction and creates a new file slice, Hudi first determines the partition to which the record belongs based on the partition key. Then, it queries Redis to obtain the existing filegroups, file slice paths, and query information under the current partition. Based on the filegroup information and write policy, it inserts the record into the specified filegroup. At this point, it loads the specified file based on the last file slice filename and partition path in the Redis key-value pair. After processing, the result file is saved to the distributed file system. After the transaction execution is complete, the result filename is appended to the key-value pair of the current filegroup in Redis, and the transaction's metadata information is updated in the Hudi metadata folder.
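The insert flow in [0059] can be sketched roughly as below. A dictionary again stands in for Redis, and the two callbacks are hypothetical placeholders: `choose_group` represents the write policy, and `write_slice` represents loading the latest file slice, processing it, and saving the result file to the distributed file system.

```python
from typing import Callable, Dict, List, Optional

MetadataTable = Dict[str, List[str]]


def run_insert(
    table: MetadataTable,
    partition: str,
    choose_group: Callable[[str, MetadataTable], str],
    write_slice: Callable[[str, Optional[str]], str],
) -> str:
    """Sketch of one insert transaction against the metadata table."""
    # 1. Query the store for the file groups that already exist in this partition.
    prefix = partition + "|"
    groups = {k: v for k, v in table.items() if k.startswith(prefix)}
    # 2. Let the (hypothetical) write policy pick the target file group key.
    key = choose_group(partition, groups)
    slices = table.setdefault(key, [])
    # 3. The last entry is the latest file slice; it is the base for the write.
    base = slices[-1] if slices else None
    # 4. Write the result file (callback) and append its entry after success.
    result = write_slice(partition, base)
    slices.append(result)
    return result


# Usage with trivial stand-ins for the policy and the file system write.
demo: MetadataTable = {"p1|g1": ["a.parquet-10"]}
bases: List[Optional[str]] = []


def policy(partition: str, groups: MetadataTable) -> str:
    return partition + "|g1"


def writer(partition: str, base: Optional[str]) -> str:
    bases.append(base)
    return "b.parquet-20"


run_insert(demo, "p1", policy, writer)
```

Updating the Hudi metadata folder after the transaction is outside this sketch.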

[0060] Step S203: Obtain the target file path from the metadata table based on the partition where the record on which the data operation is to be performed is located and the file group in the partition.

[0061] In this embodiment, the metadata table is stored in a distributed database component (i.e., Redis). The metadata table includes file path information of metadata. When Hudi performs data operations (such as query operations, deletion operations, etc.), it first determines the partition to which the record belongs based on the partition key, and then queries Redis based on the partition to obtain the file location, file slice path and size information under the partition, as the target file path.
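A minimal sketch of the lookup in step S203 is given below, assuming the same dictionary stand-in for Redis, a `|` separator between partition path and file group ID, and "filename-file size" values. The function name and the way paths are joined are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

MetadataTable = Dict[str, List[str]]


def target_paths(
    table: MetadataTable,
    partition_path: str,
    file_group_id: Optional[str] = None,
) -> List[Tuple[str, int]]:
    """Return (file path, size) for the slices of a partition.

    If file_group_id is given, only that file group is returned.
    """
    results: List[Tuple[str, int]] = []
    for key, values in table.items():
        part, _, group = key.rpartition("|")
        if part != partition_path:
            continue
        if file_group_id is not None and group != file_group_id:
            continue
        for value in values:
            name, _, size = value.rpartition("-")
            results.append((f"{part}/{name}", int(size)))
    return results


sample: MetadataTable = {
    "2022/09|g1": ["g1_001.parquet-5", "g1_002.parquet-7"],
    "2022/09|g2": ["g2_001.parquet-3"],
    "2022/10|g1": ["g1_003.parquet-9"],
}
paths = target_paths(sample, "2022/09", "g1")
```

The scan over all keys is only acceptable for a small in-memory stand-in; a real key-value store would normally be queried by exact key or key pattern instead.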

[0062] Step S204: Perform data operations in the target file path.

[0063] In this embodiment, when Hudi involves creating and deleting partition folders and file slices, the metadata file under the metadata folder is first loaded to obtain the partition and file group information of the record to be operated on, and the corresponding file path is obtained by using the partition + file group ID key, and the operation is performed directly on the file path.

[0064] This application embodiment uses the distributed database component Redis to maintain the file paths in Hudi's metadata information, thereby avoiding Hudi's large IO consumption on the distributed file system during file read and write operations and reducing the burden on the file system.

[0065] In some embodiments, step S201 can be implemented through the following steps: obtaining the partition path, filegroup, and information of all file slices under the filegroup of the data organization format component of the data lake; constructing associated key-value pairs using the partition path and filegroup as keys and the information of all file slices as values; and constructing a metadata table based on the associated key-value pairs. The associated key-value pairs are used as file path information. The data organization format component of the data lake includes different partition paths, and different partition paths include different filegroups. The information of the file slices includes the filename and file size.

[0066] In this embodiment, Hudi uses key values as the partitioning basis. When a record is written, it is saved to the corresponding folder using a specified key value as the folder name (the folder is created if it does not exist). After the Hudi table is created, a corresponding Hudi metadata table is created in the distributed database component Redis. Using the partition path + file group ID as the key and the information of all file slices under the file group (including file name and file size) as the value, a metadata table is constructed, which includes file path information.

[0067] In this embodiment, the partition + file group ID is used as the key, and the file name and file size of all file slices under the corresponding file group are used as the key value. The metadata file information and file group information are maintained in Redis, which enables Hudi to obtain file path information from Redis before executing transactions, avoiding file traversal of the distributed file system, improving file operation efficiency and reducing the burden on the distributed file system.
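The initialization described in [0065] to [0067] might look like the sketch below. It makes several assumptions that are not part of this application: a local temporary directory stands in for the distributed file system, a dictionary stands in for Redis, and file slices are assumed to be named `<file group ID>_<timestamp>.parquet` purely for illustration.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def build_metadata_table(table_root: Path) -> Dict[str, List[str]]:
    """Walk the table directory once and build 'partition|group' -> slice entries."""
    table: Dict[str, List[str]] = {}
    for f in sorted(table_root.rglob("*.parquet")):
        # The partition path is the folder of the file, relative to the table root.
        partition = f.parent.relative_to(table_root).as_posix()
        # Assumed naming convention: the text before the first '_' is the group ID.
        file_group_id = f.name.split("_", 1)[0]
        key = f"{partition}|{file_group_id}"
        table.setdefault(key, []).append(f"{f.name}-{f.stat().st_size}")
    return table


# Build a tiny table layout on local disk and index it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "2022" / "09").mkdir(parents=True)
    (root / "2022" / "09" / "g1_001.parquet").write_bytes(b"x" * 5)
    (root / "2022" / "09" / "g1_002.parquet").write_bytes(b"x" * 7)
    (root / "2022" / "09" / "g2_001.parquet").write_bytes(b"x" * 3)
    demo_table = build_metadata_table(root)
```

The directory traversal happens once at initialization; later transactions read the resulting table instead of listing the file system again.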

[0068] Furthermore, based on the above embodiments, in some other embodiments, the method further includes the following steps: determining the information to be updated based on the data operations performed under the target file path; updating the file path information in the metadata table based on the information to be updated. The information to be updated includes at least one of partition update information, file group update information, and file slice update information.

[0069] In this embodiment, data operations may include data insertion, data deletion, data query, data extraction and merging, data rollback, and so on. After all operations within a Hudi transaction are successfully executed, Hudi saves the transaction's metadata as a flat file in the metadata folder. Therefore, after all operations within a transaction are successfully executed, the file path change information can be updated in the metadata table of the distributed database component Redis.

[0070] For example, based on the partition where the file slice is located and the file group ID, create or update keys and their values, append newly added file slices to existing key values, and delete file slices from key values.
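As a rough illustration of the update in [0070], the sketch below appends new file slices to a key and removes deleted ones. The dictionary stand-in for Redis, the `|` key separator, and the function name are assumptions; added slices are passed as "filename-file size" entries and deleted slices as bare file names.

```python
from typing import Dict, Iterable, List

MetadataTable = Dict[str, List[str]]


def apply_slice_changes(
    table: MetadataTable,
    partition_path: str,
    file_group_id: str,
    added: Iterable[str] = (),
    deleted: Iterable[str] = (),
) -> None:
    """Create the key if needed, drop deleted slices, then append added ones."""
    key = f"{partition_path}|{file_group_id}"
    slices = table.setdefault(key, [])
    removed = set(deleted)
    # Keep only entries whose file name (text before the last '-') was not deleted.
    slices[:] = [s for s in slices if s.rpartition("-")[0] not in removed]
    slices.extend(added)


meta: MetadataTable = {"p1|g1": ["a.parquet-10", "b.parquet-20"]}
apply_slice_changes(meta, "p1", "g1", added=["c.parquet-30"], deleted=["a.parquet"])
apply_slice_changes(meta, "p2", "g7", added=["d.parquet-40"])
```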

[0071] Figure 3 is a data processing diagram provided in an embodiment of this application. As shown in Figure 3, Redis maintains a metadata table. When performing data operations, Hudi loads the metadata file under the Hudi metadata folder, obtains the partition and filegroup information of the record to be operated on, and retrieves the corresponding file path information from the metadata table using the partition + filegroup ID key. It then directly performs operations on the file path (such as data query operations, data rollback operations, and data commit / merge operations). After the data operation is completed, the corresponding file path changes in the metadata table are updated.

[0072] This application embodiment ensures the accuracy of the metadata table by updating the file path information, so that Hudi can obtain accurate file path information from the metadata table during each transaction execution, thereby improving the accuracy of data processing.

[0073] Furthermore, based on the above embodiments, in some embodiments, the above-mentioned updating of the file path information in the metadata table can be achieved through the following steps: updating the partition path in the file path information according to the partition update information, and / or updating the file group in the partition path according to the file group update information, and / or updating the file slice information in the file group according to the file slice update information.

[0074] In this embodiment, after Hudi completes all transaction operations, the corresponding key-value pairs are updated according to the executed operations. For example, if the operations executed by Hudi include file slice deletion, the file slice information in the key-value pairs of the metadata table needs to be updated. If the operations executed by Hudi include partition addition, the partition path in the key-value pairs of the metadata table needs to be updated (e.g., adding a new partition path). If the operations executed by Hudi include filegroup data merging, the filegroup in the key-value pairs of the metadata table needs to be updated.
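The three kinds of update information in [0073] and [0074] could be applied along the lines of the sketch below. The `PendingUpdate` structure, its field names, and the choice to model a file group merge as replacing the slice list of a key are all assumptions made for illustration; a dictionary stands in for Redis.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MetadataTable = Dict[str, List[str]]


@dataclass
class PendingUpdate:
    """Hypothetical container for the information to be updated."""
    # Partition update: new partition path -> file group IDs to register.
    new_partitions: Dict[str, List[str]] = field(default_factory=dict)
    # File group update: key -> slice entries remaining after a merge.
    merged_groups: Dict[str, List[str]] = field(default_factory=dict)
    # File slice update: key -> file names of deleted slices.
    deleted_slices: Dict[str, List[str]] = field(default_factory=dict)


def apply_update(table: MetadataTable, upd: PendingUpdate) -> None:
    """Apply whichever kinds of update information are present."""
    for partition, group_ids in upd.new_partitions.items():
        for gid in group_ids:
            table.setdefault(f"{partition}|{gid}", [])
    for key, slices in upd.merged_groups.items():
        table[key] = list(slices)
    for key, names in upd.deleted_slices.items():
        gone = set(names)
        table[key] = [
            s for s in table.get(key, []) if s.rpartition("-")[0] not in gone
        ]


state: MetadataTable = {
    "p1|g1": ["a.parquet-1", "b.parquet-2"],
    "p1|g2": ["c.parquet-3", "d.parquet-4"],
}
apply_update(
    state,
    PendingUpdate(
        new_partitions={"p2": ["g9"]},
        merged_groups={"p1|g2": ["cd.parquet-7"]},
        deleted_slices={"p1|g1": ["a.parquet"]},
    ),
)
```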

[0075] In some embodiments, the above method may further include the following steps: obtaining metadata after the transaction is completed, and updating the metadata files in the metadata folder based on the metadata after the transaction is completed. In this embodiment, after all operations in a Hudi transaction are successfully executed, Hudi will save the metadata of the transaction as flat files in the metadata folder, that is, update the metadata files in the metadata folder to ensure data consistency.

[0076] In other embodiments, the distributed database component can be a remote dictionary service, i.e., Redis. The method may also include the following steps: retrieving data files from the data lake with access volumes exceeding a preset threshold, and caching the data files in the memory of the remote dictionary service.

[0077] In this embodiment, Redis can use memory to store frequently accessed data (i.e., data files whose access volume exceeds a preset threshold). This memory-based storage of frequently accessed data by Redis further improves file operation efficiency.
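The threshold-based caching in [0076] and [0077] can be sketched as follows. The class name, the in-process counter, and the `read_file` callback (a placeholder for reading from the distributed file system) are assumptions; an in-memory dictionary stands in for the memory of the remote dictionary service.

```python
from collections import Counter
from typing import Callable, Dict


class HotFileCache:
    """Cache a data file in memory once its access count exceeds a threshold."""

    def __init__(self, threshold: int, read_file: Callable[[str], bytes]) -> None:
        self.threshold = threshold
        self.read_file = read_file          # hypothetical file system reader
        self.hits: Counter = Counter()      # access count per file path
        self.cache: Dict[str, bytes] = {}   # stand-in for in-memory storage

    def get(self, path: str) -> bytes:
        self.hits[path] += 1
        if path in self.cache:
            return self.cache[path]
        data = self.read_file(path)
        # Cache only files whose access volume exceeds the preset threshold.
        if self.hits[path] > self.threshold:
            self.cache[path] = data
        return data


reads = []


def loader(path: str) -> bytes:
    reads.append(path)
    return b"data"


hot = HotFileCache(threshold=2, read_file=loader)
for _ in range(4):
    hot.get("p1/g1_001.parquet")
hot.get("p1/g1_002.parquet")
```

With a threshold of 2, the third access to the same file caches it and the fourth is served from memory, so the file system is read three times for that file.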

[0078] The following are embodiments of the apparatus described in this application, which can be used to execute the embodiments of the method described in this application. For details not disclosed in the apparatus embodiments of this application, please refer to the embodiments of the method described in this application.

[0079] Figure 4 is a schematic diagram of the metadata processing device provided in an embodiment of this application. This metadata processing device can be integrated into a computer device, or it can be implemented independently of a computer device while cooperating with it. As shown in Figure 4, the metadata processing device 400 includes a data table construction module 410, a partition group determination module 420, a path acquisition module 430, and an operation execution module 440. The data table construction module 410 is used to establish a metadata table corresponding to the data organization format component of the data lake in the distributed database component. The metadata table includes file path information for metadata. The partition group determination module 420 is used to determine, based on the data operation involved when the data organization format component of the data lake executes a transaction, the partition where the record to be operated on is located and the file group within the partition. The path acquisition module 430 is used to obtain the target file path from the metadata table based on the partition where the record to be operated on is located and the file group within the partition. The operation execution module 440 is used to execute the data operation at the target file path.

[0080] Optionally, the data table construction module can be used to: obtain the partition paths, filegroups, and information on all file slices under each filegroup from the data organization format component of the data lake; construct associated key-value pairs using the partition paths and filegroups as keys and the information on all file slices as values; and construct a metadata table based on the associated key-value pairs, which are used as file path information. Specifically, the data organization format component of the data lake includes different partition paths, each containing different filegroups, and the file slice information includes the filename and file size.

[0081] Optionally, the metadata processing device further includes a path update module, used to determine the information to be updated based on the data operations performed under the target file path; and to update the file path information in the metadata table based on the information to be updated. The information to be updated includes at least one of partition update information, file group update information, and file slice update information.

[0082] Optionally, the path update module described above can be used to: update the partition path in the file path information according to the partition update information, and / or update the file group in the partition path according to the file group update information, and / or update the file slice information in the file group according to the file slice update information.

[0083] Optionally, the aforementioned metadata processing device further includes a metadata file update module, used to obtain the metadata after the transaction is completed, and update the metadata file in the metadata folder based on the metadata after the transaction is completed.

[0084] Optionally, the operation execution module can be used to perform at least one of the following operations: data query operation, data rollback operation, and data extraction and merging operation in the target file path.

[0085] Optionally, the distributed database component is a remote dictionary service, and the aforementioned metadata processing device also includes a caching module, which is used to obtain data files with access volume exceeding a preset threshold from the data organization format component of the data lake, and cache the data files in the memory of the remote dictionary service.

[0086] The apparatus provided in this application embodiment can be used to execute the methods in the above embodiments, and its implementation principle and technical effect are similar, so they will not be described again here.

[0087] It should be noted that the division of the various modules in the above device is merely a logical functional division. In actual implementation, they can be fully or partially integrated into a single physical entity, or they can be physically separated. Furthermore, these modules can be implemented entirely in software via processing element calls; they can be fully implemented in hardware; or some modules can be implemented by processing element calls to software, while others are implemented in hardware. For example, the data table construction module can be a separate processing element, or it can be integrated into a chip in the above device. Alternatively, it can be stored as program code in the memory of the above device, and its functions can be called and executed by a processing element. The implementation of other modules is similar. Moreover, these modules can be fully or partially integrated together, or they can be implemented independently. The processing element here can be an integrated circuit with signal processing capabilities. In the implementation process, each step of the above method or each of the above modules can be completed through integrated logic circuits in the hardware of the processor element or through software instructions.

[0088] Figure 5 is a schematic diagram of the structure of a computer device provided in an embodiment of this application. As shown in Figure 5, the computer device 500 includes at least one processor 510, a memory 520, a bus 530, and a communication interface 540. The processor 510, the communication interface 540, and the memory 520 communicate with each other via the bus 530. The communication interface 540 is used to communicate with other devices; it includes a communication interface for data transmission and a display interface or operation interface for human-computer interaction. The processor 510 executes the computer-executable instructions stored in the memory 520, specifically performing the relevant steps of the methods described in the above embodiments. The processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the computer device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs. The memory 520 stores the computer-executable instructions. The memory may include high-speed RAM and may also include non-volatile memory, such as at least one disk storage device.

[0089] This embodiment also provides a computer-readable storage medium storing computer instructions. When at least one processor of a computer device executes the computer instructions, the computer device performs the metadata processing methods provided in the various embodiments described above.

[0090] This embodiment also provides a computer program product including computer instructions stored in a readable storage medium. At least one processor of a computer device can read the computer instructions from the readable storage medium, and the at least one processor executes the computer instructions to cause the computer device to implement the metadata processing methods provided in the various embodiments described above.

[0091] In this application, "at least one" means one or more, and "more than one" means two or more. "And / or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A alone, A and B simultaneously, or B alone, where A and B can be singular or plural. The character " / " generally indicates an "or" relationship between the preceding and following related objects; in formulas, the character " / " indicates a "division" relationship. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c can represent: a, b, c, ab, ac, bc, or abc, where a, b, and c can be single or multiple.

[0092] It is understood that the various numerical designations used in the embodiments of this application are merely for descriptive convenience and are not intended to limit the scope of the embodiments of this application. In the embodiments of this application, the order of the above-mentioned process numbers does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.

[0093] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features therein. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of this application.

Claims

1. A metadata processing method, characterized in that it comprises:
establishing, in a distributed database component, a metadata table corresponding to the data organization format component of a data lake, including: obtaining the partition paths, the file groups, and the information of all file slices under the file groups of the data organization format component of the data lake, wherein the data organization format component of the data lake includes different partition paths, different partition paths include different file groups, and the information of a file slice includes the file name and file size; constructing associated key-value pairs using the combination of the partition path and the file group as the key and the information of all file slices as the value; and constructing the metadata table based on the associated key-value pairs, wherein the associated key-value pairs serve as file path information, and the metadata table includes the file path information of the metadata;
determining, based on the data operation involved when the data organization format component of the data lake executes a transaction, the partition in which the record on which the data operation is to be performed is located and the file group in that partition;
obtaining the target file path from the metadata table based on the partition in which the record on which the data operation is to be performed is located and the file group in that partition;
performing the data operation under the target file path;
determining, based on the data operation performed under the target file path, information to be updated, the information to be updated including at least one of partition update information, file group update information, and file slice update information; and
updating the file path information in the metadata table based on the information to be updated.
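
As a non-limiting illustration of the key-value construction and path lookup recited in claim 1, the following sketch uses an in-memory dict in place of the distributed database component. The names `FileSlice`, `build_metadata_table`, and `target_file_paths`, and the way a path is joined from the partition path and file name, are assumptions made for this example and are not part of the claim.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FileSlice:
    file_name: str
    file_size: int  # size in bytes


# Key: (partition path, file group); value: information of all file slices.
MetadataTable = Dict[Tuple[str, str], List[FileSlice]]


def build_metadata_table(
    listing: Dict[str, Dict[str, List[FileSlice]]]
) -> MetadataTable:
    """Flatten partition -> file group -> slices into associated key-value pairs."""
    table: MetadataTable = {}
    for partition_path, file_groups in listing.items():
        for file_group, slices in file_groups.items():
            table[(partition_path, file_group)] = list(slices)
    return table


def target_file_paths(
    table: MetadataTable, partition_path: str, file_group: str
) -> List[str]:
    """Look up the file paths for a record's partition and file group."""
    slices = table.get((partition_path, file_group), [])
    return [f"{partition_path}/{s.file_name}" for s in slices]
```

Because the lookup is a single key access, locating the files for a record does not require listing directories in the distributed file system.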

2. The method according to claim 1, characterized in that updating the file path information in the metadata table includes: updating the partition path in the file path information based on the partition update information; and/or updating the file group in the partition path based on the file group update information; and/or updating the file slice information in the file group based on the file slice update information.
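
The three kinds of update recited in claim 2 can be illustrated with a small sketch, again assuming a dict keyed by (partition path, file group) in place of the distributed database component. The function `apply_update` and its parameters are hypothetical names chosen for this example.

```python
from typing import Dict, List, Optional, Tuple

# Key: (partition path, file group); value: list of (file name, file size).
MetadataTable = Dict[Tuple[str, str], List[Tuple[str, int]]]


def apply_update(
    table: MetadataTable,
    partition_path: str,
    file_group: str,
    new_partition_path: Optional[str] = None,
    new_file_group: Optional[str] = None,
    new_slices: Optional[List[Tuple[str, int]]] = None,
) -> Tuple[str, str]:
    """Apply any combination of partition, file group, and file slice updates.

    Returns the key under which the entry is stored afterwards.
    """
    # Remove the old entry so a changed key does not leave a stale record.
    slices = table.pop((partition_path, file_group), [])
    if new_slices is not None:
        slices = list(new_slices)
    key = (new_partition_path or partition_path, new_file_group or file_group)
    table[key] = slices
    return key
```

Passing only `new_slices` rewrites the value in place, while passing a new partition path or file group moves the entry to a new key.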

3. The method according to claim 1, characterized in that, after performing the data operation under the target file path, the method further includes: obtaining the metadata after the transaction is completed, and updating the metadata files in the metadata folder based on the metadata after the transaction is completed.

4. The method according to claim 1, characterized in that performing the data operation under the target file path includes: performing at least one of the following operations under the target file path: a data query operation, a data rollback operation, and a data extraction and merging operation.

5. The method according to claim 1, characterized in that the distributed database component is a remote dictionary service, and the method further includes: obtaining, from the data organization format component of the data lake, data files whose access volume exceeds a preset threshold, and caching the data files in the memory of the remote dictionary service.

6. A metadata processing apparatus, characterized in that it comprises:
a data table construction module, used to establish, in a distributed database component, a metadata table corresponding to the data organization format component of a data lake, including: obtaining the partition paths, the file groups, and the information of all file slices under the file groups of the data organization format component of the data lake, wherein the data organization format component of the data lake includes different partition paths, different partition paths include different file groups, and the information of a file slice includes the file name and file size; constructing associated key-value pairs using the combination of the partition path and the file group as the key and the information of all file slices as the value; and constructing the metadata table based on the associated key-value pairs, wherein the associated key-value pairs serve as file path information, and the metadata table includes the file path information of the metadata;
a partition group determination module, used to determine, based on the data operation involved when the data organization format component of the data lake executes a transaction, the partition in which the record on which the data operation is to be performed is located and the file group in that partition;
a path acquisition module, used to obtain the target file path from the metadata table based on the partition in which the record on which the data operation is to be performed is located and the file group in that partition;
an operation execution module, used to perform the data operation under the target file path; and
a path update module, used to determine information to be updated based on the data operation performed under the target file path, and to update the file path information in the metadata table based on the information to be updated, the information to be updated including at least one of partition update information, file group update information, and file slice update information.

7. A computer device, characterized in that it comprises: a processor, and a memory communicatively connected to the processor; the memory stores computer-executable instructions; and the processor executes the computer-executable instructions stored in the memory to implement the method according to any one of claims 1-5.

8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions that, when executed by a processor, implement the method according to any one of claims 1-5.