A method, apparatus, equipment, and medium for determining changes in lake warehouse metadata.

By receiving update requests and generating update description messages through a metadata storage system, the problem of the metadata update detection scheme being difficult to expand and maintain is solved, and unified and reliable update detection is achieved.

CN117931813B · Active · Publication Date: 2026-06-30 · BEIJING VOLCANO ENGINE TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: BEIJING VOLCANO ENGINE TECH CO LTD
Filing Date: 2023-11-09
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

In existing technologies, metadata update detection schemes are difficult to expand and maintain, especially since they require configuring corresponding update detection logic for each data engine, resulting in a large workload and difficulty in maintenance.

Method used

After receiving an update request from the data engine, the metadata storage system executes the object update processing logic and generates an update description message by comparing the differences between the target object before and after the update. This message is then sent directly to the message receiving object, thus enabling the metadata storage system to perform its own update detection.

Benefits of technology

It enables the metadata storage system to generate update description messages itself, reducing dependence on the data engine, simplifying the expansion and maintenance process, and improving the uniformity and reliability of detection.

✦ Generated by Eureka AI based on patent content.


Abstract

This application discloses a method, apparatus, electronic device, and computer-readable medium for determining metadata changes in a lake warehouse. In the method, after a metadata storage system receives a metadata processing request sent by a data engine to request an update of a target object, it executes the object update processing logic corresponding to the metadata processing request. After determining that the object update processing logic has been completed, it generates an update description message corresponding to the target object based on the comparison result between the target object before and after the update. The update description message indicates that the target object has been updated and can subsequently be sent to a message receiving object. In this way, the metadata storage system itself completes the data update detection processing, which effectively overcomes the defects caused by having other devices perform data update detection processing.

Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to a method, apparatus, device, and medium for determining changes in lake warehouse metadata.

Background Technology

[0002] In some application scenarios, users may need to perform certain operations (such as add, delete, modify, etc.) on databases, tables, partitions, etc. under a certain metadata source (such as Hive metadata source) to meet some of the user's metadata needs.

[0003] In addition, in some application scenarios, in order to better reduce the impact of these operations on other businesses, there may be the following requirements: provide the corresponding metadata update description messages for these operations to the relevant businesses so that the businesses can know the metadata update status caused by these operations, thereby effectively avoiding adverse effects caused by the businesses not knowing the metadata update status (such as the inability to execute subsequent tasks in a timely manner or continue to query and access the content before the update).

[0004] However, how to achieve the above requirements is a technical problem that urgently needs to be solved.

Summary of the Invention

[0005] To address the aforementioned technical issues, this application provides a method, apparatus, equipment, and medium for determining changes in lake warehouse metadata.

[0006] To achieve the above objectives, the technical solution provided in this application is as follows:

[0007] This application provides a method for determining changes in lake warehouse metadata, the method being applied to a metadata storage system, the method comprising:

[0008] Receive a metadata processing request sent by the data engine, the metadata processing request being used to request an update to a target object; the target object includes one or more metadata;

[0009] Execute the object update processing logic corresponding to the metadata processing request, wherein the object update processing logic is used to update the target object;

[0010] After determining that the object update processing logic has been completed, an update description message corresponding to the target object is generated based on the comparison result between the target object before the update and the target object after the update.

[0011] The update description message is sent to the message receiving object.
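The four steps above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the implementation described in this application: the class `MetadataStore`, the method `handle_request`, and the request and message field names are all hypothetical.

```python
import copy


class MetadataStore:
    """Hypothetical in-memory stand-in for a metadata storage system."""

    def __init__(self, send):
        self.objects = {}   # object key -> metadata dict
        self.send = send    # callable that delivers a message to the receiver

    def handle_request(self, request):
        # Step 1: the request from the data engine names the target object.
        key = request["key"]
        # Snapshot the target object before the update logic runs.
        before = copy.deepcopy(self.objects.get(key))
        # Step 2: execute the update logic (here: merge the changed fields).
        self.objects.setdefault(key, {}).update(request["changes"])
        after = self.objects[key]
        # Step 3: compare the object before and after the update.
        changed = sorted(
            f for f in after if before is None or before.get(f) != after[f]
        )
        if not changed:
            return None
        message = {"key": key, "changed_fields": changed}
        # Step 4: send the update description message to the receiver.
        self.send(message)
        return message


received = []
store = MetadataStore(send=received.append)
store.objects["test_db.test_tbl"] = {"name": "test_tbl", "owner": "alice"}
store.handle_request({"key": "test_db.test_tbl", "changes": {"owner": "bob"}})
```

Because the comparison runs inside the store, a data engine that calls `handle_request` needs no detection logic of its own.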

[0012] In one possible implementation, the process of generating the update description message includes:

[0013] If the comparison result indicates that the object identifier of the target object before the update is different from the object identifier of the target object after the update, then the update description message is generated according to the type of the target object.

[0014] In one possible implementation, if the target object is a table object, then generating the update description message based on the type of the target object includes:

[0015] If the target object is of type table, then the update description message is generated based on the first preset string; the semantic information expressed by the first preset string is to update the object identifier of the table object that belongs to the table type.

[0016] If the target object is of type view, then the update description message is generated based on the second preset string; the semantic information expressed by the second preset string is to update the object identifier of the table object that belongs to the view type.
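The identifier comparison in [0013] to [0016] might look like the sketch below. The preset string values (`RENAME_TABLE`, `RENAME_VIEW`), the function name, and the dictionary layout are assumptions made for illustration; the source does not give the actual strings.

```python
# Hypothetical preset strings; the source does not state their real values.
PRESET_BY_TYPE = {
    "table": "RENAME_TABLE",  # first preset string: identifier of a table changed
    "view": "RENAME_VIEW",    # second preset string: identifier of a view changed
}


def describe_identifier_change(before, after):
    """Return an update description message, or None if the identifier is unchanged."""
    if before["id"] == after["id"]:
        return None
    object_type = after["type"]
    if object_type not in PRESET_BY_TYPE:
        raise ValueError(f"unsupported table object type: {object_type!r}")
    return {
        "event": PRESET_BY_TYPE[object_type],
        "old_id": before["id"],
        "new_id": after["id"],
    }


table_msg = describe_identifier_change(
    {"id": "test_db.test_tbl", "type": "table"},
    {"id": "test_db.test_tbl_new", "type": "table"},
)
view_msg = describe_identifier_change(
    {"id": "test_db.v1", "type": "view"},
    {"id": "test_db.v2", "type": "view"},
)
```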

[0017] In one possible implementation, the target object is used to record multiple metadata;

[0018] The process of generating the update description message includes:

[0019] If the comparison result indicates that the number of data blocks divided in the target dimension of the target object before the update is the same as the number of data blocks divided in the target dimension of the target object after the update, and there is a difference between the target object before the update and the target object after the update in at least one data block, then the update description message is generated based on the number of data blocks in the at least one data block.

[0020] In one possible implementation, generating the update description message based on the number of data blocks in the at least one data block includes:

[0021] If the number of data blocks in the at least one data block is 1, then the update description message is generated according to the third preset string; the semantic information expressed by the third preset string is that one of the data blocks is being updated;

[0022] If the number of data blocks in the at least one data block is not less than 2, then the update description message is generated according to the fourth preset string; the semantic information expressed by the fourth preset string is to update multiple data blocks.

[0023] In one possible implementation, the target object is used to record multiple metadata;

[0024] The process of generating the update description message includes at least one of the following:

[0025] If the comparison result indicates that the number of data blocks divided in the target dimension of the updated target object is higher than the number of data blocks divided in the target dimension of the unupdated target object, and the data blocks divided in the target dimension of the updated target object include the data blocks divided in the target dimension of the unupdated target object, then the update description message is generated according to the fifth preset string, and the semantic information expressed by the fifth preset string is to add the data blocks;

[0026] If the comparison result indicates that the number of data blocks divided in the target dimension of the target object before the update is higher than the number of data blocks divided in the target dimension of the updated target object, and the data blocks divided in the target dimension of the target object before the update include the data blocks divided in the target dimension of the updated target object, then the update description message is generated according to the sixth preset string, and the semantic information expressed by the sixth preset string is to delete the data block;

[0027] If the comparison result indicates that the number of data blocks divided in the target dimension of the target object before the update is different from the number of data blocks divided in the target dimension of the target object after the update, and there is a difference between the target object before the update and the target object after the update in at least one data block, then the update description message is generated according to the fourth preset string, and the semantic information expressed by the fourth preset string is to update multiple data blocks.
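The data block rules in [0019] to [0027] can be read as a small classification function. The sketch below is one possible interpretation, not the claimed logic: data blocks are modeled as a mapping from a block key (for example a partition name) to its content, "includes" is taken to mean that every earlier block is present with identical content, and the four preset string values are hypothetical.

```python
# Hypothetical preset strings; the source does not state their real values.
UPDATE_ONE = "UPDATE_PARTITION"     # third preset string: one block updated
UPDATE_MANY = "UPDATE_PARTITIONS"   # fourth preset string: several blocks updated
ADD_BLOCKS = "ADD_PARTITIONS"       # fifth preset string: blocks added
DELETE_BLOCKS = "DROP_PARTITIONS"   # sixth preset string: blocks deleted

_MISSING = object()


def _contains(bigger, smaller):
    """True if every block in `smaller` appears unchanged in `bigger`."""
    return all(bigger.get(k, _MISSING) == v for k, v in smaller.items())


def classify_block_change(before, after):
    """Classify the change between two {block_key: content} mappings."""
    if before == after:
        return None
    if len(before) == len(after):
        # Same number of blocks: count how many blocks differ.
        keys = before.keys() | after.keys()
        differing = [
            k for k in keys
            if before.get(k, _MISSING) != after.get(k, _MISSING)
        ]
        return UPDATE_ONE if len(differing) == 1 else UPDATE_MANY
    if len(after) > len(before) and _contains(after, before):
        return ADD_BLOCKS
    if len(before) > len(after) and _contains(before, after):
        return DELETE_BLOCKS
    # Counts differ and existing blocks also changed.
    return UPDATE_MANY


base = {"dt=1": "a", "dt=2": "b"}
```

Checking the pure add and pure delete cases before the mixed case is a design choice of this sketch; the source lists the three count-changing rules as alternatives without fixing an order.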

[0028] In one possible implementation, the target object is a library object, a table object, or a partition object.

[0029] In one possible implementation, the message receiving object is a message middleware or at least one downstream object corresponding to the data engine, and the message middleware is used to provide the update description message to each of the downstream objects.
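The middleware variant can be pictured with a small in-memory publish/subscribe sketch. The `Middleware` class and its `subscribe` and `publish` methods are hypothetical names used only for illustration; the source does not name a specific message middleware.

```python
class Middleware:
    """Hypothetical in-memory message middleware (illustration only)."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        # Register a downstream object that wants update description messages.
        self.subscribers.append(callback)

    def publish(self, message):
        # Deliver the same message to every registered downstream object.
        for callback in self.subscribers:
            callback(message)


scheduler_inbox, cache_inbox = [], []
bus = Middleware()
bus.subscribe(scheduler_inbox.append)
bus.subscribe(cache_inbox.append)
bus.publish({"event": "table updated", "key": "test_db.test_tbl"})
```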

[0030] In one possible implementation, generating an update description message corresponding to the target object based on the comparison result between the target object before and after the update includes:

[0031] Based on a preset message format and a comparison between the target object before and after the update, an update description message corresponding to the target object is generated; the preset message format is the message format used by the data engine when sending a message to the message receiving object.

[0032] In one possible implementation, the message receiving object is further configured to receive an object update message generated by the data engine in response to the metadata processing request;

[0033] The semantic information carried by the object update message is partially or entirely consistent with the semantic information carried by the update description message.

[0034] In one possible implementation, the metadata processing request is used to describe multiple update processing tasks;

[0035] The execution of the object update processing logic corresponding to the metadata processing request includes:

[0036] Execute the object update processing logic corresponding to each of the aforementioned update processing tasks;

[0037] The process of generating the update description message corresponding to the target object includes:

[0038] For any of the update processing tasks, after determining that the object update processing logic corresponding to the update processing task has been completed, an update description message corresponding to the update processing task is generated based on the comparison result between the target object before the update and the target object after the update.

[0039] Based on the update description messages corresponding to the multiple update processing tasks, an update description message corresponding to the target object is generated.
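A request that describes several update processing tasks could be handled as in the sketch below. The source does not say how the per-task messages are combined, so the merge rule here (list the task names, keep the first "before" and the last "after") is an assumption, as are the function and field names.

```python
import copy


def run_tasks(obj, tasks):
    """Apply each task in order and build one message for the whole request."""
    per_task = []
    for name, apply_task in tasks:
        before = copy.deepcopy(obj)
        apply_task(obj)  # object update processing logic for this task
        if obj != before:
            # Per-task update description message from the before/after comparison.
            per_task.append(
                {"task": name, "before": before, "after": copy.deepcopy(obj)}
            )
    if not per_task:
        return None
    # Combine the per-task messages into one message for the target object.
    return {
        "tasks": [m["task"] for m in per_task],
        "before": per_task[0]["before"],
        "after": per_task[-1]["after"],
    }


table = {"name": "t", "owner": "alice"}
combined = run_tasks(
    table,
    [
        ("rename", lambda o: o.update(name="t_new")),
        ("set_owner", lambda o: o.update(owner="bob")),
    ],
)
```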

[0040] In one possible implementation, the metadata processing request is generated by the data engine in response to an object update request sent by the client, the object update request carrying client description information;

[0041] The step of generating an update description message corresponding to the target object based on the comparison result between the target object before and after the update includes:

[0042] Based on the client description information provided by the data engine and the comparison results between the target object before and after the update, an update description message corresponding to the target object is generated.

[0043] In one possible implementation, generating an update description message corresponding to the target object based on client description information provided by the data engine and a comparison result between the target object before and after the update includes:

[0044] Based on the client description information carried in the metadata processing request and the comparison results between the target object before and after the update, an update description message corresponding to the target object is generated.
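As an illustration of [0044], the sketch below copies client description information from the request into the generated message. The JSON layout and every field name (`client`, `user`, `app`, `event`) are hypothetical; the source only states that the message is generated from the client description information and the comparison result.

```python
import json


def build_message(request, before, after):
    """Build a JSON update description message that carries client information."""
    message = {
        "event": "UPDATE",                 # hypothetical event label
        "before": before,
        "after": after,
        # Client description information carried in the metadata processing request.
        "client": request.get("client", {}),
    }
    return json.dumps(message, sort_keys=True)


payload = build_message(
    {"key": "test_db.test_tbl", "client": {"user": "alice", "app": "spark-sql"}},
    before={"owner": "alice"},
    after={"owner": "bob"},
)
decoded = json.loads(payload)
```

Carrying the client details in the message lets a downstream object tell which client triggered the change without querying the data engine.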

[0045] In one possible implementation, before receiving the metadata processing request sent by the data engine, the method further includes:

[0046] Receive client description information sent by the data engine;

[0047] Receiving the metadata processing request sent by the data engine includes:

[0048] Receive the metadata processing request corresponding to the client description information sent by the data engine.

[0049] In one possible implementation, after determining that the object update processing logic has been completed, generating an update description message corresponding to the target object based on the comparison result between the target object before and after the update includes:

[0050] After determining that the object update processing logic has been completed, obtain the logic execution result description information corresponding to the metadata processing request; the logic execution result description information includes some or all of the information carried by the metadata processing request; the some or all of the information includes the object description information of the target object before the update and the object description information of the target object after the update;

[0051] The logical execution result description information is sent to the message generation module embedded in the metadata storage system, so that the message generation module can obtain the target object before the update and the target object after the update based on the logical execution result description information, and generate an update description message corresponding to the target object based on the comparison result between the target object before the update and the target object after the update.

[0052] This application provides a device for determining changes in lake warehouse metadata, including:

[0053] The first receiving unit is used to receive a metadata processing request sent by the data engine. The metadata processing request is used to request an update to a target object. The target object includes one or more metadata.

[0054] A logic execution unit is used to execute the object update processing logic corresponding to the metadata processing request, wherein the object update processing logic is used to update the target object;

[0055] The message generation unit is used to generate, after determining that the object update processing logic has been completed, an update description message corresponding to the target object based on the comparison result between the target object before the update and the target object after the update.

[0056] The message sending unit is used to send the update description message to the message receiving object.

[0057] This application provides an electronic device, the device comprising: a processor and a memory;

[0058] The memory is used to store instructions or computer programs;

[0059] The processor is configured to execute the instructions or computer program in the memory, so that the electronic device performs the lake warehouse metadata change determination method provided in this application.

[0060] This application provides a computer-readable medium, characterized in that the computer-readable medium stores instructions or computer programs that, when executed on a device, cause the device to perform the lake warehouse metadata change determination method provided in this application.

[0061] This application provides a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the lake warehouse metadata change determination method provided in this application.

[0062] Compared with related technologies, this application has at least the following advantages:

[0063] In the technical solution provided in this application, for a metadata storage system (such as a metadata storage system similar to Hive Metastore), after receiving a metadata processing request sent by a data engine to request the updating of a target object (such as a library object, table object, or partition object), the system executes the object update processing logic corresponding to the metadata processing request (for example, the object update processing logic is used to update the target object). After determining that the object update processing logic has been completed, an update description message corresponding to the target object is generated based on the comparison result between the target object before and after the update. This update description message indicates that the target object has been updated, so that the update description message can be sent to the message receiving object (such as some downstream businesses corresponding to the data engine). In this way, the metadata storage system itself can generate the update description message, thereby enabling the metadata storage system to complete the data update detection processing itself. This effectively overcomes the defects caused by other devices (such as data engines) performing data update detection processing (for example, the difficulty in scaling and maintaining such an update detection scheme due to the need to configure corresponding update detection logic for each data engine). Based on this, it can be seen that the lake warehouse metadata change determination method provided in this application describes a unified, reliable, easily scalable and maintainable real-time update detection scheme.

Attached Figure Description

[0064] To more clearly illustrate the technical solutions in the embodiments or related technologies of this application, the drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0065] Figure 1 is a schematic diagram of an update detection scheme provided in an embodiment of this application;

[0066] Figure 2 is a flowchart of a method for determining changes in lake warehouse metadata provided in an embodiment of this application;

[0067] Figure 3 is a schematic diagram of another update detection scheme provided in an embodiment of this application;

[0068] Figure 4 is a schematic diagram of yet another update detection scheme provided in an embodiment of this application;

[0069] Figure 5 is a schematic diagram of a device for determining changes in lake warehouse metadata provided in an embodiment of this application;

[0070] Figure 6 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.

Detailed Implementation

[0071] Research has found that for certain scenarios (such as updating metadata in a data lake warehouse), some related update detection schemes work as follows. When a user updates metadata via a data engine (e.g., data engine 1, data engine 2, ..., or data engine N shown in Figure 1), after the data engine submits a data update task (e.g., renaming a table), the data engine can call an interface to access a metadata storage system (e.g., Hive Metastore). After completing the data update task with the help of this metadata storage system, the data engine enters the hook logic configured within the data engine. The data engine thus performs metadata update detection processing through the hook, obtains an update description message (e.g., the metadata update message shown in Figure 1), and provides it to downstream businesses, which effectively avoids the impact on downstream businesses caused by missing update description messages.

[0072] Research also revealed the following drawbacks of the update detection scheme described above: For a metadata storage system, the number of data engines connected to it will increase. This necessitates configuring metadata detection logic (e.g., the Hook logic shown above) in each new data engine upon its integration, enabling it to perform metadata update detection for the metadata storage system. This results in a significant workload for this update detection scheme. Furthermore, if adjustments are needed to the metadata detection logic (e.g., adding or modifying content), the metadata detection logic in all data engines already connected to the metadata storage system must be reconfigured. This makes the update detection scheme difficult to scale and maintain.

[0073] Based on the above research, in order to reduce the difficulty of expanding and maintaining update detection schemes, this application provides a method for determining lake warehouse metadata changes. In this method, for a metadata storage system (e.g., a metadata storage system similar to Hive Metastore), after receiving a metadata processing request from a data engine requesting an update of a target object (e.g., a library object, table object, or partition object), the system executes the object update processing logic corresponding to the request (e.g., the object update processing logic is used to update the target object). After confirming the completion of the object update processing logic, an update description message is generated based on the comparison between the target object before and after the update. This update description message indicates that the target object has been updated, allowing it to be sent to the message receiving object (e.g., downstream services of the data engine). This allows the metadata storage system to generate update description messages itself, enabling data update detection processing to be completed by the system itself. This effectively overcomes the shortcomings of data update detection processing performed by the data engine (e.g., the difficulty in scaling and maintaining the update detection scheme due to the need to configure corresponding update detection logic for each data engine).

[0074] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present application, and not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present application.

[0075] To better understand the technical solution provided in this application, the method for determining changes to lake warehouse metadata provided in this application is explained below with reference to the accompanying drawings. As shown in Figure 2, the lake warehouse metadata change determination method provided in this embodiment includes S1-S4 below. Figure 2 is a flowchart of a method for determining changes in lake warehouse metadata provided in an embodiment of this application.

[0076] S1: The metadata storage system receives a metadata processing request sent by the data engine. This metadata processing request is used to request an update to a target object. The target object includes one or more metadata.

[0077] The metadata storage system refers to a system used for data management of metadata sources (e.g., adding new metadata, deleting existing metadata, modifying existing metadata, etc.). Furthermore, this application does not limit the implementation method of the metadata storage system. For example, in some application scenarios (e.g., data lake warehouses), the metadata storage system can be implemented using any metadata storage system (e.g., Hive Metastore instances, etc.) to enable data management of metadata in the data lake warehouse. Additionally, this application does not limit the implementation method of the metadata source. For example, it can be implemented using any existing or future metadata source (e.g., data lake, data warehouse, integrated data lake warehouse, etc.).

[0078] Based on the above content, in one possible implementation, the metadata storage system mentioned above can be a metadata storage system (e.g., the metadata storage system shown in Figure 3 or Figure 4) that enables data management of metadata from at least some metadata sources. This application does not limit the implementation of the metadata storage system. For example, the metadata storage system can be implemented using any existing or future system capable of managing metadata (e.g., a Hive Metastore instance).

[0079] A data engine is an engine that uses the metadata storage system mentioned above to perform certain data processing procedures. The data engine thus represents an upper-level application of the metadata storage system and accesses it in some way (such as interface calls).

[0080] Furthermore, this application does not limit the implementation method of the data engine described above. For example, the data engine can be any existing or future engine that can access the metadata storage system (e.g., data engine 1, data engine 2, ..., or data engine N shown in Figure 3 or Figure 4). For example, in some application scenarios, the data engine can be implemented as a data analysis engine, a data computation engine, or a data query engine. Therefore, in one possible implementation, when the metadata storage system mentioned above is a Hive Metastore instance, the data engine can be implemented using the HiveServer2 engine. This allows the data engine to access the metadata storage system via API calls, enabling the metadata storage system to perform data processing by handling requests provided by the data engine.

[0081] It should be noted that this application does not limit the implementation of the HiveServer2 engine mentioned above. For example, the HiveServer2 engine may include one or more of the following: at least one Structured Query Language (SQL) engine, at least one batch stream processing engine, and at least one intelligent analysis platform engine.

[0082] It should also be noted that this application does not limit the implementation method of the SQL engine mentioned above. For example, the SQL engine can be implemented using any existing or future SQL engine (such as Hive or Presto). Similarly, this application does not limit the implementation method of the batch / stream processing engine mentioned above. For example, the batch / stream processing engine can be implemented using any existing or future batch / stream processing engine (such as Spark or Flink). Furthermore, this application does not limit the implementation method of the intelligent analysis platform engine mentioned above. For example, the intelligent analysis platform engine can be implemented using any existing or future intelligent analysis platform engine, such as a Business Intelligence (BI) analysis platform engine.

[0083] Furthermore, regarding the aforementioned metadata storage system and data engine, data communication is possible between the data engine and the metadata storage system. Moreover, this application does not limit the communication method; for example, it can be implemented using any existing or future method capable of enabling communication between the data engine and the metadata storage system. Additionally, in some application scenarios, the data engine can access the metadata storage system by calling an interface, enabling the metadata storage system to process metadata processing requests (e.g., requests to modify table names) sent by the data engine through that interface.

[0084] Metadata processing request refers to a request sent by the data engine to the metadata storage system to request an update to the target object (e.g., table name modification, deletion of a column of metadata, addition of a new column of metadata, etc.). Moreover, this application does not limit the implementation of the metadata processing request. For example, it can be implemented using any existing or future request sent by the data engine to the metadata storage system to request a certain data processing.

[0085] The target object refers to the object being processed when the data engine performs an update using a metadata storage system; and the target object includes one or more metadata. Furthermore, this application does not limit the target object; for example, in some application scenarios, the target object may be a library object, a table object, a partition object, or a data object. The library object includes a large number of table objects; and this application does not limit the implementation method of the library object; for example, the library object may be implemented using a database containing a large number of tables. The table object includes a large amount of metadata; and this application does not limit the table object; for example, the table object may be implemented using a table or view containing a large amount of metadata. The data object is used to represent certain metadata (e.g., metadata recorded in a certain table of a certain database).

[0086] Furthermore, this application does not limit the implementation method of the aforementioned data engine obtaining metadata processing requests. For example, in some application scenarios, if a user can send a request to the data engine using their client, the specific process by which the data engine obtains the metadata processing request can be as follows: in response to the object update request sent by the client, the metadata processing request is generated so that it can be subsequently sent to the aforementioned metadata storage system through a certain method (e.g., API call). The client is used to implement the interaction process with the user, enabling the client to generate corresponding requests (e.g., the object update request) in response to certain operations triggered by the user, so that the client can subsequently send the object update request to the data engine. The object update request refers to the request sent by the client to the data engine to request update processing for the target object; and this application does not limit the method of generating the object update request.

[0087] Based on the above, in one possible implementation, when the user's client can communicate with the aforementioned data engine, and the data engine can communicate with the aforementioned metadata storage system, the interaction among these three parties can be as follows. When the client detects a user operation (e.g., the user inputs the SQL statement "alter table test_db.test_tbl rename to test_db.test_tbl_new"), the client generates an object update request based on the user operation, so that the object update request represents the data processing requirement (e.g., modifying the table name) conveyed by the user operation. The client then sends the object update request to the corresponding data engine (e.g., a Spark batch / stream processing engine), and the data engine generates a metadata processing request based on the object update request, so that the metadata processing request expresses the semantic information carried by the object update request in a format that the metadata storage system can parse. The data engine then uses a certain method (e.g., calling the alter_table() interface of Hive Metastore) to send the metadata processing request to the metadata storage system, so that the metadata storage system can fulfill the data processing requirement (such as changing a table name) conveyed by the user's operation by processing the metadata processing request.
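The engine-side translation in this interaction can be sketched as follows. This is a toy illustration only: the regular expression handles just the rename statement quoted above, and the request dictionary (`op`, `db`, `table`, `new_db`, `new_table`) is a hypothetical format, not the actual Hive Metastore request structure.

```python
import re

# Matches only statements of the form "alter table <old> rename to <new>".
RENAME = re.compile(
    r"alter\s+table\s+(\S+)\s+rename\s+to\s+(\S+)", re.IGNORECASE
)


def to_metadata_request(sql):
    """Translate a rename statement into a hypothetical metadata processing request."""
    match = RENAME.fullmatch(sql.strip().rstrip(";"))
    if match is None:
        raise ValueError(f"unsupported statement: {sql!r}")
    old_name, new_name = match.groups()
    old_db, old_tbl = old_name.split(".", 1)
    new_db, new_tbl = new_name.split(".", 1)
    return {
        "op": "alter_table",
        "db": old_db,
        "table": old_tbl,
        "new_db": new_db,
        "new_table": new_tbl,
    }


request = to_metadata_request(
    "alter table test_db.test_tbl rename to test_db.test_tbl_new"
)
```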

[0088] It should be noted that this application does not limit the association between the object update request and the metadata processing request in the above two paragraphs. For example, in some application scenarios, the semantic information carried by the object update request and the semantic information carried by the metadata processing request may be partially or completely consistent. Furthermore, this application does not limit the difference between the object update request and the metadata processing request. For example, the request format used by the object update request may differ from the request format used by the metadata processing request. Specifically, the request format used by the object update request refers to the request format required when the client communicates with the data engine, so that the data engine can correctly parse the object update request according to this request format. The request format used by the metadata processing request refers to the request format required when the data engine communicates with the metadata storage system, so that the metadata storage system can correctly parse the metadata processing request according to this request format.

[0089] Based on the relevant content of S1 above, it can be seen that for the metadata storage system mentioned above (e.g., the metadata storage system shown in Figure 3 or Figure 4), when the metadata storage system and the data engine can communicate with each other, the metadata storage system can receive metadata processing requests sent by the data engine, so that the metadata storage system can subsequently perform update processing (such as modifying the table name) on the target object (such as a table) by processing the metadata processing request.

[0090] S2: The metadata storage system executes the object update processing logic corresponding to the metadata processing request. This object update processing logic is used to update the target object.

[0091] The object update processing logic corresponding to the metadata processing request refers to the logic that the metadata storage system needs to execute when processing the metadata processing request, so that the metadata storage system can update the target object by executing the object update processing logic.

[0092] Furthermore, this application does not limit the implementation of the object update processing logic corresponding to the above metadata processing request. For example, it may refer to the logic pre-configured for the above metadata storage system for processing the metadata processing request.

[0093] Based on the relevant content of S2 above, it can be seen that for the metadata storage system mentioned above (e.g., the metadata storage system shown in Figure 3 or Figure 4), if the metadata storage system and the data engine can communicate, then after the metadata storage system receives a metadata processing request (e.g., an alter_table request) sent by the data engine, the metadata storage system can process the metadata processing request by executing the object update processing logic corresponding to the metadata processing request. After the metadata processing request has been processed, the metadata storage system will automatically generate an update description message corresponding to the metadata processing request so that the update description message can indicate what update processing was performed on the target object during the processing of the metadata processing request.
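The behavior described for S2 can be sketched as follows, under stated assumptions: the class InMemoryMetadataStore, its outbox list, and the message keys are hypothetical, and the parameter order of alter_table is chosen for illustration rather than taken from any real metadata store. The point of the sketch is only that the store snapshots the object, executes the update, and then produces the message itself.

```python
import copy


class InMemoryMetadataStore:
    """Hypothetical stand-in for a metadata storage system."""

    def __init__(self):
        self.tables = {}   # (db_name, table_name) -> table definition dict
        self.outbox = []   # update description messages awaiting delivery

    def alter_table(self, db_name, table_name, new_table):
        key = (db_name, table_name)
        before = copy.deepcopy(self.tables[key])  # snapshot before the update

        # Object update processing logic: replace the stored definition.
        del self.tables[key]
        self.tables[(db_name, new_table["name"])] = new_table
        after = new_table

        # Update detection runs inside the store once the update has completed.
        changed = sorted(
            k for k in set(before) | set(after) if before.get(k) != after.get(k)
        )
        self.outbox.append(
            {"db": db_name, "table": table_name, "changedFields": changed}
        )


store = InMemoryMetadataStore()
store.tables[("test_db", "test_tbl")] = {"name": "test_tbl", "columns": ["id"]}
store.alter_table(
    "test_db", "test_tbl", {"name": "test_tbl_new", "columns": ["id"]}
)
```

Because the detection step lives in the store, a data engine calling alter_table needs no detection logic of its own.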

[0094] S3: After the object update processing logic is completed, the metadata storage system generates an update description message corresponding to the target object based on the comparison results between the target object before and after the update.

[0095] The target object before the update refers to the target object before the execution of the object update processing logic corresponding to the metadata processing request mentioned above. For example, when the metadata processing request is determined based on the SQL query statement "alter table test_db.test_tbl rename to test_db.test_tbl_new", the target object before the update refers to the table object (e.g., a table or view) with the table name "test_tbl" within the database object with the database name "test_db". It should be noted that this application does not limit the method of obtaining the target object before the update. For example, it can be implemented using any existing or future method that enables the metadata storage system to obtain a specific object before the update.

[0096] The updated target object refers to the target object after the object update processing logic corresponding to the metadata processing request above has been executed. For example, when the metadata processing request is determined based on the SQL query statement "alter table test_db.test_tbl rename to test_db.test_tbl_new" above, the updated target object refers to the table object (e.g., a table or view) with the table name "test_tbl_new" within the database object with the database name "test_db". It should be noted that this application does not limit the method of obtaining the updated target object. For example, it can be implemented using any existing or future method that enables the metadata storage system to obtain a certain updated object.

[0097] Furthermore, for the aforementioned metadata storage system, after obtaining the target object before the update and the target object after the update, it can compare the target object before the update with the target object after the update to obtain the comparison result between the target object before the update and the target object after the update. This comparison result can at least be used to indicate the differences between the target object before the update and the target object after the update (e.g., different table names, different data columns, etc.). This allows the update description message corresponding to the target object to be determined based on the comparison result. The update description message can indicate what kind of update processing was performed on the target object when the metadata storage system processed the aforementioned metadata processing request. In this way, the update detection processing can be performed by the metadata storage system itself.
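A minimal sketch of such a comparison is given below. The TableObject class and its fields are assumptions made for illustration; this application does not prescribe how the target object is represented or how the comparison result is encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableObject:
    # Hypothetical representation of a table object.
    db_name: str
    name: str
    columns: tuple  # tuple of (column_name, column_type) pairs


def compare(before: TableObject, after: TableObject) -> dict:
    """Return the fields that differ, each mapped to (old_value, new_value)."""
    diff = {}
    for field in ("db_name", "name", "columns"):
        old, new = getattr(before, field), getattr(after, field)
        if old != new:
            diff[field] = (old, new)
    return diff


before = TableObject("test_db", "test_tbl", (("id", "bigint"),))
after = TableObject("test_db", "test_tbl_new", (("id", "bigint"),))
result = compare(before, after)
```

Here the comparison result contains only the table name, which indicates that the update was a rename and that the columns were left unchanged.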

[0098] The update description message corresponding to the target object is used to describe what kind of update processing was performed on the target object when the metadata storage system processed the metadata processing request. Moreover, this application does not limit the implementation of the update description message corresponding to the target object. For example, if the target object is a table, and the metadata storage system executed the relevant logic of updating the table name of the target object when processing the metadata processing request, then the update description message corresponding to the target object can be determined based on the ALTERTABLE_RENAME event so that the update description message can express the semantic information of "updating the table name of the target object".

[0099] Furthermore, this application does not limit the generation method of the update description message corresponding to the target object. For example, in some application scenarios, if the metadata processing request can accurately represent what kind of update processing has been performed on the target object, the generation process of the update description message corresponding to the target object can be as follows: the metadata storage system converts the metadata processing request according to a preset message format to obtain the update description message corresponding to the target object, so that the update description message represents the semantic information carried by the metadata processing request in the preset message format. The metadata storage system can then send the update description message to the message receiving object, and the message receiving object can accurately parse the message according to the preset message format. Here, the preset message format refers to the message format pre-configured for the communication process between the metadata storage system and the message receiving object, so that the message receiving object can accurately parse the message sent by the metadata storage system. The message receiving object refers to the recipient of the update description message; and this application does not limit the implementation method of the message receiving object. For example, the message receiving object can be the message middleware, or at least one downstream object corresponding to the data engine. The message middleware retrieves messages from the metadata storage system and provides them to at least one downstream object corresponding to the data engine. This allows the message middleware to act as a message relay, effectively avoiding the resource overhead caused by the metadata storage system directly sending messages to each downstream object.
Furthermore, this application does not limit the implementation of the message middleware; for example, it can be implemented using any existing or future object capable of message relay (e.g., a message queue). The downstream object refers to an object (e.g., a business function) that needs to continue performing certain tasks based on the update description message. This application also does not limit the implementation of the downstream object. For example, in some application scenarios (e.g., scenarios requiring real-time display of data updates to users), the downstream object can be used to display the update description message to the corresponding users in a certain way, so that these users can promptly learn what updates have occurred in the data source managed by the metadata storage system.
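The relay role of the message middleware can be sketched with an in-process queue. The class MessageMiddleware and its methods are hypothetical, the message content is a placeholder, and a real deployment would more likely use a dedicated message queue service; the sketch also has the store push messages to the middleware, which is one possible arrangement among others.

```python
import queue


class MessageMiddleware:
    """In-process stand-in for a message queue that relays to downstream objects."""

    def __init__(self):
        self._queue = queue.Queue()
        self._downstream = []

    def subscribe(self, callback):
        # Register a downstream object (here, any callable taking one message).
        self._downstream.append(callback)

    def publish(self, message):
        # The metadata storage system hands each message over only once.
        self._queue.put(message)

    def dispatch(self):
        # Fan every queued message out to all registered downstream objects.
        while not self._queue.empty():
            message = self._queue.get()
            for callback in self._downstream:
                callback(message)


received_a, received_b = [], []
middleware = MessageMiddleware()
middleware.subscribe(received_a.append)
middleware.subscribe(received_b.append)
middleware.publish({"eventType": "ALTERTABLE_RENAME"})
middleware.dispatch()
```

The metadata storage system performs a single publish regardless of how many downstream objects exist, which is the overhead reduction the relay is meant to provide.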

[0100] Furthermore, this application does not limit the implementation of the preset message format in the above paragraph. For example, it can be implemented using any existing or future message format that can realize the communication process between the metadata storage system and the message receiving object.

[0101] Research has revealed that in certain application scenarios, a message format has already been configured for the aforementioned message receiving object. For example, in relevant solutions HiveServer2 uses Hooks to perform update detection and to generate and send the corresponding messages; the message format sent by the HiveServer2 Hook has become a de facto standard, and the downstream services corresponding to HiveServer2 have been connected according to this standard. In such scenarios, in order to minimize the transformation cost, the update description message generated by the aforementioned metadata storage system can follow the message format that has already been configured for the message receiving object (e.g., the message format sent by the Hook).

[0102] Based on the above, in one possible implementation, if the message receiving object has already been configured with the message format required for communication with the data engine, the preset message format can be implemented using the message format used when the data engine sends a message to the message receiving object (e.g., the message format sent by the Hook). There is no need to configure a new message format for the message receiving object. This allows the metadata storage system to generate and provide update description messages without modifying the message receiving object, thereby reducing modification costs.

[0103] In practice, in some application scenarios (for example, when the metadata storage system mentioned above is Hive Metastore, and the data engine mentioned above is an upper-layer application of Hive Metastore), the metadata processing requests received by the metadata storage system may correspond to multiple event types. This makes it impossible for the metadata storage system to accurately determine from the metadata processing request what specific update processing was performed on the target object. For example, when the metadata processing request is an `alter_table` request as shown in Table 1 below, it may correspond to multiple data update operations (such as the `ALTERTABLE_RENAME` event for modifying the table name, the `ALTERTABLE_ADDCOLS` event for adding a column, the `ALTERVIEW_RENAME` event for modifying the view name, etc.). Therefore, the metadata storage system can only obtain the general information that the table object is being updated from the metadata processing request, but cannot accurately determine what specific update processing was performed on the table object.

[0104]

[0105] Table 1. Requests Received by Hive Metastore and Their Definitions

[0106] In the scenario described above, to further improve the accuracy of the update description message, this application also provides a possible implementation of the process for generating the update description message corresponding to the target object. In this implementation, if the metadata processing request cannot accurately indicate what kind of update processing the metadata storage system has performed on the target object, the specific process for generating the update description message corresponding to the target object can be as follows: based on the comparison results between the target object before the update and the target object after the update, an update description message corresponding to the target object is generated so that the update description message can accurately indicate what kind of update processing has been performed on the target object, which is beneficial to improving the update detection effect.

[0107] Furthermore, this application does not limit the implementation of the step "generating an update description message corresponding to the target object based on the comparison result between the target object before the update and the target object after the update" in the above paragraph. For example, in some application scenarios, this step can specifically be: converting the comparison result according to the preset message format above to obtain the update description message corresponding to the target object, so that the update description message can represent the object update process conveyed by the comparison result.

[0108] In addition, in some application scenarios, in order to better improve the message generation effect, this application also provides a possible implementation of the step of "generating an update description message corresponding to the target object based on the comparison result between the target object before the update and the target object after the update" mentioned above. In this implementation, the generation process of the update description message may include at least one of the steps 11-15 below.

[0109] Step 11: If the comparison results above indicate that the object identifier of the target object before the update is different from the object identifier of the target object after the update, then the metadata storage system above will generate an update description message corresponding to the target object based on the type of the target object.

[0110] The object identifier of the target object before the update is used to uniquely identify the target object before the update; and this application does not limit the implementation of the object identifier, for example, it can be implemented using the object name. It can be seen that, in one possible implementation, the object identifier of the target object before the update can be the object name of the target object before the update (for example, if the target object before the update is an old table, then the object identifier of the target object before the update can be the table name of the old table, etc.).

[0111] The updated object identifier is used to uniquely identify the updated target object; and this application does not limit the implementation of the object identifier, for example, it can be implemented using the object name. Therefore, in one possible implementation, the updated object identifier can be the updated object name (for example, if the updated target object is a new table, then the updated object identifier can be the table name of the new table, etc.).

[0112] The type of the target object is used to indicate the type to which the target object belongs; and this application does not limit the implementation of the type of the target object. For example, when the target object is a table object, if the target object records multiple metadata in a table format, then the type of the target object is a table; if the target object records multiple metadata in a view format, then the type of the target object is a view.

[0113] Furthermore, this application does not limit the implementation of step 11 above. For example, when the target object above is a table object, step 11 may specifically include steps 111-112 below.

[0114] Step 111: If the target object above is of type table, and the comparison result above indicates that the object identifier of the target object before the update is different from the object identifier of the target object after the update, then the metadata storage system above generates an update description message corresponding to the target object based on the first preset string; the semantic information expressed by the first preset string is to update the object identifier of the table object that belongs to the table type.

[0115] The first preset string refers to a pre-defined string used to express the semantic information of "updating the object identifier of a table object belonging to the table type"; and this application does not limit the implementation of the first preset string. For example, the first preset string can be ALTERTABLE_RENAME.

[0116] Furthermore, this application does not limit the implementation of step 111 above. For example, in some application scenarios, step 111 may specifically be: directly using the first preset string above as the update description message corresponding to the target object.

[0117] In addition, in some application scenarios, in order to further improve the message generation effect, this application also provides a possible implementation of step 111 above. In this implementation, step 111 may specifically include steps 1111-1112 below.

[0118] Step 1111: If the type of the target object above is a table type, and the comparison result above indicates that the object identifier of the target object before the update is different from the object identifier of the target object after the update, then the metadata storage system above generates the event type corresponding to the target object based on the first preset string.

[0119] The event type corresponding to the target object refers to the type of event triggered when the metadata storage system updates the target object according to the object update processing logic corresponding to the metadata processing request.

[0120] Furthermore, this application does not limit the implementation of step 1111 above. For example, step 1111 can specifically be: if the type of the target object above is a table type, and the comparison result above indicates that the object identifier of the target object before the update is different from the object identifier of the target object after the update, then the first preset string (e.g., the string ALTERTABLE_RENAME) is used as the event type corresponding to the target object, so that the event type can indicate that when the metadata storage system above updates the target object according to the object update processing logic corresponding to the metadata processing request above, the event of table name modification is triggered.

[0121] Step 1112: After the metadata storage system obtains the update basic information corresponding to the target object, the metadata storage system generates an update description message corresponding to the target object based on the update basic information and the event type corresponding to the target object.

[0122] The update basic information corresponding to the target object refers to some information that the metadata storage system can directly obtain after the metadata storage system completes the update process for the target object (such as the old table name, the new table name, etc.).

[0123] Furthermore, this application does not limit the implementation of the update basic information corresponding to the target object mentioned above. For example, the update basic information corresponding to the target object may include the object description information of the target object before the update and the object description information of the target object after the update. The object description information of the target object before the update is used to describe the target object before the update; and this application does not limit its implementation. For example, it may at least include the object identifier of the target object before the update (e.g., the table name of the old table). The object description information of the target object after the update is used to describe the target object after the update; and this application does not limit its implementation. For example, it may include at least one of the object identifier of the target object after the update (e.g., the table name of the new table) and the target object after the update itself (e.g., the new table itself). For another example, if the target object is a table object, the update basic information corresponding to the target object may also include the database name corresponding to the target object, so that the database name can represent the name of the database that includes the target object.

[0124] Furthermore, this application does not limit the method of obtaining the update basic information corresponding to the target object mentioned above. For example, in some application scenarios, the update basic information corresponding to the target object can be extracted by the metadata storage system from the metadata processing request mentioned above. As another example, in some application scenarios, the update basic information corresponding to the target object can refer to the logical execution result description information corresponding to the metadata processing request. Here, the logical execution result description information refers to the information automatically obtained by the metadata storage system after the metadata storage system has completed executing the object update processing logic corresponding to the metadata processing request, which describes the execution status of the object update processing logic corresponding to the metadata processing request.

[0125] Furthermore, this application does not limit the implementation of step 1112 above. For example, it can be implemented using any existing or future method that can automatically merge multiple pieces of information into one piece of information.
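One simple way to merge the two pieces of information, shown here only as a sketch, is to combine them into a single JSON document. The function build_update_message and the key names eventType, dbName, oldTableName, and newTableName are hypothetical; they are not the preset message format of this application or the format of any existing Hook.

```python
import json


def build_update_message(event_type: str, basic_info: dict) -> str:
    """Merge the event type and the update basic information into one message."""
    # The event type is written last so that it cannot be overwritten by a
    # same-named key in the update basic information.
    message = {**basic_info, "eventType": event_type}
    return json.dumps(message, sort_keys=True)


message = build_update_message(
    "ALTERTABLE_RENAME",
    {
        "dbName": "test_db",
        "oldTableName": "test_tbl",
        "newTableName": "test_tbl_new",
    },
)
```

A message receiving object that knows this layout can parse the message with a standard JSON parser and read both what changed and which object it concerns.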

[0126] Based on the relevant content of steps 1111 to 1112 above, it can be seen that in some application scenarios, the update description message corresponding to the target object can be determined by the metadata storage system based on the update basic information of the target object, the type of the target object, and the comparison results between the target object before and after the update, so that the update description message can more accurately indicate what kind of update processing has been performed on the target object.

[0127] Based on the relevant content of step 111 above, when the target object is a table object, if the type of the target object is table type, it can be determined that the target object is a table used to record a large amount of data. Therefore, when it is determined that the object identifier of the target object before the update (e.g., the old table name) is different from the object identifier of the target object after the update (e.g., the new table name), it can be determined that the object identifier of the target object has been updated. At this time, the update description message corresponding to the target object can be determined according to the first preset string (e.g., the string ALTERTABLE_RENAME), so that the update description message can indicate that the object identifier update processing is performed for the target object.

[0128] Step 112: If the type of the target object above is a view type, and the comparison result above indicates that the object identifier of the target object before the update is different from the object identifier of the target object after the update, then the metadata storage system above generates an update description message corresponding to the target object based on the second preset string; the semantic information expressed by the second preset string is to update the object identifier of the table object that belongs to the view type.

[0129] The second preset string refers to a pre-defined string used to express the semantic information of "updating the object identifier of a table object belonging to the view type"; and this application does not limit the implementation of the second preset string. For example, the second preset string can be ALTERVIEW_RENAME.

[0130] Furthermore, this application does not limit the implementation of step 112 above. For example, the implementation of step 112 is similar to the implementation of step 111 above, and for the sake of brevity, it will not be described again here.

[0131] Based on the relevant content of step 112 above, when the target object is a table object, if the type of the target object is a view type, it can be determined that the target object is a view used to record a large amount of data. Therefore, when it is determined that the object identifier of the target object before the update (e.g., the name of the old view) is different from the object identifier of the target object after the update (e.g., the name of the new view), it can be determined that the object identifier of the target object has been updated. At this time, the update description message corresponding to the target object can be determined according to the pre-set second preset string (e.g., the string ALTERVIEW_RENAME), so that the update description message can indicate that the object identifier update processing is performed for the target object.

[0132] Based on the relevant content of step 11 above, in some application scenarios, for the target object mentioned above, when the target object is a table object, since the table object can include multiple types of objects (such as tables and views), after determining that the object identifier of the target object before the update is different from the object identifier of the target object after the update, an update description message corresponding to the target object can be generated according to the type of the target object, so that the update description message can more accurately indicate what kind of update processing has been performed on the target object.
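Steps 111 and 112 can be sketched together as a single selection function. The function name rename_event_type and the type labels "table" and "view" are assumptions made for illustration; only the two preset strings come from the examples above.

```python
from typing import Optional

ALTERTABLE_RENAME = "ALTERTABLE_RENAME"  # example of the first preset string
ALTERVIEW_RENAME = "ALTERVIEW_RENAME"    # example of the second preset string


def rename_event_type(
    object_type: str, old_name: str, new_name: str
) -> Optional[str]:
    """Pick the rename event for a table object, or None if the name is unchanged."""
    if old_name == new_name:
        return None  # identifiers match, so step 11 does not apply
    if object_type == "table":
        return ALTERTABLE_RENAME
    if object_type == "view":
        return ALTERVIEW_RENAME
    raise ValueError(f"unknown object type: {object_type!r}")


table_event = rename_event_type("table", "test_tbl", "test_tbl_new")
view_event = rename_event_type("view", "test_view", "test_view_new")
```

The same alter_table request thus yields different event types depending on the type of the target object, which the request alone does not reveal.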

[0133] Step 12: If the target object mentioned above is used to record multiple pieces of metadata, and the comparison result above indicates that the number of data blocks into which the target object before the update is divided in the target dimension is the same as the number of data blocks into which the updated target object is divided in the target dimension, and that the target object before the update and the target object after the update differ in at least one data block, then the metadata storage system mentioned above generates an update description message corresponding to the target object based on the number of differing data blocks (i.e., the number of data blocks in the at least one data block).

[0134] The target dimension refers to the dimension along which the target object mentioned above is divided into data blocks. Moreover, this application does not limit the implementation of the target dimension. For example, if the target object is a database object, the target dimension can be a table dimension or a database partition dimension, etc.; if the target object is a table object, the target dimension can be a column dimension or a table partition dimension, etc.; if the target object is a partition object, the target dimension can be a column dimension, etc.

[0135] Furthermore, this application does not limit the implementation of step 12 above. For example, step 12 can specifically be: if the target object above is used to record multiple pieces of metadata, and the comparison result above indicates that the number of data blocks into which the target object before the update is divided in the target dimension is the same as the number of data blocks into which the target object after the update is divided in the target dimension, and that the target object before the update and the target object after the update differ in at least one data block, then an update description message corresponding to the target object is generated based on the number of differing data blocks, so that the update description message carries that number.

[0136] For example, in some application scenarios, in order to better improve the message generation effect, step 12 above may specifically include steps 121-122 below.

[0137] Step 121: If the target object mentioned above is used to record multiple pieces of metadata, and the comparison result mentioned above indicates that the number of data blocks into which the target object before the update is divided in the target dimension is the same as the number of data blocks into which the updated target object is divided in the target dimension, that the target object before the update and the target object after the update differ in at least one data block, and that the number of differing data blocks is 1, then an update description message corresponding to the target object is generated according to the third preset string; the semantic information expressed by the third preset string is that one data block is updated.

[0138] The third preset string refers to a pre-defined string used to express the semantic information of "updating a data block"; and this application does not limit the implementation of the third preset string. For example, the third preset string can be ALTERTABLE_RENAMECOL.

[0139] Furthermore, this application does not limit the implementation of step 121 above. For example, the implementation of step 121 is similar to the implementation of step 111 above, and for the sake of brevity, it will not be described again here.

[0140] Based on the relevant content in step 121 above, in some application scenarios, when the target object is a table and the target dimension is a column dimension, if the comparison result indicates that the number of data blocks divided in the column dimension of the table before the update (that is, the number of data columns in the table before the update) is the same as the number of data blocks divided in the column dimension of the table after the update (that is, the number of data columns in the table after the update), and there is only one data column that is different between the table before the update and the table after the update, then an update description message corresponding to the target object can be generated based on the third preset string (such as the string ALTERTABLE_RENAMECOL). This update description message can indicate that an update process is performed on a certain data column in the target object (such as updating the column name, updating some or all of the data in the data column, etc.).

[0141] Step 122: If the target object mentioned above is used to record multiple metadata, and the comparison result mentioned above indicates that the number of data blocks divided in the target dimension of the target object before the update is the same as the number of data blocks divided in the target dimension of the updated target object, and there is a difference between the target object before the update and the target object after the update in at least one data block, and the number of data blocks in the at least one data block is not less than 2, then an update description message corresponding to the target object is generated according to the fourth preset string; the semantic information expressed by the fourth preset string is to update multiple data blocks.

[0142] The fourth preset string refers to a pre-defined string used to express the semantic information of "updating multiple data blocks"; and this application does not limit the implementation of the fourth preset string. For example, the fourth preset string can be ALTERTABLE_REPLACECOLS.

[0143] Furthermore, this application does not limit the implementation of step 122 above. For example, the implementation of step 122 is similar to the implementation of step 111 above, and for the sake of brevity, it will not be described again here.

[0144] Based on the relevant content of step 122 above, in some application scenarios, when the target object is a table and the target dimension is a column dimension, if the comparison result above indicates that the number of data blocks divided in the column dimension of the table before the update (that is, the number of data columns in the table before the update) is the same as the number of data blocks divided in the column dimension of the table after the update (that is, the number of data columns in the table after the update), and there are multiple data columns that are different between the table before the update and the table after the update, then an update description message corresponding to the target object can be generated based on the fourth preset string (such as the string ALTERTABLE_REPLACECOLS), so that the update description message can indicate that update processing is performed on certain data columns in the target object (such as updating column names, updating some or all data in the data columns, etc.).

[0145] Based on the relevant content in step 12 above, in some application scenarios, for the target object mentioned above, when the target object is a table and the target dimension is a column dimension, if the number of data columns in the table before the update is the same as the number of data columns in the table after the update, an update description message corresponding to the target object can be generated based on the number of different data columns between the table before the update and the table after the update, so that the update description message can more accurately indicate what kind of update processing was performed on the target object.
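
The same-count case of steps 121 and 122 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: modeling a data column as a (name, type) pair, comparing columns position by position, and the function name `classify_same_count` are all hypothetical. Only the two preset strings are taken from the text above.

```python
from typing import List, Tuple

# Hypothetical model of a data column: (column name, column type).
Column = Tuple[str, str]


def classify_same_count(before: List[Column], after: List[Column]) -> str:
    """Return a preset string for a table whose column count is unchanged."""
    if len(before) != len(after):
        raise ValueError("column counts differ; steps 13 to 15 apply instead")
    # Compare the columns position by position and count those that differ.
    changed = sum(1 for old, new in zip(before, after) if old != new)
    if changed == 0:
        raise ValueError("no data column differs; nothing to report")
    if changed == 1:
        return "ALTERTABLE_RENAMECOL"    # third preset string (step 121)
    return "ALTERTABLE_REPLACECOLS"      # fourth preset string (step 122)
```

Comparing by position is only one possible choice; a system that tracks stable column identifiers could compare by identifier instead.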

[0146] Step 13: If the target object mentioned above is used to record multiple metadata, and the comparison result mentioned above indicates that the number of data blocks divided in the target dimension of the updated target object is higher than the number of data blocks divided in the target dimension of the original target object, and the data blocks divided in the target dimension of the updated target object include the data blocks divided in the target dimension of the original target object, then an update description message corresponding to the target object is generated according to the fifth preset string, and the semantic information expressed by the fifth preset string is the addition of data blocks.

[0147] The fifth preset string refers to a pre-defined string used to express the semantic information of "adding a data block"; and this application does not limit the implementation of the fifth preset string. For example, the fifth preset string can be ALTERTABLE_ADDCOLS.

[0148] Furthermore, this application does not limit the implementation of step 13 above. For example, the implementation of step 13 is similar to the implementation of step 111 above, and for the sake of brevity, it will not be described again here.

[0149] Based on the relevant content of step 13 above, in some application scenarios, when the target object is a table and the target dimension is a column dimension, if the comparison result above indicates that the number of data columns in the updated table is higher than the number of data columns in the table before the update, and the data columns in the updated table include the data columns in the table before the update, then it can be determined that the updated table has added some data columns compared to the table before the update. Therefore, an update description message corresponding to the target object can be generated based on the fifth preset string, so that the update description message can indicate that some data columns have been added to the target object.

[0150] Step 14: If the target object mentioned above is used to record multiple metadata, and the comparison result mentioned above indicates that the number of data blocks divided by the target object before the update under the target dimension is higher than the number of data blocks divided by the target object after the update under the target dimension, and the data blocks divided by the target object before the update under the target dimension include the data blocks divided by the target object after the update under the target dimension, then an update description message corresponding to the target object is generated according to the sixth preset string, and the semantic information expressed by the sixth preset string is the deletion of data blocks.

[0151] The sixth preset string refers to a pre-defined string used to express the semantic information of "deleting data blocks"; and this application does not limit the implementation of the sixth preset string. For example, the sixth preset string can be ALTERTABLE_DELCOLS.

[0152] Furthermore, this application does not limit the implementation of step 14 above. For example, the implementation of step 14 is similar to the implementation of step 111 above, and for the sake of brevity, it will not be described again here.

[0153] Based on the relevant content of step 14 above, in some application scenarios, when the target object is a table and the target dimension is a column dimension, if the comparison result above indicates that the number of data columns in the table before the update is higher than the number of data columns in the table after the update, and the data columns in the table before the update include the data columns in the table after the update, then it can be determined that the updated table has deleted some data columns relative to the table before the update. Therefore, an update description message corresponding to the target object can be generated based on the sixth preset string so that the update description message can indicate that some data columns have been deleted from the target object.

[0154] Step 15: If the target object mentioned above is used to record multiple metadata, and the comparison results mentioned above indicate that the number of data blocks divided in the target dimension of the target object before the update is different from the number of data blocks divided in the target dimension of the target object after the update, and there is a difference between the target object before the update and the target object after the update in at least one data block, then an update description message corresponding to the target object is generated according to the fourth preset string. The semantic information expressed by the fourth preset string is that multiple data blocks are updated.

[0155] In this application, under certain application scenarios, when the target object is a table and the target dimension is a column dimension, if the comparison result indicates that the number of data columns in the table before the update is different from the number of data columns in the table after the update, and some data columns in the updated table are different from some data columns in the table before the update, then a fourth preset string can be used to generate an update description message corresponding to the target object, so that the update description message can indicate that some data columns in the target object have been updated. It should be noted that the relevant content of the fourth preset string can be found in step 122 above.
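
Steps 13 to 15 can be illustrated with the following Python sketch. It assumes, purely for illustration, that each data column is identified by a unique column name and that containment is tested on sets of names; the function name `classify_count_change` is hypothetical. The preset strings are the examples given in the text above.

```python
from typing import Iterable


def classify_count_change(before: Iterable[str], after: Iterable[str]) -> str:
    """Return a preset string for a table whose number of columns changed."""
    old, new = set(before), set(after)
    if len(old) == len(new):
        raise ValueError("column counts are equal; steps 121 and 122 apply instead")
    if old < new:
        # The updated table contains every original column plus new ones.
        return "ALTERTABLE_ADDCOLS"      # fifth preset string (step 13)
    if new < old:
        # The original table contains every remaining column.
        return "ALTERTABLE_DELCOLS"      # sixth preset string (step 14)
    # Counts differ and neither side contains the other.
    return "ALTERTABLE_REPLACECOLS"      # fourth preset string (step 15)
```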

[0156] Based on the relevant content of steps 11 to 15 above, in some application scenarios, for the target object mentioned above, after the metadata storage system completes the update processing for the target object, it can first obtain the target object before the update and the target object after the update; then, based on the comparison results between the target object before the update and the target object after the update, it can generate an update description message corresponding to the target object, so that the update description message can more accurately represent what kind of update processing has been performed on the target object.

[0157] Based on the above information regarding the update description message corresponding to the target object, in one possible implementation, the generation process of the update description message for the target object can be as follows: Based on a preset message format and a comparison between the target object before and after the update, an update description message is generated for the target object. This allows the update description message to indicate the update process performed on the target object according to the preset message format. The preset message format is the message format used by the data engine when sending messages to the message receiving object, allowing the message receiving object to continue using its previous message parsing method, thus reducing modification costs.
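
The idea of a shared preset message format can be sketched as follows. This application does not specify the format, so the JSON encoding and the field names `eventType`, `table`, `before`, and `after` below are assumptions chosen only for illustration. The point of the sketch is that the downstream parser does not need to know which component produced the message.

```python
import json
from typing import Any, Dict, List

# Hypothetical field set of the preset message format.
REQUIRED_FIELDS = {"eventType", "table", "before", "after"}


def build_update_message(event_type: str, table: str,
                         before: List[str], after: List[str]) -> str:
    """Serialize an update description message in the assumed preset format."""
    body: Dict[str, Any] = {
        "eventType": event_type,
        "table": table,
        "before": before,
        "after": after,
    }
    return json.dumps(body, sort_keys=True)


def parse_update_message(raw: str) -> Dict[str, Any]:
    """Downstream parser; it is the same whoever sent the message."""
    body = json.loads(raw)
    missing = REQUIRED_FIELDS - set(body)
    if missing:
        raise ValueError(f"message is missing fields: {sorted(missing)}")
    return body
```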

[0158] Based on the relevant content of S3 above, for the metadata storage system, after the metadata storage system determines that the object update processing logic corresponding to the metadata processing request has been completed, the metadata storage system can generate an update description message corresponding to the target object based on the comparison result between the target object before the update and the target object after the update. This message can indicate that the target object has been updated, so that the update description message can be provided to the corresponding business (such as some downstream businesses corresponding to the data engine).

[0159] S4: The metadata storage system sends the update description message to the message receiving object.

[0160] For details regarding the message receiving object, please refer to the above text; for the sake of brevity, it will not be repeated here.

[0161] As can be seen, for the metadata storage system mentioned above, after receiving a metadata processing request from the data engine requesting an update of the target object, the metadata storage system not only needs to process the request and generate an update description message corresponding to the target object, but also needs to provide the update description message to at least one downstream object corresponding to the data engine in a certain way, so that these downstream objects can obtain these update description messages in a timely manner. This can effectively avoid adverse effects caused by these downstream objects missing some messages.

[0162] Based on the content of S1 to S4 above, for a metadata storage system (such as Hive Metastore), after receiving a metadata processing request sent by a data engine to request the updating of a target object (such as a library object, table object, or partition object), the system executes the object update processing logic corresponding to the metadata processing request (for example, the object update processing logic is used to update the target object). After determining that the object update processing logic has been completed, an update description message corresponding to the target object is generated based on the comparison between the target object before and after the update. This update description message indicates that the target object has been updated, so that it can be sent to the message receiving object (such as some downstream businesses corresponding to the data engine). In this way, the metadata storage system itself can generate update description messages, thereby enabling the metadata storage system to complete the data update detection processing itself. This effectively overcomes the defects caused by the data engine performing data update detection processing (such as the difficulty in scaling and maintaining the update detection scheme because corresponding update detection logic needs to be configured for each data engine).

[0163] Research has revealed that in some application scenarios, a metadata storage system (such as Hive Metastore) may configure permissions for certain data engines (e.g., Spark) by some means (e.g., whitelisting). Once such a data engine has authenticated the user itself, the authentication of that user at the metadata storage system can be skipped, so the data engine does not provide user-related information to the metadata storage system. Consequently, the metadata storage system can only perceive information about these data engines, but not about the user who triggered the data update process. This means the update description message provided by the metadata storage system cannot carry user information, which prevents downstream businesses from identifying the trigger of the data update process and thus reduces the effectiveness of data update detection.

[0164] Based on the above research, to better improve the update detection effect, the communication process between the metadata storage system and the data engine can be further modified so that the metadata storage system can smoothly obtain user information from the data engine. Based on this, this application also provides a possible implementation of S3 above. In this implementation, when the metadata processing request is generated by the data engine in response to an object update request sent by the client, and the object update request carries client description information, S3 can specifically be: based on the client description information provided by the data engine and the comparison result between the target object before and after the update, an update description message corresponding to the target object is generated, so that the update description message carries the client description information. This allows the update description message to not only indicate what kind of update processing has been performed on the target object, but also who triggered this update processing, so that subsequent downstream objects can obtain more comprehensive information from the update description message.

[0165] The client description information is used to describe the trigger of the update processing described by the metadata processing request above.

[0166] Furthermore, this application does not limit the implementation method of the client description information mentioned above. For example, in some application scenarios, when the metadata processing request mentioned above is generated by the data engine in response to an object update request sent by the client, the client description information may include the user identifier corresponding to the client. The user identifier is used to uniquely identify the user of the client; and this application does not limit the implementation method of the user identifier. For example, the user identifier may be at least one of a login account on the client and a client identifier of the client. The client identifier is used to uniquely identify the client.

[0167] Furthermore, the data engine may obtain the client description information as follows. For example, when the metadata processing request is generated by the data engine in response to an object update request sent by the client, and the object update request carries client description information, the data engine can directly extract the client description information from the object update request.

[0168] Furthermore, the method by which the data engine provides client description information to the metadata storage system as described above in this application will be explained below with two examples for ease of understanding.

[0169] Example 1: In some application scenarios, the data engine can insert the client description information into a free field in the metadata processing request, so that the metadata processing request carries the client description information. This allows the data engine to provide the client description information to the metadata storage system during the subsequent sending of the metadata processing request. This effectively improves the communication efficiency between the data engine and the metadata storage system, thereby improving data update efficiency and shortening the response time to user requests.

[0170] Based on the above content, in some application scenarios, when the metadata processing request carries client description information, S3 can specifically be: based on the client description information carried in the metadata processing request and the comparison result between the target object before and after the update, generate an update description message corresponding to the target object, so that the update description message carries the client description information. In this way, the data engine can provide as much information as possible to the metadata storage system through a single data communication process, thereby improving efficiency.
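
Example 1 can be sketched in Python as follows. This is a hedged illustration only: the `MetadataRequest` class, its `properties` dictionary standing in for the free field, and the key `client.description` are all hypothetical and are not taken from any real metadata storage system.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical key under which the client description is stored.
CLIENT_KEY = "client.description"


@dataclass
class MetadataRequest:
    operation: str
    target: str
    # Stand-in for a free field of the request (an assumption).
    properties: Dict[str, str] = field(default_factory=dict)


def attach_client_description(request: MetadataRequest, user_id: str) -> MetadataRequest:
    """Data engine side: place the client description in the free field."""
    request.properties[CLIENT_KEY] = user_id
    return request


def generate_message(request: MetadataRequest, event_type: str) -> Dict[str, Any]:
    """Metadata storage side: copy the client description into the message."""
    return {
        "eventType": event_type,
        "target": request.target,
        "user": request.properties.get(CLIENT_KEY),  # None if not provided
    }
```

Because the description travels inside the request itself, a single round trip is enough, which is the efficiency benefit described above.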

[0171] Example 2: In some application scenarios, in order to minimize the transformation cost, a secondary communication method can be used to realize the data communication process between the data engine and the metadata storage system. This allows the data engine to send the client description information and the metadata processing request to the metadata storage system in two separate transmissions. This can effectively avoid the cost of modifying the data communication protocol between the data engine and the metadata storage system (that is, the protocol required to send the metadata processing request), thereby helping to reduce the transformation cost.

[0172] Based on Example 2 above, this application also provides a possible implementation of the above-described method for determining changes in lake warehouse metadata. In this implementation, the method for determining changes in lake warehouse metadata may include at least steps 21-24 below.

[0173] Step 21: The metadata storage system receives the client description information sent by the data engine.

[0174] In this application, for the aforementioned data engine and the aforementioned metadata storage system, after the data engine receives an object update request triggered by the user through a client, the data engine can obtain client description information from the object update request so that the client description information can represent the relevant information of the user; then the data engine sends the client description information to the metadata storage system so that the metadata storage system can subsequently use the client description information to generate an update description message.

[0175] Step 22: The metadata storage system receives the metadata processing request corresponding to the client description information sent by the data engine.

[0176] It should be noted that for the relevant content of step 22 above, please refer to the relevant content of S1 above.

[0177] Step 23: The metadata storage system executes the object update processing logic corresponding to the metadata processing request. This object update processing logic is used to update the target object.

[0178] It should be noted that for the relevant content of step 23 above, please refer to the relevant content of S2 above.

[0179] Step 24: After the object update processing logic is completed, the metadata storage system generates an update description message corresponding to the target object based on the comparison results between the target object before and after the update.

[0180] It should be noted that for the relevant content of step 24 above, please refer to the relevant content of S3 above.

[0181] Based on the content of steps 21 to 24 above, in some application scenarios, for a data engine and a metadata storage system capable of data communication, after the data engine receives an object update request triggered by a user via a client, the data engine can obtain client description information from the object update request so that the client description information can represent the user's relevant information. The data engine then sends the client description information to the metadata storage system and, after confirming that the metadata storage system has received the client description information, sends the metadata processing request generated based on the object update request to the metadata storage system, so that the metadata storage system can process the metadata processing request. After the metadata processing request is completed, the metadata storage system generates an update description message for the target object based on the client description information and the comparison results between the target object before and after the update. This update description message not only indicates what kind of update processing has been performed on the target object, but also indicates who triggered this update processing. This allows downstream objects to obtain more comprehensive information from the update description message, thus enabling the metadata storage system to provide as much information as possible to downstream objects while minimizing the transformation cost.

[0182] Research has revealed that in some application scenarios, the metadata processing request mentioned above (e.g., a request to create multiple partitions) may trigger multiple tasks (e.g., accessing the metadata storage system multiple times by calling the partition creation interface) so that the metadata processing request corresponds to multiple update processes. Therefore, in order to better improve the message generation effect, this application also provides a possible implementation of the above lake warehouse metadata change determination method. In this implementation, the lake warehouse metadata change determination method may include at least steps 31-34 below.
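
The two-transmission flow of steps 21 to 24 can be sketched as follows. This application does not state how the metadata storage system associates the earlier client description with the later request, so the `session_id` used below is an assumption, as are the class and method names.

```python
from typing import Any, Dict, List, Optional


class MetadataStore:
    """Toy metadata storage system that keeps column lists per table."""

    def __init__(self) -> None:
        self._pending: Dict[str, str] = {}      # session id -> client description
        self.tables: Dict[str, List[str]] = {}  # table name -> column names

    def receive_client_description(self, session_id: str, description: str) -> bool:
        """Step 21: store the description and acknowledge receipt."""
        self._pending[session_id] = description
        return True

    def handle_request(self, session_id: str, table: str,
                       new_columns: List[str]) -> Dict[str, Any]:
        """Steps 22 to 24: apply the update, then build the message."""
        before = self.tables.get(table, [])
        self.tables[table] = list(new_columns)
        # Attach the description received earlier in the same session, if any.
        user: Optional[str] = self._pending.pop(session_id, None)
        return {"table": table, "before": before,
                "after": self.tables[table], "user": user}
```

The request format itself is unchanged in this sketch; only an extra call precedes it, which reflects the lower transformation cost that Example 2 aims for.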

[0183] Step 31: The metadata storage system receives a metadata processing request sent by the data engine. This metadata processing request is used to request an update to the target object, and it describes multiple update processing tasks.

[0184] In this context, an update processing task represents an update operation performed on the target object. For example, when the metadata processing request mentioned above represents adding N partitions, this metadata processing request can describe the task of adding N partitions, so that the subsequent metadata storage system needs to execute the partition addition logic N times to complete the processing of this metadata processing request. Here, N is a positive integer.

[0185] Step 32: The metadata storage system executes the object update processing logic corresponding to each update processing task.

[0186] In this application, for any update processing task (e.g., adding the nth partition), after the metadata storage system receives the update processing task, the metadata storage system can execute the object update processing logic (e.g., partition addition logic) corresponding to the update processing task, so that the metadata storage system can complete the processing of the update processing task with the help of the object update processing logic. Here, n is a positive integer, n≤N.

[0187] It should be noted that this application does not limit the implementation method of the metadata storage system obtaining multiple update processing tasks. For example, in some application scenarios, after the data engine generates a metadata processing request, if the metadata processing request describes N update processing tasks, the data engine can access the metadata storage system through N interface calls, so that the metadata storage system can complete the update processing tasks involved in each access. Therefore, when the data engine accesses the metadata storage system through the nth interface call, the metadata storage system is used to complete the nth update processing task. Here, n is a positive integer, n≤N.

[0188] Step 33: For any update processing task, after determining that the object update processing logic corresponding to the update processing task has been completed, the metadata storage system generates an update description message corresponding to the update processing task based on the comparison result between the target object before the update and the target object after the update.

[0189] In this application, for the nth update processing task, after the metadata storage system obtains the nth update processing task, the metadata storage system executes the object update processing logic corresponding to the nth update processing task. After the metadata storage system determines that the object update processing logic corresponding to the nth update processing task has been completed, the metadata storage system generates an update description message corresponding to the nth update processing task based on the comparison result between the target object before the update and the target object after the update. This update description message indicates what kind of update processing the metadata storage system performed on the target object by processing the nth update processing task. The target object before the update for the nth update processing task refers to the target object before the metadata storage system processes the nth update processing task. The target object after the update for the nth update processing task refers to the target object obtained after the metadata storage system processes the nth update processing task. The update description message corresponding to the nth update processing task refers to the update description message generated after the metadata storage system processes the nth update processing task. Here, n is a positive integer, n≤N.

[0190] Step 34: The metadata storage system generates an update description message for the target object based on the update description messages corresponding to multiple update processing tasks.

[0191] In this application, for the metadata storage system mentioned above, after the metadata storage system obtains the update description messages corresponding to all update processing tasks (for example, the update description message corresponding to the nth update processing task is ADD_PARTITION, where n is a positive integer and n≤N), the metadata storage system can generate an update description message corresponding to the target object (for example, ALTERTABLE_ADDPARTS) based on the update description messages corresponding to these update processing tasks, so that the update description message corresponding to the target object can indicate what kind of update processing the metadata storage system performed on the target object during the processing of the metadata processing request mentioned above.

[0192] Based on the content of steps 31 to 34 above, in some application scenarios, for the data engine and metadata storage system mentioned above, if the data engine generates a metadata processing request to describe multiple update processing tasks, from the perspective of the data engine, this metadata processing request is a single request. However, from the perspective of the metadata storage system, this metadata processing request consists of multiple task requests. Therefore, the metadata storage system needs to execute multiple processing logics to complete this metadata processing request. Consequently, the metadata storage system generates multiple update description messages during the completion of the metadata processing request. Therefore, these update description messages can be integrated into a single message so that the message and the metadata processing request can form a one-to-one correspondence. This allows the message to more accurately represent what kind of update processing the metadata storage system performed on the target object during the processing of the metadata processing request. This effectively avoids user confusion caused by one metadata processing request corresponding to multiple messages, thereby improving the update detection effect.
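
The integration of per-task messages into a single message (step 34) can be sketched as follows. The message fields, the function name `merge_messages`, and the mapping from a per-task type to a merged type are assumptions made for illustration; this application does not prescribe how the merged message is derived.

```python
from typing import Any, Dict, List

# Hypothetical mapping from a per-task message type to the merged type.
MERGED_TYPE = {"ADD_PARTITION": "ALTERTABLE_ADDPARTS"}


def merge_messages(task_messages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold the messages of N update processing tasks into one message."""
    if not task_messages:
        raise ValueError("no per-task messages to merge")
    types = {m["eventType"] for m in task_messages}
    if len(types) != 1:
        raise ValueError("this sketch only merges tasks of a single type")
    (task_type,) = types
    return {
        # Fall back to the per-task type when no merged type is configured.
        "eventType": MERGED_TYPE.get(task_type, task_type),
        "table": task_messages[0]["table"],
        "partitions": [m["partition"] for m in task_messages],
    }
```

With this shape, one metadata processing request yields exactly one message, regardless of how many interface calls the data engine made.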

[0193] In practice, to better avoid message omissions in some application scenarios, this application also provides a backup mechanism. Specifically, this backup mechanism allows the message receiving object to receive not only the update description message corresponding to the target object generated by the metadata storage system, but also the object update message generated by the data engine in response to the metadata processing request. Since the semantic information carried by the object update message is partially or completely consistent with the semantic information carried by the update description message, the message receiving object can receive two messages describing the update processing of the target object. This better avoids message omissions caused by the metadata storage system's inability to generate or send the update description message, thus ensuring that downstream objects can better perform their business operations. The object update message refers to the message generated by the data engine in response to the metadata processing request, indicating that the target object has been updated. Furthermore, this application does not limit the method of obtaining the object update message; for example, the object update message can be obtained by the data engine using its internally configured Hook logic, so that the object update message can indicate what kind of update processing has been performed on the target object.

[0194] It should be noted that the above update description message and the above object update message have the same format and carry similar semantic information (in some cases, the two messages may be completely identical). However, the two messages are generated by different entities: the former is generated by the above metadata storage system, and the latter is generated by the above data engine.

[0195] It should also be noted that this application does not limit the implementation method of the message receiving object described above. For example, in one possible implementation, the message receiving object may be implemented using multiple message queues (such as message queue 1 and message queue 2 shown in Figure 3), so that different message queues are used to record messages sent by different devices, and downstream objects can obtain the messages provided by different devices from the respective message queues.

[0196] Based on the backup mechanism described above, this application also provides an update detection scheme. In this scheme, not only does the metadata storage system itself need to perform update detection processing on the target object to obtain and send the update description message corresponding to the target object (e.g., metadata update message 2 shown in Figure 3), but the data engine also needs to use its pre-configured Hook logic to perform update detection processing on the target object, so as to obtain and send the object update message corresponding to the target object (e.g., metadata update message 1 shown in Figure 3). This enables some downstream objects corresponding to the data engine to obtain both messages. This can effectively avoid message omissions caused by the metadata storage system's inability to generate or send the update description message, thereby helping to ensure that downstream objects can perform their corresponding business.
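
One way a downstream object might consume the two message queues is sketched below. This application does not describe how a downstream object reconciles the two messages, so the deduplication step and the `request_id` field used as the deduplication key are assumptions made for illustration.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Set

Message = Dict[str, Any]


def drain_deduplicated(queue_1: Deque[Message], queue_2: Deque[Message]) -> List[Message]:
    """Read both queues and deliver each update only once.

    queue_1 is assumed to hold messages from the data engine and queue_2
    messages from the metadata storage system.
    """
    seen: Set[str] = set()
    delivered: List[Message] = []
    for queue in (queue_1, queue_2):
        while queue:
            message = queue.popleft()
            key = message["request_id"]  # hypothetical correlation field
            if key not in seen:
                seen.add(key)
                delivered.append(message)
    return delivered
```

If either sender fails to produce its message, the other queue still supplies it, which is the purpose of the backup mechanism.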

[0197] In practice, in some application scenarios, to better save on transformation costs, update detection can be accomplished by adding new modules (such as a Listen interface with update detection function) to the above-mentioned metadata storage system. Based on this, this application also provides a possible implementation of the above-mentioned lake warehouse metadata change determination method. In this implementation, when the metadata storage system has a message generation module embedded in it, the lake warehouse metadata change determination method may include at least steps 51-52 below.

[0198] Step 51: After the object update processing logic is completed, the metadata storage system obtains the logic execution result description information corresponding to the metadata processing request; the logic execution result description information includes some or all of the information carried by the above metadata processing request; the some or all of the information includes the object description information of the target object before the update and the object description information of the target object after the update.

[0199] The logic execution result description information corresponding to the metadata processing request indicates the execution status of the object update processing logic corresponding to the metadata processing request.

[0200] Furthermore, this application does not limit the implementation of the logic execution result description information. For example, in some application scenarios, it may include some or all of the information carried by the metadata processing request, so that it can express at least part of the semantic information described by the metadata processing request.

[0201] Therefore, in one possible implementation, when the metadata processing request carries at least the object description information of the target object before the update and the object description information of the target object after the update, the logic execution result description information can also include at least these two pieces of object description information, so that it can indicate what the target object was before the update and what it is after the update. The object description information of the target object before and after the update is described above and is not repeated here for brevity.

[0202] Step 52: The metadata storage system sends the above logic execution result description information to the message generation module embedded in the metadata storage system, so that the message generation module can obtain the target object before the update and the target object after the update based on the logic execution result description information, and generate the update description message corresponding to the target object based on the comparison result between the target object before the update and the target object after the update.

[0203] The message generation module refers to a module pre-configured in the metadata storage system mentioned above and used to generate update description messages (e.g., module 1 shown in Figure 3 or Figure 4). This application does not limit the implementation of the message generation module; for example, it can be implemented using any existing or future module. Furthermore, in some application scenarios, to minimize modification costs, the message generation module can be implemented using a preset interface (e.g., the Listen interface), so that the metadata storage system only needs to add one interface to embed the message generation module within the metadata storage system.

[0204] Furthermore, this application does not limit the working principle of the above message generation module. For example, after the message generation module obtains the above logic execution result description information, it can obtain the target object before the update (e.g., obtain the old table based on dbname and tbl_name) and the target object after the update based on that information, and then generate the update description message corresponding to the target object based on the comparison result between the target object before the update and the target object after the update.

[0205] Based on the relevant content of steps 51 to 52 above, in some application scenarios, for the metadata storage system mentioned above, the purpose of generating the update description message corresponding to the target object within the metadata storage system can be achieved by adding an interface (such as the Listen interface) to the metadata storage system. In this way, the transformation cost can be reduced as much as possible while realizing data update detection within the metadata storage system.
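Steps 51 to 52 can be sketched as follows. This is a minimal illustration, not the implementation described in this application: the class names, the dict-based table representation, and the message layout are all hypothetical, and the listener callback merely stands in for the Listen interface mentioned above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ExecutionResult:
    """Stand-in for the logic execution result description information."""
    db_name: str
    tbl_name: str
    old_table: Dict[str, object]
    new_table: Dict[str, object]


@dataclass
class MetadataStore:
    tables: Dict[Tuple[str, str], Dict[str, object]] = field(default_factory=dict)
    listeners: List[Callable[[ExecutionResult], None]] = field(default_factory=list)

    def alter_table(self, db_name, tbl_name, new_table):
        key = (db_name, tbl_name)
        old_table = dict(self.tables[key])   # snapshot before the update
        self.tables[key] = dict(new_table)   # object update processing logic
        result = ExecutionResult(db_name, tbl_name, old_table, dict(new_table))
        for listener in self.listeners:      # step 52: hand off to the module
            listener(result)


def make_message_generator(outbox):
    """Build a listener that compares before/after and records a message."""
    def on_update(result):
        keys = set(result.old_table) | set(result.new_table)
        changed = sorted(
            k for k in keys if result.old_table.get(k) != result.new_table.get(k)
        )
        if changed:
            outbox.append(
                {"db": result.db_name, "table": result.tbl_name, "changed": changed}
            )
    return on_update


outbox = []
store = MetadataStore()
store.listeners.append(make_message_generator(outbox))
store.tables[("sales", "orders")] = {"owner": "alice", "location": "/a"}
store.alter_table("sales", "orders", {"owner": "bob", "location": "/a"})
```

Registering one callback is the only change made to the store in this sketch, which mirrors the point that a single added interface is enough to embed the message generation module.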

[0206] Based on the lake warehouse metadata change determination method provided in the embodiments of this application, the embodiments of this application also provide a lake warehouse metadata change determination device, which is described below with reference to Figure 5. Figure 5 is a schematic diagram of a lake warehouse metadata change determination device provided in an embodiment of this application. It should be noted that for technical details of the lake warehouse metadata change determination device provided in this embodiment, please refer to the relevant content of the lake warehouse metadata change determination method described above.

[0207] As shown in Figure 5, the lake warehouse metadata change determination device 500 provided in this application embodiment includes:

[0208] The first receiving unit 501 is used to receive a metadata processing request sent by the data engine, wherein the metadata processing request is used to request an update process for the target object;

[0209] The logic execution unit 502 is used to execute the object update processing logic corresponding to the metadata processing request, and the object update processing logic is used to update the target object;

[0210] The message generation unit 503 is used to generate, after determining that the object update processing logic has been completed, an update description message corresponding to the target object based on the comparison result between the target object before the update and the target object after the update;

[0211] The message sending unit 504 is used to send the update description message to the message receiving object.

[0212] In one possible implementation, the message generation unit 503 includes:

[0213] The first generation subunit is used to generate the update description message based on the type of the target object if the comparison result indicates that the object identifier of the target object before the update is different from the object identifier of the target object after the update.

[0214] In one possible implementation, if the target object is a table object, the first generation subunit is specifically used to: if the type of the target object is a table type, generate the update description message based on a first preset string; the semantic information expressed by the first preset string is to update the object identifier of a table object belonging to the table type; if the type of the target object is a view type, generate the update description message based on a second preset string; the semantic information expressed by the second preset string is to update the object identifier of a table object belonging to the view type.
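The identifier-change branch can be sketched as below. The preset string values `RENAME_TABLE` and `RENAME_VIEW` and the dict fields `name` and `type` are hypothetical; the text does not specify the actual strings or the object layout.

```python
RENAME_TABLE = "RENAME_TABLE"  # hypothetical first preset string
RENAME_VIEW = "RENAME_VIEW"    # hypothetical second preset string


def rename_message(old, new):
    """Return a message when the object identifier changed, else None."""
    if old["name"] == new["name"]:
        return None  # identifier unchanged; other branches would apply
    event = RENAME_VIEW if new["type"] == "view" else RENAME_TABLE
    return {"event": event, "old_name": old["name"], "new_name": new["name"]}


table_msg = rename_message(
    {"name": "orders", "type": "table"}, {"name": "orders_v2", "type": "table"}
)
view_msg = rename_message(
    {"name": "daily", "type": "view"}, {"name": "daily_v2", "type": "view"}
)
```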

[0215] In one possible implementation, the target object is used to record multiple metadata;

[0216] The message generation unit 503 includes:

[0217] The second generation subunit is used to generate the update description message based on the number of data blocks in at least one data block if the comparison result indicates that the number of data blocks divided in the target dimension of the target object before the update is the same as the number of data blocks divided in the target dimension of the target object after the update, and there is a difference between the target object before the update and the target object after the update in the at least one data block.

[0218] In one possible implementation, the second generation subunit is specifically configured to: if the number of data blocks in the at least one data block is 1, generate the update description message according to a third preset string; the semantic information expressed by the third preset string is to update one of the data blocks; if the number of data blocks in the at least one data block is not less than 2, generate the update description message according to a fourth preset string; the semantic information expressed by the fourth preset string is to update multiple of the data blocks.
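A minimal sketch of this same-count branch follows. It assumes that the data blocks are partitions keyed by a partition name and that each block's metadata is a plain dict; the preset string values are hypothetical.

```python
ALTER_PARTITION = "ALTER_PARTITION"    # hypothetical third preset string
ALTER_PARTITIONS = "ALTER_PARTITIONS"  # hypothetical fourth preset string


def same_count_message(old_parts, new_parts):
    """old_parts / new_parts map a block key to that block's metadata."""
    if len(old_parts) != len(new_parts):
        return None  # block count changed; handled by other branches
    keys = set(old_parts) | set(new_parts)
    changed = sorted(k for k in keys if old_parts.get(k) != new_parts.get(k))
    if not changed:
        return None  # nothing differs, so no message is generated
    event = ALTER_PARTITION if len(changed) == 1 else ALTER_PARTITIONS
    return {"event": event, "partitions": changed}


before = {"dt=1": {"rows": 10}, "dt=2": {"rows": 5}}
one_changed = same_count_message(before, {"dt=1": {"rows": 11}, "dt=2": {"rows": 5}})
two_changed = same_count_message(before, {"dt=1": {"rows": 11}, "dt=2": {"rows": 6}})
```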

[0219] In one possible implementation, the target object is used to record multiple metadata;

[0220] The message generation unit 503 includes at least one of a third generation subunit, a fourth generation subunit, and a fifth generation subunit:

[0221] The third generation subunit is configured to generate the update description message based on a fifth preset string if the comparison result indicates that the number of data blocks divided in the target dimension of the target object after the update is higher than the number of data blocks divided in the target dimension of the target object before the update, and the data blocks divided in the target dimension of the target object after the update include the data blocks divided in the target dimension of the target object before the update;

[0222] The fourth generation subunit is configured to generate the update description message based on the sixth preset string if the comparison result indicates that the number of data blocks divided in the target dimension of the target object before the update is higher than the number of data blocks divided in the target dimension of the updated target object, and the data blocks divided in the target dimension of the target object before the update include the data blocks divided in the target dimension of the updated target object;

[0223] The fifth generation subunit is configured to generate the update description message based on the fourth preset string if the comparison result indicates that the number of data blocks divided in the target dimension of the target object before the update is different from the number of data blocks divided in the target dimension of the target object after the update, and there is a difference between the target object before the update and the target object after the update in at least one data block. The semantic information expressed by the fourth preset string is that multiple data blocks are updated.
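The three count-changed branches can be sketched together. As an assumption of this sketch, "includes" is read as "every block of the smaller set appears unchanged in the larger set", blocks are keyed dict entries, and all preset string values are hypothetical.

```python
ADD_PARTITIONS = "ADD_PARTITIONS"      # hypothetical fifth preset string
DROP_PARTITIONS = "DROP_PARTITIONS"    # hypothetical sixth preset string
ALTER_PARTITIONS = "ALTER_PARTITIONS"  # hypothetical fourth preset string


def count_changed_event(old_parts, new_parts):
    """Pick the preset string when the number of data blocks differs."""
    if len(old_parts) == len(new_parts):
        return None  # same count; handled by the same-count branch
    old_kept = all(new_parts.get(k) == v for k, v in old_parts.items())
    new_kept = all(old_parts.get(k) == v for k, v in new_parts.items())
    if len(new_parts) > len(old_parts) and old_kept:
        return ADD_PARTITIONS   # blocks were only added
    if len(old_parts) > len(new_parts) and new_kept:
        return DROP_PARTITIONS  # blocks were only removed
    return ALTER_PARTITIONS     # count differs and existing blocks changed


added = count_changed_event({"dt=1": {"rows": 1}}, {"dt=1": {"rows": 1}, "dt=2": {"rows": 2}})
dropped = count_changed_event({"dt=1": {"rows": 1}, "dt=2": {"rows": 2}}, {"dt=1": {"rows": 1}})
mixed = count_changed_event({"dt=1": {"rows": 1}}, {"dt=1": {"rows": 9}, "dt=2": {"rows": 2}})
```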

[0224] In one possible implementation, the target object is a library object, a table object, or a partition object.

[0225] In one possible implementation, the lake warehouse metadata change determination device 500 further includes:

[0226] The message receiving object is message middleware or at least one downstream object corresponding to the data engine, and the message middleware is used to provide the update description message to each of the downstream objects.

[0227] In one possible implementation, the message generation unit 503 is specifically used to: generate an update description message corresponding to the target object based on a preset message format and a comparison result between the target object before the update and the target object after the update; the preset message format is the message format used when the data engine sends a message to the message receiving object.

[0228] In one possible implementation, the message receiving object is further configured to receive an object update message generated by the data engine in response to the metadata processing request;

[0229] The semantic information carried by the object update message is partially or entirely consistent with the semantic information carried by the update description message.

[0230] In one possible implementation, the metadata processing request is used to describe multiple update processing tasks;

[0231] The logic execution unit 502 is specifically used to: execute the object update processing logic corresponding to each update processing task;

[0232] The message generation unit 503 is specifically configured to: for any of the update processing tasks, after determining that the object update processing logic corresponding to the update processing task has been completed, generate an update description message corresponding to the update processing task based on the comparison result between the target object before the update and the target object after the update corresponding to the update processing task; and generate an update description message corresponding to the target object based on the update description messages corresponding to the plurality of update processing tasks.
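The multi-task case can be sketched as a per-task comparison followed by a merge. How the per-task messages are combined is not specified in the text; the order-preserving union of changed fields used here, and all names and the message layout, are assumptions of this sketch.

```python
def diff_task(before, after):
    """Compare the target object before and after one update task."""
    keys = set(before) | set(after)
    changed = sorted(k for k in keys if before.get(k) != after.get(k))
    return {"changed": changed} if changed else None


def merge_task_messages(target, task_messages):
    """Combine per-task messages into one message for the target object."""
    merged = []
    for message in task_messages:
        if message is None:
            continue  # this task left the object unchanged
        for name in message["changed"]:
            if name not in merged:
                merged.append(name)
    return {"target": target, "changed": merged}


# Successive snapshots of the target object: two tasks applied in order.
snapshots = [
    {"owner": "alice", "comment": ""},
    {"owner": "bob", "comment": ""},
    {"owner": "bob", "comment": "archived"},
]
per_task = [diff_task(a, b) for a, b in zip(snapshots, snapshots[1:])]
message = merge_task_messages("sales.orders", per_task)
```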

[0233] In one possible implementation, the metadata processing request is generated by the data engine in response to an object update request sent by the client, the object update request carrying client description information;

[0234] The message generation unit 503 is specifically used to: generate an update description message corresponding to the target object based on the client description information provided by the data engine and the comparison result between the target object before the update and the target object after the update.

[0235] In one possible implementation, the message generation unit 503 is specifically used to: generate an update description message corresponding to the target object based on the client description information carried in the metadata processing request and the comparison result between the target object before the update and the target object after the update.

[0236] In one possible implementation, the lake warehouse metadata change determination device 500 further includes:

[0237] The second receiving unit is used to receive client description information sent by the data engine;

[0238] The first receiving unit 501 is specifically used to: receive the metadata processing request corresponding to the client description information sent by the data engine.
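The two ways of supplying client description information (sent separately by the data engine, or carried in the metadata processing request) can be sketched as one function. The field names `client_info` and `client`, and the precedence given to separately supplied information, are assumptions of this sketch.

```python
def build_update_message(before, after, client_info=None, request=None):
    """Attach client description information to the comparison result."""
    if client_info is None and request is not None:
        # Fall back to information carried in the request itself.
        client_info = request.get("client_info")
    keys = set(before) | set(after)
    changed = sorted(k for k in keys if before.get(k) != after.get(k))
    message = {"changed": changed}
    if client_info is not None:
        message["client"] = client_info
    return message


old, new = {"owner": "alice"}, {"owner": "bob"}
from_engine = build_update_message(old, new, client_info={"user": "u1"})
from_request = build_update_message(old, new, request={"client_info": {"user": "u2"}})
```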

[0239] In one possible implementation, the message generation unit 503 is specifically configured to: after determining that the object update processing logic has been completed, obtain logic execution result description information corresponding to the metadata processing request; the logic execution result description information includes some or all of the information carried by the metadata processing request; the some or all of the information includes object description information of the target object before the update and object description information of the target object after the update; send the logic execution result description information to the message generation module embedded in the metadata storage system, so that the message generation module obtains the target object before the update and the target object after the update based on the logic execution result description information, and generates an update description message corresponding to the target object based on the comparison result between the target object before the update and the target object after the update.

[0240] Based on the above content regarding the lake warehouse metadata change determination device 500, it can be understood that the device 500 provided in this application embodiment is used to implement some or all of the functions of a metadata storage system (e.g., a metadata storage system similar to Hive Metastore). The working principle of the device 500 is as follows. After receiving a metadata processing request sent by a data engine to request an update of a target object (e.g., a library object, table object, or partition object), it executes the object update processing logic corresponding to the metadata processing request (the object update processing logic is used to update the target object). After determining that the object update processing logic has been completed, it generates an update description message corresponding to the target object based on the comparison between the target object before and after the update. This update description message indicates that the target object has been updated, so that it can be sent to the message receiving object (e.g., some downstream businesses corresponding to the data engine). The metadata storage system thus generates update description messages itself and can complete data update detection processing on its own. This effectively overcomes the defects caused by other devices (e.g., data engines) performing data update detection processing, such as the difficulty of scaling and maintaining an update detection scheme that requires corresponding update detection logic to be configured for each data engine.

[0241] In addition, this application also provides an electronic device, the device including a processor and a memory: the memory is used to store instructions or computer programs; the processor is used to execute the instructions or computer programs in the memory, so that the electronic device performs any implementation of the lake warehouse metadata change determination method provided in this application.

[0242] Referring to Figure 6, a structural schematic of an electronic device 600 suitable for implementing embodiments of the present disclosure is shown. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers. The electronic device shown in Figure 6 is merely an example and should not be construed as limiting the functionality and scope of the embodiments disclosed herein.

[0243] As shown in Figure 6, electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601, which can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 602 or a program loaded from storage device 608 into random access memory (RAM) 603. RAM 603 also stores various programs and data required for the operation of electronic device 600. Processing device 601, ROM 602, and RAM 603 are interconnected via bus 604. Input/output (I/O) interface 605 is also connected to bus 604.

[0244] Typically, the following devices can be connected to I/O interface 605: input devices 606 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 607 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices 608 including, for example, magnetic tapes, hard disks, etc.; and communication devices 609. Communication device 609 allows electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. Although Figure 6 shows an electronic device 600 with various devices, it should be understood that it is not required to implement or possess all of the devices shown. More or fewer devices may alternatively be implemented or possessed.

[0245] In particular, according to embodiments of this disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via a communication device 609, or installed from a storage device 608, or installed from a ROM 602. When the computer program is executed by the processing device 601, it performs the functions defined in the methods of embodiments of this disclosure.

[0246] The electronic device provided in this embodiment belongs to the same inventive concept as the method provided in the above embodiments. Technical details not described in detail in this embodiment can be found in the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.

[0247] This application also provides a computer-readable medium storing instructions or a computer program that, when executed on a device, causes the device to perform any implementation of the lake warehouse metadata change determination method provided in this application.

[0248] It should be noted that the computer-readable medium described in this disclosure can be a computer-readable signal medium or a computer-readable storage medium, or any combination thereof. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this disclosure, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in connection with an instruction execution system, apparatus, or device. In this disclosure, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium can be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to: wires, optical fibers, RF (radio frequency), etc., or any suitable combination thereof.

[0249] In some implementations, clients and servers can communicate using any currently known or future-developed network protocol such as HTTP (Hypertext Transfer Protocol) and can interconnect with digital data communication (e.g., communication networks) of any form or medium. Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.

[0250] The aforementioned computer-readable medium may be included in the aforementioned electronic device; or it may exist independently and not assembled into the electronic device.

[0251] The aforementioned computer-readable medium carries one or more programs, which, when executed by the electronic device, enable the electronic device to perform the aforementioned methods.

[0252] Computer program code for performing the operations of this disclosure can be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0253] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0254] The units described in the embodiments of this disclosure can be implemented in software or hardware. The names of the units/modules do not necessarily limit the specific unit itself.

[0255] The functions described above in this document can be performed, at least in part, by one or more hardware logic components. For example, exemplary types of hardware logic components that can be used, without limitation, include: Field Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and so on.

[0256] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0257] It should be noted that the various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from the other embodiments. For similar or identical parts, the embodiments can be cross-referenced. Because the systems or apparatus disclosed in the embodiments correspond to the methods disclosed in the embodiments, their descriptions are relatively brief; for relevant details, refer to the method section.

[0258] It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" is used to describe the relationship between related objects, indicating that three relationships can exist. For example, "A and/or B" can represent three cases: only A exists, only B exists, and both A and B exist simultaneously, where A and B can be singular or plural. The character "/" generally indicates that the preceding and following related objects are in an "or" relationship. "At least one (item) of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one (item) of a, b, or c can represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.

[0259] It should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0260] The steps of the methods or algorithms described in conjunction with the embodiments disclosed herein can be implemented directly by hardware, a software module executed by a processor, or a combination of both. The software module can be located in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.

[0261] The above description of the disclosed embodiments enables those skilled in the art to make or use this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for determining changes in lake warehouse metadata, characterized in that the method is applied to a metadata storage system, which manages data for lake warehouse metadata sources, and the method is executed by the metadata storage system and includes: receiving a metadata processing request sent by a data engine, wherein the metadata processing request is used to request an update to a target object, and the target object includes one or more metadata elements and is located in the metadata source; executing the object update processing logic corresponding to the metadata processing request, wherein the object update processing logic is used to update the target object; after determining that the object update processing logic has been completed, generating an update description message corresponding to the target object based on the comparison result between the target object before the update and the target object after the update; and sending the update description message to a message receiving object.

2. The method according to claim 1, characterized in that generating the update description message includes: if the comparison result indicates that the object identifier of the target object before the update differs from the object identifier of the target object after the update, generating the update description message according to the type of the target object.

3. The method according to claim 2, characterized in that, if the target object is a table object, generating the update description message according to the type of the target object includes: if the target object is of the table type, generating the update description message according to a first preset string, wherein the semantic information expressed by the first preset string is updating the object identifier of a table object of the table type; if the target object is of the view type, generating the update description message according to a second preset string, wherein the semantic information expressed by the second preset string is updating the object identifier of a table object of the view type.
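The identifier check of claims 2 and 3 might look like the following sketch. The literal values of the preset strings are not given in the claims, so `RENAME_TABLE` and `RENAME_VIEW` are placeholders, and the `name` and `type` fields are an assumed object layout.

```python
from typing import Optional

# Hypothetical preset strings; the claims do not state their literal values.
PRESET_BY_TYPE = {
    "table": "RENAME_TABLE",  # first preset string: identifier of a table-type object updated
    "view": "RENAME_VIEW",    # second preset string: identifier of a view-type object updated
}


def describe_rename(before: dict, after: dict) -> Optional[str]:
    """Return a preset string when the object identifier changed, else None."""
    if before["name"] == after["name"]:
        return None  # identifier unchanged, so this rule does not apply
    # Pick the preset string by object type; unknown types yield None.
    return PRESET_BY_TYPE.get(after["type"])
```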

4. The method according to claim 1, characterized in that the target object is used to record multiple pieces of metadata, and generating the update description message includes: if the comparison result indicates that the number of data blocks into which the target object before the update is divided in a target dimension is the same as the number of data blocks into which the target object after the update is divided in the target dimension, and the target object before the update differs from the target object after the update in at least one data block, generating the update description message according to the number of data blocks in the at least one data block.

5. The method according to claim 4, characterized in that generating the update description message according to the number of data blocks in the at least one data block includes: if the number of data blocks in the at least one data block is 1, generating the update description message according to a third preset string, wherein the semantic information expressed by the third preset string is updating one data block; if the number of data blocks in the at least one data block is not less than 2, generating the update description message according to a fourth preset string, wherein the semantic information expressed by the fourth preset string is updating multiple data blocks.

6. The method according to claim 1, characterized in that the target object is used to record multiple pieces of metadata, and generating the update description message includes at least one of the following: if the comparison result indicates that the number of data blocks into which the target object after the update is divided in a target dimension is greater than the number of data blocks into which the target object before the update is divided in the target dimension, and the data blocks of the target object after the update in the target dimension include the data blocks of the target object before the update in the target dimension, generating the update description message according to a fifth preset string, wherein the semantic information expressed by the fifth preset string is adding data blocks; if the comparison result indicates that the number of data blocks into which the target object before the update is divided in the target dimension is greater than the number of data blocks into which the target object after the update is divided in the target dimension, and the data blocks of the target object before the update in the target dimension include the data blocks of the target object after the update in the target dimension, generating the update description message according to a sixth preset string, wherein the semantic information expressed by the sixth preset string is deleting data blocks; if the comparison result indicates that the number of data blocks into which the target object before the update is divided in the target dimension differs from the number of data blocks into which the target object after the update is divided in the target dimension, and the target object before the update differs from the target object after the update in at least one data block, generating the update description message according to a fourth preset string, wherein the semantic information expressed by the fourth preset string is updating multiple data blocks.
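The data-block rules of claims 4 to 6 can be combined into one decision function, sketched below under several assumptions: data blocks (for example, partitions) are modeled as a dictionary from block key to a descriptor string, "includes" is read as every block of one side appearing unchanged on the other, and the preset string values are placeholders.

```python
from typing import Optional

# Hypothetical values for the third to sixth preset strings.
UPDATE_ONE = "UPDATE_BLOCK"    # third: one data block updated
UPDATE_MANY = "UPDATE_BLOCKS"  # fourth: multiple data blocks updated
ADD = "ADD_BLOCKS"             # fifth: data blocks added
DELETE = "DROP_BLOCKS"         # sixth: data blocks deleted


def classify_blocks(before: dict, after: dict) -> Optional[str]:
    """Map a before/after comparison of data blocks to a preset string."""
    # Block keys whose descriptor differs, or that exist on one side only.
    differing = [k for k in before.keys() | after.keys()
                 if before.get(k) != after.get(k)]
    if not differing:
        return None  # nothing changed
    if len(before) == len(after):
        # Same block count (claims 4 and 5): choose by number of differing blocks.
        return UPDATE_ONE if len(differing) == 1 else UPDATE_MANY
    if len(after) > len(before) and before.items() <= after.items():
        return ADD     # updated object contains every original block unchanged
    if len(before) > len(after) and after.items() <= before.items():
        return DELETE  # original object contains every remaining block unchanged
    return UPDATE_MANY  # counts differ and at least one block differs


# Usage: one partition added, existing partition untouched.
kind = classify_blocks({"dt=01": "v1"}, {"dt=01": "v1", "dt=02": "v1"})
```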

7. The method according to any one of claims 4-6, characterized in that the target object is a library object, a table object, or a partition object.

8. The method according to claim 1, characterized in that the message receiving object is message middleware, or at least one downstream object corresponding to the data engine, wherein the message middleware is used to provide the update description message to each of the downstream objects.

9. The method according to claim 1, characterized in that generating the update description message corresponding to the target object based on the comparison result between the target object before the update and the target object after the update includes: generating the update description message corresponding to the target object based on a preset message format and the comparison result between the target object before the update and the target object after the update, wherein the preset message format is the message format used by the data engine when sending a message to the message receiving object.

10. The method according to claim 1, characterized in that the message receiving object is also used to receive an object update message generated by the data engine for the metadata processing request, and the semantic information carried by the object update message is partially or entirely consistent with the semantic information carried by the update description message.

11. The method according to claim 1, characterized in that the metadata processing request is used to describe multiple update processing tasks; executing the object update processing logic corresponding to the metadata processing request includes: executing the object update processing logic corresponding to each of the update processing tasks; and generating the update description message corresponding to the target object includes: for any one of the update processing tasks, after determining that the object update processing logic corresponding to that update processing task has been completed, generating an update description message corresponding to that update processing task based on the comparison result between the target object before the update and the target object after the update; and generating the update description message corresponding to the target object based on the update description messages corresponding to the multiple update processing tasks.
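Claim 11 describes per-task messages that are then combined into one message for the target object. The sketch below is one possible reading, assuming each update processing task is a dictionary of key changes applied in order; the function name `run_tasks` and the merged message layout are hypothetical.

```python
def run_tasks(target: dict, tasks: list) -> tuple:
    """Apply each task in order; return the final object and a combined message."""
    current = dict(target)
    per_task = []
    for task in tasks:
        before = current
        after = {**before, **task}  # update logic for this single task
        # Per-task comparison of the object before and after this task.
        changed = sorted(k for k in after if before.get(k) != after[k])
        per_task.append({"changed_keys": changed})
        current = after
    # Combine the per-task messages into one message for the target object.
    merged = sorted({k for msg in per_task for k in msg["changed_keys"]})
    return current, {"tasks": per_task, "changed_keys": merged}


# Usage: two tasks carried by a single metadata processing request.
final, message = run_tasks({"owner": "etl"}, [{"owner": "bi"}, {"comment": "daily"}])
```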

12. The method according to claim 1, characterized in that the metadata processing request is generated by the data engine in response to an object update request sent by a client, and the object update request carries client description information; generating the update description message corresponding to the target object based on the comparison result between the target object before the update and the target object after the update includes: generating the update description message corresponding to the target object based on the client description information provided by the data engine and the comparison result between the target object before the update and the target object after the update.

13. The method according to claim 12, characterized in that generating the update description message corresponding to the target object based on the client description information provided by the data engine and the comparison result between the target object before the update and the target object after the update includes: generating the update description message corresponding to the target object based on the client description information carried in the metadata processing request and the comparison result between the target object before the update and the target object after the update.
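In the variant of claim 13, the client description information travels inside the metadata processing request and is copied into the update description message. A minimal sketch follows, assuming the request is a dictionary with hypothetical `object` and `client_info` fields.

```python
def build_message(request: dict, before: dict, after: dict) -> dict:
    """Combine the before/after comparison with client info carried by the request."""
    changed = sorted(k for k in before.keys() | after.keys()
                     if before.get(k) != after.get(k))
    return {
        "object": request["object"],
        "changed_keys": changed,
        # Client description information forwarded by the data engine (assumed field name).
        "client": request.get("client_info"),
    }


# Usage: the downstream receiver can see which client triggered the change.
msg = build_message(
    {"object": "db.orders", "client_info": {"user": "alice"}},
    {"owner": "etl"},
    {"owner": "bi"},
)
```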

14. The method according to claim 12, characterized in that, before receiving the metadata processing request sent by the data engine, the method further includes: receiving the client description information sent by the data engine; and receiving the metadata processing request sent by the data engine includes: receiving the metadata processing request that is sent by the data engine and corresponds to the client description information.

15. The method according to claim 1, characterized in that, after determining that the object update processing logic has been completed, generating the update description message corresponding to the target object based on the comparison result between the target object before the update and the target object after the update includes: after determining that the object update processing logic has been completed, obtaining logic execution result description information corresponding to the metadata processing request, wherein the logic execution result description information includes some or all of the information carried by the metadata processing request, and that information includes the object description information of the target object before the update and the object description information of the target object after the update; and sending the logic execution result description information to a message generation module embedded in the metadata storage system, so that the message generation module obtains the target object before the update and the target object after the update based on the logic execution result description information, and generates the update description message corresponding to the target object based on the comparison result between the target object before the update and the target object after the update.
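The embedded message generation module of claim 15 can be pictured as a listener that the storage system calls once the update logic has finished. The sketch below is an assumption about how such a hand-off might be structured; `ExecutionResult`, `MessageGenerator`, and `on_result` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionResult:
    """Logic execution result description information (assumed shape)."""
    object_name: str
    before: dict  # object description information before the update
    after: dict   # object description information after the update


class MessageGenerator:
    """Hypothetical message generation module embedded in the storage system."""

    def __init__(self, sink: list):
        self.sink = sink  # stand-in for the message receiving object

    def on_result(self, result: ExecutionResult) -> None:
        # Recover both versions of the target object and compare them.
        keys = result.before.keys() | result.after.keys()
        changed = sorted(k for k in keys
                         if result.before.get(k) != result.after.get(k))
        if changed:
            self.sink.append({"object": result.object_name, "changed_keys": changed})


# Usage: the update logic hands its result to the module after it completes.
received = []
MessageGenerator(received).on_result(
    ExecutionResult("db.orders", {"owner": "etl"}, {"owner": "bi"})
)
```

Keeping the comparison in a separate module means the update logic only needs to report what it did, not how to describe it.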

16. A device for determining changes in lake warehouse metadata, characterized in that the device is applied to a metadata storage system that performs data management for a lake warehouse metadata source, and the device includes: a first receiving unit, configured to receive a metadata processing request sent by a data engine, wherein the metadata processing request is used to request an update of a target object, the target object includes one or more pieces of metadata, and the target object is located in the metadata source; a logic execution unit, configured to execute the object update processing logic corresponding to the metadata processing request, wherein the object update processing logic is used to update the target object; a message generation unit, configured to generate, after it is determined that the object update processing logic has been completed, an update description message corresponding to the target object based on the comparison result between the target object before the update and the target object after the update; and a message sending unit, configured to send the update description message to a message receiving object.

17. An electronic device, characterized in that the device includes a processor and a memory; the memory is used to store instructions or a computer program; and the processor is configured to execute the instructions or the computer program in the memory to cause the electronic device to perform the method according to any one of claims 1-15.

18. A computer-readable medium, characterized in that the computer-readable medium stores instructions or a computer program that, when run on a device, cause the device to perform the method according to any one of claims 1-15.