Data processing method and apparatus
By configuring processing dependencies between data tables and generating processing logs, and using a time-chained table as the initial dependent table, the high resource consumption problem in existing technologies is solved, and low-cost real-time and batch data processing is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- AGRICULTURAL BANK OF CHINA
- Filing Date
- 2022-10-26
- Publication Date
- 2026-06-26
AI Technical Summary
Existing technologies require two separate systems to perform batch processing and stream processing, resulting in high resource consumption and high implementation costs.
By configuring the processing dependencies between data tables, processing logs are generated, and dependent tables are determined and processed based on the dependencies. A time-chained table is used as the initial dependent table to achieve batch processing and reduce system resource requirements.
This system enables both real-time and batch data processing needs to be met simultaneously, avoiding inconsistencies in data metrics and reducing implementation costs.
Figure: abstract drawing (CN115576949B)
Description
Technical Field
[0001] This application relates to the field of data processing technology, and more specifically, to a data processing method and apparatus.
Background Technology
[0002] In the field of data processing, there are two processing methods: batch processing and stream processing. Batch processing usually refers to collecting and storing data from day T and processing the data from day T in batches on day T+1. Stream processing is the real-time processing of received data.
[0003] To implement batch processing and stream processing at the same time, two sets of data need to be stored and two data processing systems need to be deployed, so that one system executes batch processing and the other executes stream processing. This consumes two sets of resources and requires writing and maintaining two kinds of programs, resulting in high implementation costs.
Summary of the Invention
[0004] In view of the above problems, this application is made to provide a data processing method and apparatus to reduce the cost of implementing batch processing and stream processing.
[0005] The specific solution is as follows:
[0006] In a first aspect, a data processing method is provided, including:
[0007] Configure processing dependencies between different types of data tables;
[0008] Generate a processing log for the dependent table, wherein the processing log includes the data processing start time and data processing end time used to characterize the data processing cycle of this table;
[0009] Based on the processing dependency relationship, determine the target dependency table that depends on the dependent table for data processing;
[0010] Determine whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent table. If all have been processed, process the target dependency table using all dependent tables on which it depends. After the target dependency table is processed, treat the target dependency table as a new dependent table and return to the step of generating the processing log of the dependent table, until no target dependency table can be determined based on the processing dependency relationship.
[0011] The dependent table generated at the initial moment is a time-chained table generated from the source system data. The data processing start time and data processing end time in the processing log of the time-chained table are the data effective start time and effective end time of the time-chained table, respectively.
[0012] Optionally, the processing log of the dependent table is a processing log generated by the processing thread of the dependent table after the dependent table is generated.
[0013] The step of determining whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent table includes:
[0014] The following steps are performed using the processing thread of the target dependency table:
[0015] Obtain the data processing cycle from the processing log of the dependent table to get the target data processing cycle;
[0016] Based on the processing dependencies, determine all the dependent tables that the target dependency table depends on;
[0017] Determine whether all the dependent tables have been processed within the target data processing cycle.
[0018] Optionally, processing the target dependency table using all the dependent tables that the target dependency table depends on includes:
[0019] Using the data from all the determined dependent tables within the target data processing cycle, the incremental data of the target dependency table is obtained by processing, and the target dependency table is generated based on the incremental data.
[0020] Optionally, determining the target dependency table that depends on the dependent table for data processing based on the processing dependency relationship includes:
[0021] The processing thread of the dependent table sends the processing log of the dependent table to the message middleware so that the processing thread of the target dependency table can obtain it. The target dependency table is the table in the processing dependency relationship that needs to rely on the dependent table for data processing.
[0022] The step of determining whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent table includes:
[0023] When the processing thread of the target dependency table obtains the processing log of the dependent table, that thread determines whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent table.
[0024] Optionally, the time-chained table is generated through the following steps:
[0025] The operation logs in the source system database are obtained through the message middleware, and a time-chained table of the source system is generated based on the operation logs.
[0026] In a second aspect, a data processing apparatus is provided, comprising:
[0027] The dependency configuration unit is used to configure the processing dependencies between different types of data tables;
[0028] A processing log generation unit is used to generate processing logs for the dependent table. The processing logs include the data processing start time and data processing end time, which characterize the data processing cycle of the table. The dependent table generated at the initial moment is a time-chained table generated from the source system data. The data processing start time and data processing end time in the processing logs of the time-chained table are the data effective start time and effective end time of the time-chained table, respectively.
[0029] The dependency table processing start unit is used to determine the target dependency table that depends on the dependent table for data processing based on the processing dependency relationship, and to terminate the data processing if the target dependency table cannot be determined based on the processing dependency relationship.
[0030] The dependency table processing unit is used to determine whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent tables. If all have been processed, the target dependency table is processed using all dependent tables on which the target dependency table depends. After the target dependency table is processed, the target dependency table is transformed into a new dependent table, and the processing log generation unit is instructed to execute the step of generating processing logs for the dependent tables, until the target dependency table cannot be determined based on the processing dependency relationship.
[0031] Optionally, the processing log of the dependent table is a processing log generated by the processing thread of the dependent table after the dependent table is generated.
[0032] The process by which the dependency table processing unit determines whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent tables includes:
[0033] The following steps are performed using the processing thread of the target dependency table:
[0034] Obtain the data processing cycle from the processing log of the dependent table to get the target data processing cycle;
[0035] Based on the processing dependencies, determine all the dependent tables that the target dependency table depends on;
[0036] Determine whether all the dependent tables have been processed within the target data processing cycle.
[0037] Optionally, the process by which the dependency table processing unit processes the target dependency table using all the dependent tables that the target dependency table depends on includes:
[0038] The incremental data of the target dependency table is obtained by processing the data of all the determined dependent tables within the target data processing cycle, and the target dependency table is generated based on the incremental data.
[0039] Optionally, the process by which the dependency table processing start unit determines the target dependency table that depends on the dependent table for data processing based on the processing dependency relationship includes:
[0040] The processing thread of the dependent table sends the processing log of the dependent table to the message middleware so that the processing thread of the target dependency table can obtain it. The target dependency table is the table in the processing dependency relationship that needs to rely on the dependent table for data processing.
[0041] The process by which the dependency table processing unit determines whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent tables includes:
[0042] When the processing thread of the target dependency table obtains the processing log of the dependent table, it uses the processing thread of the target dependency table to determine whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent table.
[0043] Optionally, the device further includes a time-chained table generation unit, used to obtain operation logs from the source system database through the message middleware and to generate a time-chained table for the source system based on the operation logs.
[0044] With the above technical solution, this application pre-configures the processing dependencies between data tables, and the initial dependent table is generated in advance from the source system data. During a round of data processing, each time a dependent table is generated, a processing log for that dependent table is generated and the data processing procedure of its target dependency table is started. The target dependency table is a table that, according to the preset processing dependencies, depends on the dependent table for data processing. The data processing procedure of the target dependency table includes determining whether all dependent tables required for processing the target dependency table have been processed within the data processing cycle of the dependent table. If all have been processed, those dependent tables are used to process the data, yielding the processed target dependency table. The target dependency table is then treated as a new dependent table, and a processing log for the new dependent table is generated. Each dependency table that depends on the new dependent table starts its own data processing procedure, until no target dependency table can be determined from the processing dependencies, at which point the current round of data processing ends and batch data processing is achieved. Furthermore, this solution uses a time-chained table generated from the source system data as the initial dependent table, so the current round of data processing uses this time-chained table as the lowest-level dependent table. Under these conditions, even if a dependency table relies on a dependent table other than the time-chained table, the data it uses is still derived from the data in the time-chained table. For a time-chained table with a defined data processing cycle, this solution can perform batch data processing based on the time-chained table and generate a series of dependent tables. Therefore, by configuring a data processing cycle of less than one day, or even a minute-level cycle, for the time-chained table, the real-time performance of data processing can be improved. A single system can thus meet both real-time and batch data processing needs, which avoids the inconsistencies in data metrics that arise when stream processing and batch processing are performed by two different systems, and reduces the cost of implementing stream and batch processing.
Attached Figure Description
[0045] Various other advantages and benefits will become apparent to those skilled in the art upon reading the following detailed description of preferred embodiments. The accompanying drawings are for illustrative purposes only and are not intended to limit the scope of this application. Furthermore, the same reference numerals denote the same parts throughout the drawings. In the drawings:
[0046] Figure 1 is a flowchart illustrating a data processing method provided in an embodiment of this application;
[0047] Figure 2 is a schematic structural diagram of a data processing device provided in an embodiment of this application.
Detailed Implementation
[0048] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0049] This application provides a data processing method and apparatus that can reduce the cost of implementing batch processing and stream processing.
[0050] Figure 1 is a schematic flowchart of a data processing method according to an embodiment of this application. As shown in Figure 1, the method may include the following steps:
[0051] Step S101: Configure the processing dependencies between different types of data tables.
[0052] For example, a set of processing dependencies may include a dependent table A and a dependency table B. Processing the data of dependency table B requires data from dependent table A, and may also use data from dependent tables other than A. It should be noted that the table to be processed is called the dependency table, and the table that provides data for processing other tables is called the dependent table.
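As a minimal sketch of how such a configuration might be represented, the mapping below lists, for each dependency table, the dependent tables whose data it needs. The table names, the `DEPENDS_ON` mapping, and the `targets_of` helper are hypothetical and are not taken from the application.

```python
# Hypothetical configuration (assumption): each dependency table maps to the
# dependent tables whose data it needs. Table names are illustrative only.
DEPENDS_ON = {
    "B": ["A"],        # B is processed from A
    "C": ["A", "B"],   # C needs both A and B
}


def targets_of(dependent_table, depends_on=DEPENDS_ON):
    """Return the dependency tables that rely on `dependent_table`."""
    return sorted(t for t, deps in depends_on.items() if dependent_table in deps)


print(targets_of("A"))  # ['B', 'C']
print(targets_of("C"))  # []
```

Storing the relationship from the dependency table's side keeps the "which inputs do I need" lookup direct, while the reverse lookup ("who depends on me") is derived on demand.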
[0053] Step S102: Generate the processing log of the dependent table.
[0054] The processing log may include the data processing start time and end time, which characterize the data processing cycle of the table. Recording the start and end times in the processing log makes it possible to determine whether the data in the dependent table is valid. Specifically, the processing log of the dependent table lets the dependency tables that rely on it determine whether the dependent table has been processed. Furthermore, the processing log can be generated after the dependent table itself is generated, indicating that the table has been processed and that the dependency tables relying on it can start their respective processing threads.
[0055] For example, the processing log may consist of the table name, the start time and end time of data processing for the table, and the data processing cycle for the table, wherein the data processing cycle may be the time spent processing the data of the dependent table.
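One possible shape for such a log record is sketched below. The class name `ProcessingLog`, its field names, and the choice to expose the cycle as a `(start, end)` pair are assumptions made for illustration; the application does not prescribe a concrete format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ProcessingLog:
    """Hypothetical processing-log record for one dependent table."""
    table_name: str
    start_time: datetime   # data processing start time
    end_time: datetime     # data processing end time

    @property
    def cycle(self):
        # Assumption: the cycle is identified by its (start, end) interval.
        return (self.start_time, self.end_time)

    @property
    def duration(self) -> timedelta:
        return self.end_time - self.start_time


log = ProcessingLog("A", datetime(2022, 10, 26, 0, 0), datetime(2022, 10, 26, 0, 5))
print(log.duration)  # 0:05:00
```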
[0056] It should be noted that the dependent table can be the dependent table generated at the initial moment, or a dependent table transformed from a dependency table that was itself obtained by data processing based on an earlier dependent table. The dependent table generated at the initial moment can be a time-chained table generated from the source system data. The data processing start time and data processing end time in the processing log of the time-chained table are the data effective start time and effective end time of the time-chained table, respectively.
[0057] Step S103: Based on the processing dependency relationship, determine whether there is a target dependency table that depends on the dependent table for data processing. If yes, proceed to step S104; otherwise, proceed to step S107.
[0058] It should be noted that there may be multiple target dependency tables that rely on the dependent table for data processing. If multiple target dependency tables are determined, the subsequent steps are performed for each of them.
[0059] Step S104: Determine whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent table. If so, proceed to step S105.
[0060] It should be noted that if the result of step S104 is negative, the process may either wait and repeat step S104 until the result is positive, or terminate directly. After any other dependent table on which the target dependency table depends has been processed and its processing log has been generated, step S103 and the subsequent steps can be triggered again. Here, the other dependent table is a table other than the dependent table mentioned in step S102.
[0061] In other words, the generation of a processing log by any dependent table on which the target dependency table depends is the trigger condition for starting the processing procedure of the target dependency table, whereas the completion, within the data processing cycle of the dependent table, of all dependent tables on which the target dependency table depends is the trigger condition for starting the actual data processing of the target dependency table.
[0062] Step S105: Process the target dependency table using all the dependent tables that the target dependency table depends on.
[0063] Step S106: After the target dependency table is processed, the target dependency table is transformed into a new dependent table, and the process returns to step S102 to generate the processing log of the dependent table.
[0064] It should be noted that after returning to step S102, the processing log generated is that of the new dependent table produced in step S106.
[0065] Step S107: End the loop and complete the data processing.
[0066] It should be noted that if no target dependency table that depends on the dependent table for data processing can be determined from the processing dependency relationship, this indicates that all data tables have been processed in this round, and the round of data processing is complete.
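Steps S101 to S107 can be illustrated with a simplified, single-threaded sketch. It assumes that "processed within the data processing cycle" means a log with the same cycle exists for every required table; the function name `run_round`, its parameters, and the `process` callback are hypothetical, and the per-table processing threads described elsewhere in this application are deliberately left out.

```python
from collections import deque


def run_round(depends_on, initial_table, cycle, process):
    """Drive one round of processing from the initial dependent table.

    depends_on: {dependency_table: [dependent tables it needs]}   (S101)
    cycle:      (start, end) of the initial table's processing cycle
    process:    callable(target, inputs, cycle) doing the real work (S105)
    """
    logs = {}                        # table -> cycle recorded in its log
    order = []                       # tables processed in this round
    ready = deque([initial_table])
    while ready:
        table = ready.popleft()
        logs[table] = cycle          # S102: generate the processing log
        # S103: find target dependency tables that rely on `table`
        targets = [t for t, deps in depends_on.items() if table in deps]
        for target in targets:
            if target in logs or target in ready:
                continue             # already processed in this round
            # S104: have all required tables been processed for this cycle?
            if all(logs.get(d) == cycle for d in depends_on[target]):
                process(target, depends_on[target], cycle)    # S105
                order.append(target)
                ready.append(target)  # S106: becomes a new dependent table
    return order                     # S107: no further targets, round ends


deps = {"B": ["A"], "C": ["A", "B"], "D": ["C"]}
print(run_round(deps, "A", ("00:00", "00:05"), lambda t, i, c: None))
# ['B', 'C', 'D']
```

Note that table C is skipped when A finishes (B is not yet done) and is picked up again when B's log appears, which mirrors the re-triggering described for step S104.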
[0067] The data processing method described above can be applied to data warehouses. For example, the data warehouse can be built on a massively parallel processing (MPP) architecture or on a distributed system infrastructure such as Hadoop. This application pre-configures the processing dependencies between data tables, and the initial dependent table is generated in advance from the source system data. During a round of data processing, each time a dependent table is generated, a processing log for that dependent table is generated and the data processing procedure of its target dependency table is started. The target dependency table is a table that, according to the preset processing dependencies, depends on the dependent table for data processing. The data processing procedure of the target dependency table includes determining whether all dependent tables required for processing the target dependency table have been processed within the data processing cycle of the dependent table. If all have been processed, the target dependency table is processed using all required dependent tables to obtain the processed target dependency table. The target dependency table is then treated as a new dependent table, and a processing log for the new dependent table is generated. Each dependency table that depends on the new dependent table then starts its own data processing procedure, until no target dependency table can be determined from the processing dependencies, at which point the current round of data processing ends and batch data processing is achieved. Furthermore, this solution uses a time-chained table generated from the source system data as the initial dependent table, so the current round of data processing uses this time-chained table as the lowest-level dependent table. Under these conditions, even if a dependency table relies on a dependent table other than the time-chained table, the data it uses is still derived from the data in the time-chained table. For a time-chained table with a defined data processing cycle, this solution can perform batch data processing based on the time-chained table and generate a series of dependent tables. Therefore, by configuring a data processing cycle of less than one day, or even a minute-level cycle, for the time-chained table, the real-time performance of data processing can be improved. A single system can thus meet both real-time and batch data processing needs, which avoids the inconsistencies in data metrics that arise when stream processing and batch processing are performed by two different systems, and reduces the cost of implementing stream and batch processing.
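To make the effect of the cycle length concrete, the sketch below splits a time range into consecutive processing cycles. The helper `cycles` and the chosen dates are hypothetical; the point is only that the same routine yields one daily batch or many minute-level cycles depending on a single configured length.

```python
from datetime import datetime, timedelta


def cycles(start, end, length):
    """Yield consecutive (cycle_start, cycle_end) windows covering [start, end)."""
    cur = start
    while cur < end:
        nxt = min(cur + length, end)
        yield cur, nxt
        cur = nxt


day_start = datetime(2022, 10, 26)
day_end = datetime(2022, 10, 27)

daily = list(cycles(day_start, day_end, timedelta(days=1)))
minutely = list(cycles(day_start, day_end, timedelta(minutes=1)))
print(len(daily), len(minutely))  # 1 1440
```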
[0068] In some embodiments provided in this application, the time-chained table may be generated through the following steps:
[0069] Obtain operation logs from the source system database through the message middleware;
[0070] Generate a time-chained table of the source system based on the operation logs.
[0071] Specifically, the operation log (binlog) of the source system database can be collected, parsed, and sent by the message middleware to the data warehouse to which this solution is applied. In addition, this solution can generate the time-chained table directly from the operation log of the source system without first generating a source table, thereby improving the efficiency of data processing.
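The following sketch shows one common way a time-chained table can be maintained from a stream of operation-log events: each change closes the currently effective version of a record and, for inserts and updates, opens a new one. The event format (`type`, `key`, `value`, `ts`), the `OPEN_END` sentinel, and the `apply_op` helper are assumptions for illustration; they are not the actual binlog format or the method defined in this application.

```python
from datetime import datetime

OPEN_END = datetime.max  # assumed sentinel meaning "still effective"


def apply_op(zipper, op):
    """Apply one parsed operation-log event to the time-chained rows.

    op: {"type": "insert" | "update" | "delete", "key", "value", "ts"}
    """
    # Close the currently effective version of this key, if any.
    for row in zipper:
        if row["key"] == op["key"] and row["end"] == OPEN_END:
            row["end"] = op["ts"]
    # Inserts and updates open a new version effective from op["ts"].
    if op["type"] in ("insert", "update"):
        zipper.append({"key": op["key"], "value": op["value"],
                       "start": op["ts"], "end": OPEN_END})
    return zipper


rows = []
apply_op(rows, {"type": "insert", "key": 1, "value": "a", "ts": datetime(2022, 10, 26, 0, 0)})
apply_op(rows, {"type": "update", "key": 1, "value": "b", "ts": datetime(2022, 10, 26, 0, 3)})
print(len(rows))  # 2
```

Because every row carries its effective start and end times, the rows that changed inside a given processing cycle can be selected by time range without a separate source table.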
[0072] In some embodiments provided in this application, the processing log of the dependent table may be a processing log generated by the processing thread of the dependent table after the dependent table is generated.
[0073] Based on the above, determining whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent table may include performing the following steps using the processing thread of the target dependency table:
[0074] Step S01: Obtain the data processing cycle from the processing log of the dependent table to obtain the target data processing cycle.
[0075] Step S02: Based on the processing dependency relationship, determine all the dependent tables that the target dependency table depends on.
[0076] Step S03: Determine whether all the determined dependent tables have been processed within the target data processing cycle.
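Steps S01 to S03 can be sketched as a single check. The sketch assumes that a table counts as "processed within the target data processing cycle" when its most recent log carries the same start and end times as the triggering log; the function name and the dictionary-based log format are hypothetical.

```python
def all_inputs_processed(target, trigger_log, depends_on, logs):
    """Return True if every table `target` depends on has a log for the cycle.

    trigger_log: log of the dependent table that triggered the check
    depends_on:  {dependency_table: [dependent tables it needs]}
    logs:        {table_name: latest processing log of that table}
    """
    # S01: take the target data processing cycle from the triggering log.
    target_cycle = (trigger_log["start"], trigger_log["end"])
    # S02: determine all dependent tables the target relies on.
    inputs = depends_on[target]

    # S03: each of them must have been processed for that same cycle.
    def done(name):
        log = logs.get(name)
        return log is not None and (log["start"], log["end"]) == target_cycle

    return all(done(name) for name in inputs)


deps = {"C": ["A", "B"]}
log_a = {"table": "A", "start": "00:00", "end": "00:05"}
log_b = {"table": "B", "start": "00:00", "end": "00:05"}
print(all_inputs_processed("C", log_a, deps, {"A": log_a}))              # False
print(all_inputs_processed("C", log_b, deps, {"A": log_a, "B": log_b}))  # True
```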
[0077] It should be noted that each dependency table has its own processing thread, so different dependency tables can process data simultaneously with their processing threads running in parallel. In addition, for a dependent table transformed from a dependency table, its processing thread is the processing thread that the dependency table had before the transformation.
[0078] In some embodiments provided in this application, processing the target dependency table using all the dependent tables that the target dependency table depends on may include:
[0079] Step S11: Using the data from all the determined dependent tables within the target data processing cycle, obtain the incremental data of the target dependency table by processing.
[0080] It should be noted that, for dependent tables other than the one generated at the initial moment, the data of the determined dependent tables within the target data processing cycle can itself be incremental data.
[0081] Step S12: Generate the target dependency table based on the incremental data.
[0082] Specifically, step S12 may include performing data update or insertion operations based on the incremental data to generate the target dependency table.
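The update-or-insert behaviour of step S12 can be sketched as follows. Rows are modelled as dictionaries and matched on a key column named `id`; both choices, and the helper name `apply_increment`, are assumptions for illustration.

```python
def apply_increment(target_rows, increment, key="id"):
    """Update or insert incremental rows into the target table (in place)."""
    index = {row[key]: i for i, row in enumerate(target_rows)}
    for row in increment:
        if row[key] in index:
            target_rows[index[row[key]]] = row       # update existing row
        else:
            index[row[key]] = len(target_rows)
            target_rows.append(row)                  # insert new row
    return target_rows


table = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
delta = [{"id": 2, "amount": 25}, {"id": 3, "amount": 30}]
print(apply_increment(table, delta))
# [{'id': 1, 'amount': 10}, {'id': 2, 'amount': 25}, {'id': 3, 'amount': 30}]
```

Only the rows in `delta` are touched, so the cost of a cycle scales with the size of the increment instead of the size of the full table.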
[0083] The data processing method described above can perform data processing based on incremental data without acquiring and processing the full dataset. Therefore, the data processing method provided in this application reduces the time required for data processing and improves the performance of data processing.
[0084] In some embodiments provided in this application, determining the target dependency table that depends on the dependent table for data processing based on the processing dependency relationship may include:
[0085] The processing thread of the dependent table sends the processing log of the dependent table to the message middleware so that the processing thread of the target dependency table can obtain it. The target dependency table is the table in the processing dependency relationship that needs to rely on the dependent table for data processing.
[0086] It should be noted that there may be no table in the processing dependency relationship that needs to rely on the dependent table for data processing, that is, no target dependency table can be determined. In that case, all data tables have been processed in this round of data processing, so the loop ends and this round of data processing is complete.
[0087] Based on the above, determining whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent table may include:
[0088] When the processing thread of the target dependency table obtains the processing log of the dependent table, it uses the processing thread of the target dependency table to determine whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent table.
[0089] Specifically, the processing thread of the target dependency table can subscribe to messages of all dependent tables that the target dependency table depends on through the message middleware. The messages can be the processing logs. Through the message middleware, after the dependent table is generated or processed, the target dependency table of the dependent table can be actively notified through the processing logs of the dependent table, so that each dependency table can perform its own data processing.
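The subscribe-and-notify interaction can be sketched with an in-process stand-in for the message middleware. The `Broker` class, the `target_worker` function, and the message format are hypothetical and do not represent any particular middleware product; the comparison of data processing cycles is omitted here to keep the example short.

```python
import queue
import threading


class Broker:
    """In-process stand-in for message middleware (assumption for illustration)."""

    def __init__(self):
        self._subs = {}              # dependent table -> subscriber inboxes
        self._lock = threading.Lock()

    def subscribe(self, table, inbox):
        with self._lock:
            self._subs.setdefault(table, []).append(inbox)

    def publish(self, table, log):
        with self._lock:
            inboxes = list(self._subs.get(table, []))
        for inbox in inboxes:
            inbox.put(log)


def target_worker(name, inbox, needed, results):
    """Processing thread of a target dependency table."""
    seen = set()
    while seen != needed:
        log = inbox.get()            # blocks until a dependent table publishes
        seen.add(log["table"])
    results.append(name)             # all inputs ready: processing would start


broker = Broker()
inbox = queue.Queue()
for dep in ("A", "B"):
    broker.subscribe(dep, inbox)     # table C subscribes to both of its inputs

results = []
worker = threading.Thread(target=target_worker, args=("C", inbox, {"A", "B"}, results))
worker.start()
broker.publish("A", {"table": "A", "cycle": ("00:00", "00:05")})
broker.publish("B", {"table": "B", "cycle": ("00:00", "00:05")})
worker.join(timeout=5)
print(results)  # ['C']
```

Pushing logs to subscribers means a target dependency table's thread stays idle until one of its inputs actually changes, instead of polling for completion.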
[0090] The data processing apparatus provided in the embodiments of this application is described below. The data processing apparatus described below and the data processing method described above can be referred to in correspondence.
[0091] Refer to Figure 2, which is a schematic structural diagram of a data processing device disclosed in an embodiment of this application.
[0092] As shown in Figure 2, the device may include:
[0093] Dependency configuration unit 11 is used to configure the processing dependencies between different types of data tables;
[0094] The processing log generation unit 12 is used to generate processing logs for the dependent table. The processing logs include the data processing start time and data processing end time, which are used to characterize the data processing cycle of the table. The dependent table generated at the initial moment is a time-chained table generated from the source system data. The data processing start time and data processing end time in the processing logs of the time-chained table are the data effective start time and effective end time of the time-chained table, respectively.
[0095] The dependency table processing start unit 13 is used to determine the target dependency table that depends on the dependent table for data processing based on the processing dependency relationship, and to end the data processing if the target dependency table cannot be determined based on the processing dependency relationship.
[0096] The dependency table processing unit 14 is used to determine whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent tables. If all have been processed, the target dependency table is processed using all dependent tables on which the target dependency table depends. After the target dependency table is processed, the target dependency table is transformed into a new dependent table, and the processing log generation unit is instructed to execute the step of generating processing logs for the dependent tables until the target dependency table cannot be determined based on the processing dependency relationship.
[0097] In some embodiments provided in this application, the processing log of the dependent table may be a processing log generated by the processing thread of the dependent table after the dependent table is generated.
[0098] Based on the above, the process by which the dependency table processing unit 14 determines whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent tables may include:
[0099] The following steps are performed using the processing thread of the target dependency table:
[0100] Obtain the data processing cycle from the processing log of the dependent table to get the target data processing cycle;
[0101] Based on the processing dependencies, determine all the dependent tables that the target dependency table depends on;
[0102] Determine whether all the dependent tables have been processed within the target data processing cycle.
[0103] In some embodiments provided in this application, the process by which the dependency table processing unit 14 processes the target dependency table using all the dependent tables that the target dependency table depends on may include:
[0104] The incremental data of the target dependency table is obtained by processing the data of all the determined dependent tables within the target data processing cycle, and the target dependency table is generated based on the incremental data.
[0105] In some embodiments provided in this application, the process by which the dependency table processing start unit 13 determines the target dependency table that depends on the dependent table for data processing based on the processing dependency relationship may include:
[0106] The processing thread of the dependent table sends the processing log of the dependent table to the message middleware so that the processing thread of the target dependency table can obtain it. The target dependency table is the table in the processing dependency relationship that needs to rely on the dependent table for data processing.
[0107] Based on the above, the process by which the dependency table processing unit 14 determines whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent table may include:
[0108] When the processing thread of the target dependency table obtains the processing log of the dependent table, that thread determines whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent table.
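The thread interaction described above can be sketched as follows. Here Python's standard `queue.Queue` is an in-process stand-in for the message middleware, and the table names, the log layout, and the `expected_logs` stopping condition are assumptions made only so that the example terminates.

```python
import queue
import threading

# Hypothetical configuration: target table -> tables it depends on.
DEPENDENCIES = {"summary": ["accounts", "transactions"]}


def upstream_worker(table, cycle, bus):
    """Processing thread of a dependent table: publish its processing log."""
    bus.put({"table": table, "cycle": cycle})


def target_worker(target, bus, processed, expected_logs):
    """Processing thread of the target dependency table.

    On each received log, check whether every upstream table has reported
    the same cycle; if so, record the target as processed for that cycle.
    """
    seen = {}  # cycle -> set of upstream tables that reported it
    for _ in range(expected_logs):
        log = bus.get()
        seen.setdefault(log["cycle"], set()).add(log["table"])
        if seen[log["cycle"]] >= set(DEPENDENCIES[target]):
            processed.append((target, log["cycle"]))


bus = queue.Queue()  # in-process stand-in for the message middleware
processed = []
cycle = ("2022-10-26T00:00", "2022-10-26T00:05")

consumer = threading.Thread(target=target_worker, args=("summary", bus, processed, 2))
consumer.start()
producers = [
    threading.Thread(target=upstream_worker, args=(name, cycle, bus))
    for name in DEPENDENCIES["summary"]
]
for p in producers:
    p.start()
for p in producers:
    p.join()
consumer.join()
```

The target thread does nothing after the first log and processes the target exactly once after the second, regardless of the order in which the two upstream threads publish.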
[0109] In some embodiments provided in this application, the apparatus may further include a time-chained table generation unit, used to obtain operation logs from the source system database through message middleware and to generate a time-chained table for the source system based on the operation logs.
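By way of example only, generating a time-chained table from operation logs can be sketched as below. The log fields (`key`, `op`, `value`, `ts`), the use of `datetime.max` as the open-ended effective end time, and the handling of deletes are assumptions of this sketch rather than details given in this application.

```python
from datetime import datetime

OPEN_END = datetime.max  # assumed marker for "still effective"


def build_time_chained_table(op_logs):
    """Turn operation logs into rows with effective start and end times.

    Each log is a dict with hypothetical fields: key, op ("insert",
    "update" or "delete"), value, and ts.
    """
    rows = []
    current = {}  # key -> index of the currently effective row
    for entry in sorted(op_logs, key=lambda e: e["ts"]):
        key = entry["key"]
        if key in current:
            # Close the previously effective version of this key.
            rows[current.pop(key)]["end"] = entry["ts"]
        if entry["op"] in ("insert", "update"):
            # Open a new version that is effective from this operation on.
            rows.append({"key": key, "value": entry["value"],
                         "start": entry["ts"], "end": OPEN_END})
            current[key] = len(rows) - 1
    return rows
```

Each row's `start` and `end` then play the role of the data effective start time and effective end time that the processing log of the time-chained table records as its data processing cycle.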
[0110] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0111] The various embodiments in this specification are described in a progressive manner. Each embodiment focuses on the differences from other embodiments. The various embodiments can be combined as needed, and the same or similar parts can be referred to each other.
[0112] The above description of the disclosed embodiments enables those skilled in the art to make or use this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A data processing method, characterized in that it comprises:
configuring processing dependencies between different types of data tables;
generating a processing log for a dependent table, wherein the processing log includes a data processing start time and a data processing end time that characterize the data processing cycle of the dependent table;
based on the processing dependencies, determining a target dependency table that depends on the dependent table for data processing;
determining whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent table, and, if all have been processed, processing the target dependency table using all the dependent tables on which the target dependency table depends;
after the target dependency table is processed, taking the target dependency table as a new dependent table and returning to the step of generating the processing log for the dependent table, until no target dependency table can be determined based on the processing dependencies;
wherein the dependent table generated at the initial moment is a time-chained table generated from source system data, and the data processing start time and data processing end time in the processing log of the time-chained table are the data effective start time and data effective end time of the time-chained table, respectively.
2. The method according to claim 1, characterized in that the processing log of the dependent table is generated by the processing thread of the dependent table after the dependent table is generated;
the step of determining whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent table comprises performing the following steps using the processing thread of the target dependency table:
obtaining the data processing cycle from the processing log of the dependent table as the target data processing cycle;
based on the processing dependencies, determining all the dependent tables on which the target dependency table depends;
determining whether all of those dependent tables have been processed within the target data processing cycle.
3. The method according to claim 2, characterized in that the step of processing the target dependency table using all the dependent tables on which the target dependency table depends comprises: processing the incremental data of the target dependency table using the data of all the determined dependent tables within the target data processing cycle, and generating the target dependency table based on the incremental data.
4. The method according to any one of claims 1-3, characterized in that the step of determining, based on the processing dependencies, the target dependency table that depends on the dependent table for data processing comprises:
sending, by the processing thread of the dependent table, the processing log of the dependent table to message middleware so that the processing thread of the target dependency table can obtain it, wherein the target dependency table is the table in the processing dependencies that needs to rely on the dependent table for data processing;
and the step of determining whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent table comprises:
when the processing thread of the target dependency table obtains the processing log of the dependent table, using that thread to determine whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent table.
5. The method according to any one of claims 1-3, characterized in that the time-chained table is generated through the following steps: obtaining operation logs from the source system database through message middleware, and generating the time-chained table of the source system based on the operation logs.
6. A data processing apparatus, characterized in that it comprises:
a dependency configuration unit, used to configure processing dependencies between different types of data tables;
a processing log generation unit, used to generate a processing log for a dependent table, wherein the processing log includes a data processing start time and a data processing end time that characterize the data processing cycle of the dependent table, the dependent table generated at the initial moment is a time-chained table generated from source system data, and the data processing start time and data processing end time in the processing log of the time-chained table are the data effective start time and data effective end time of the time-chained table, respectively;
a dependency table processing initiation unit, used to determine, based on the processing dependencies, a target dependency table that depends on the dependent table for data processing, and to terminate the data processing if no target dependency table can be determined based on the processing dependencies;
a dependency table processing unit, used to determine whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent table; if all have been processed, to process the target dependency table using all the dependent tables on which the target dependency table depends; and, after the target dependency table is processed, to take the target dependency table as a new dependent table and instruct the processing log generation unit to execute the step of generating a processing log for the dependent table, until no target dependency table can be determined based on the processing dependencies.
7. The apparatus according to claim 6, characterized in that the processing log of the dependent table is generated by the processing thread of the dependent table after the dependent table is generated;
the process by which the dependency table processing unit determines whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent table comprises performing the following steps using the processing thread of the target dependency table:
obtaining the data processing cycle from the processing log of the dependent table as the target data processing cycle;
based on the processing dependencies, determining all the dependent tables on which the target dependency table depends;
determining whether all of those dependent tables have been processed within the target data processing cycle.
8. The apparatus according to claim 7, characterized in that the process by which the dependency table processing unit processes the target dependency table using all the dependent tables on which the target dependency table depends comprises: processing the incremental data of the target dependency table using the data of all the determined dependent tables within the target data processing cycle, and generating the target dependency table based on the incremental data.
9. The apparatus according to any one of claims 6-8, characterized in that the process by which the dependency table processing initiation unit determines, based on the processing dependencies, the target dependency table that depends on the dependent table for data processing comprises:
sending, by the processing thread of the dependent table, the processing log of the dependent table to message middleware so that the processing thread of the target dependency table can obtain it, wherein the target dependency table is the table in the processing dependencies that needs to rely on the dependent table for data processing;
and the process by which the dependency table processing unit determines whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent table comprises:
when the processing thread of the target dependency table obtains the processing log of the dependent table, using that thread to determine whether all dependent tables on which the target dependency table depends have been processed within the data processing cycle of the dependent table.
10. The apparatus according to any one of claims 6-8, characterized in that the apparatus further comprises a time-chained table generation unit, used to obtain operation logs from the source system database through message middleware and to generate the time-chained table of the source system based on the operation logs.