A method and apparatus for comparing database data in packets

By constructing a ROWID mapping table and performing grouped comparisons during the data migration process, the problem of frequent memory and disk swapping in massive data comparisons was solved, thus improving comparison performance.

CN117762996BActive Publication Date: 2026-06-19WUHAN DAMENG DATABASE

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
WUHAN DAMENG DATABASE
Filing Date
2023-12-27
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

During the comparison of massive amounts of data, the frequent memory and disk swapping caused by the dispersed nature of MD5 values ​​reduces the comparison performance.

Method used

During the data migration process, ROWID is added as a query item to build a ROWID mapping table. After the data migration is completed, grouping and comparison are performed based on the mapping relationship to reduce memory resource requirements and avoid data page and disk swapping.

Benefits of technology

It improves data comparison performance, reduces memory resource requirements and disk swapping frequency, and ensures efficient data comparison.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN117762996B_ABST
    Figure CN117762996B_ABST
Patent Text Reader

Abstract

This invention relates to a method and apparatus for grouping and comparing database data. The method mainly includes: constructing a ROWID mapping table for a left and right table; calculating the number of groups based on the number of rows in the left and right tables and the available memory of the current system; obtaining the ROWID mapping relationship for each group from the ROWID mapping table based on the number of rows in the left table and the number of groups, and constructing data query conditions for each group based on the ROWID mapping relationship; extracting a preset number of group data in parallel for comparison according to the group data query conditions; after the data comparison of a certain group is completed, extracting the data of the next group sequentially for comparison, until all group comparisons are completed, and generating a comparison report based on the data remaining in memory. This invention can reduce the memory resource requirements during the comparison process, reduce or avoid the number of data page and disk swaps due to insufficient memory, thereby improving comparison performance.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This invention relates to the field of database technology, and in particular to a method and apparatus for grouping and comparing database data. Background Technology

[0002] With the development of database technology, data migration technology between databases has been applied in various industries. After data migration between two databases is completed, in order to verify the correctness of the data migration, the tables in the two databases need to be compared to ensure the consistency of the data. Alternatively, if a failure occurs during the migration process, it may be necessary to identify data differences. Therefore, an efficient and fast data comparison solution is needed. Currently, data migration and data comparison between two databases are two completely independent processes. After the data migration is completed, a data comparison tool is used to compare the data in the two tables (left and right tables). To improve the performance of data comparison, the two tables are generally divided into N independent groups. Then, each database uses G connections to concurrently query and retrieve data. For each retrieved data, an MD5 value is calculated. Then, algorithms such as HASH or B-tree are used to offset the MD5 values ​​that are the same on both sides. Finally, the remaining MD5 values ​​that cannot be offset are used to generate a comparison report with the corresponding records, thus completing the comparison. Current methods offer good comparison performance for smaller tables when memory is ample. However, when two tables contain massive amounts of data, and there isn't enough memory to accommodate all the comparison records, the dispersed nature of MD5 values ​​means that adjacent records in the database will be stored in different data pages of a hash or B-tree. This leads to frequent data swapping between memory and disk during comparison, thus reducing performance. Therefore, improving the efficiency of comparing massive amounts of data in this scenario has become a crucial technical problem that the industry urgently needs to solve.

[0003] Therefore, how to overcome the shortcomings of existing technologies and solve the technical problems of massive data comparison performance is a problem to be solved in this technical field. Summary of the Invention

[0004] To address the shortcomings or improvement needs of existing technologies: When two tables contain massive amounts of data and there is insufficient memory to accommodate all comparison records, due to the dispersed nature of MD5 values, adjacent records in the database will be recorded in different data pages of a hash or B-tree. This leads to frequent data swapping between memory and disk during comparison, thus reducing comparison performance. To solve the aforementioned shortcomings encountered in comparing massive amounts of data, this invention provides a method and device for grouping and comparing database data. It integrates the data migration and data comparison processes. When migrating data, ROWID needs to be appended as a query item in the source database. Furthermore, at regular intervals, the target database ROWID of that row needs to be obtained and the mapping relationship saved. After the data migration is complete, several ROWID mapping relationships between the source and target ends will be obtained. When comparing data, it is no longer necessary to query the data scale from the source and target ends. Instead, the comparison can be performed on demand based on the mapping relationships saved during the previous data migration. This grouping based on mapping relationships can ensure that the data in each group of the left and right tables is eliminated as much as possible, reduce the memory resource requirements during the comparison process, and reduce or avoid the number of data page and disk swaps due to insufficient memory, thereby improving the comparison performance.

[0005] The embodiments of the present invention adopt the following technical solutions:

[0006] In a first aspect, the present invention provides a method for comparing database data in groups, comprising:

[0007] Construct a mapping table between the ROWID values ​​of the left and right tables;

[0008] The number of groups is calculated using the number of rows in the left and right tables and the amount of memory currently available in the system.

[0009] The ROWID mapping relationship for each group is obtained from the ROWID mapping relationship table by the number of rows and the number of groups in the left table, and the data query conditions for each group are constructed based on the ROWID mapping relationship.

[0010] Based on the data query criteria for grouping, a preset number of grouped data are extracted in parallel for comparison;

[0011] Once the data comparison for a certain group is completed, the data for the next group is extracted sequentially for comparison, until all groups are compared. A comparison report is then generated based on the data remaining in memory.

[0012] Furthermore, the construction of the ROWID mapping relationship table between the left and right tables specifically includes:

[0013] The source database data is retrieved by using ROWID as an additional query item, and the source database data carrying ROWID is sent to the target database for data migration; the table in the source database is the left table, and the table in the target database is the right table.

[0014] For every X rows of data, a ROWID mapping relationship is constructed using the left table row number currently extracted from the left table, the left table ROWID corresponding to that row number, and the right table ROWID returned by the target database for that row of data.

[0015] After the data migration is completed, K ROWID mapping relationships between left and right tables are formed, that is, the ROWID mapping relationship table between left and right tables is constructed.

[0016] Furthermore, during data migration, the number of rows migrated out of the left table and the number of rows successfully imported into the right table are counted. The number of rows migrated out of the left table is the same as the number of rows in the left table, and the number of rows successfully imported into the right table is the same as the number of rows in the right table.

[0017] Furthermore, the calculation of the number of groups using the number of rows in the left and right tables and the available memory size of the current system specifically includes:

[0018] The number of groups N = ((L+R)*Z) / ((M-(LR)*Z) / G); where N is the number of groups rounded up, L is the number of rows in the left table, R is the number of rows in the right table, M is the memory size used for comparison, G is the preset number of groups that can be extracted in parallel at the same time, and Z is the memory size occupied by a single row of data in the comparison algorithm.

[0019] Furthermore, the comparison is terminated when the value of (LR)*Z is close to, equal to, or exceeds the value of M.

[0020] Furthermore, the step of obtaining the ROWID mapping relationship for each group from the ROWID mapping relationship table using the number of rows and groups in the left table, and constructing the data query conditions for each group based on the ROWID mapping relationship specifically includes:

[0021] The number of rows L in the left table is divided into N groups based on the number of groups N;

[0022] Get the starting row number of each group;

[0023] By finding the closest left table row number in the ROWID mapping table using the starting row number, the ROWID mapping relationship corresponding to each group can be obtained.

[0024] Each group combines its own ROWID mapping relationship with the ROWID mapping relationships of the adjacent groups above and below to construct the left table conditions and right table conditions for the data query conditions of each group.

[0025] Furthermore, each group, combining its own ROWID mapping relationship and the ROWID mapping relationships of its adjacent groups, constructs the left and right table conditions for the data query conditions of each group, specifically including:

[0026] When a group has only downward adjacent groups, the left table condition for the data query of that group is: ROWID is less than or equal to the left table ROWID in the ROWID mapping relationship of its own group; the right table condition for the data query of that group is: ROWID is less than or equal to the right table ROWID in the ROWID mapping relationship of its own group.

[0027] When a group has an upward adjacent group and a downward adjacent group, the left table condition for the data query of this group is: ROWID is greater than the left table ROWID in the ROWID mapping relationship of the upward adjacent group, and ROWID is less than or equal to the left table ROWID in the ROWID mapping relationship of its own group; the right table condition for the data query of this group is: ROWID is greater than the right table ROWID in the ROWID mapping relationship of the upward adjacent group, and ROWID is less than or equal to the right table ROWID in the ROWID mapping relationship of its own group.

[0028] When a group has only an upwardly adjacent group, the left table condition for the data query of that group is: ROWID is greater than the left table ROWID in the ROWID mapping relationship of the upwardly adjacent group; the right table condition for the data query of that group is: ROWID is greater than the right table ROWID in the ROWID mapping relationship of the upwardly adjacent group.

[0029] Furthermore, the step of extracting a preset number of grouped data in parallel for comparison according to the grouped data query conditions specifically includes:

[0030] Based on the left and right table conditions of the grouped data query conditions, G group connections are created simultaneously. Data from the left and right tables of the G groups are extracted and compared in the order of grouping. Collisions of identical data are eliminated, while different data is retained in memory. Here, G is the preset number of groups that can be extracted in parallel at the same time.

[0031] Furthermore, for the created G-group connections, each group of connections includes two connections, which connect to the source database and the target database respectively to extract data. When both connections have completed data extraction, the group is marked as comparison complete.

[0032] On the other hand, the present invention provides an apparatus for comparing grouped database data, specifically comprising at least one processor and a memory, wherein the at least one processor and the memory are connected via a data bus, the memory stores instructions executable by the at least one processor, and the instructions, after being executed by the processor, are used to complete the method for comparing grouped database data in the first aspect.

[0033] Compared with the prior art, the beneficial effects of the present invention are as follows: The present invention provides a method and device for grouping and comparing database data, which performs grouped comparison on massive amounts of data after migration between databases, in order to verify the correctness of data migration or find data differences caused by failures during the migration process.

[0034] In traditional solutions, data migration and data comparison are two independent processes. This leads to a lack of grouping decision-making conditions when using grouped comparison optimization techniques to improve comparison performance. The left and right tables are independently divided into N groups based on their estimated data volume, and there is no one-to-one mapping relationship between the data in the N groups of the two tables. For example, some data in the first group of the left table may not appear in the first group of the right table. After the data extraction and comparison of the first groups of both sides are completed, a large amount of data from the first group of the left table remains in memory. This data needs to be compared and eliminated by querying the second or third group of the right table. This phenomenon results in a large amount of inconsistent data being generated during the comparison process, occupying limited memory resources. Subsequent comparison processes may run out of memory, requiring data pages to be swapped with disk to free up memory for continued comparison, thus reducing the performance of data comparison.

[0035] This invention addresses the aforementioned problems by integrating data migration and data comparison processes. During data migration, ROWID is appended as a query term in the source database. Furthermore, every X rows, the target database's ROWID for that row is retrieved and the mapping is saved. After data migration is complete, K source and target ROWID mappings are obtained. Data comparison no longer requires querying the data size from the source and target databases; comparisons are performed directly based on the mappings saved during migration, grouping data as needed. This mapping-based grouping ensures maximum collision elimination within each group of the left and right tables, reducing memory resource requirements during comparison and minimizing or avoiding data page and disk swapping due to insufficient memory, thus improving comparison performance. Attached Figure Description

[0036] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings used in the embodiments of the present invention will be briefly described below. Obviously, the drawings described below are merely some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without any creative effort.

[0037] Figure 1 This is a flowchart of a method for grouping and comparing database data provided in Embodiment 1 of the present invention;

[0038] Figure 2 This is an extended flowchart of step 100 provided in Embodiment 1 of the present invention;

[0039] Figure 3 This is a detailed flowchart of step 300 provided in Embodiment 1 of the present invention;

[0040] Figure 4 This is a schematic diagram of a device structure for grouping and comparing database data, provided in Embodiment 3 of the present invention. Detailed Implementation

[0041] The present invention will now be described in detail with reference to specific embodiments. These embodiments will help those skilled in the art to further understand the present invention, but do not limit the invention in any way. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention. These all fall within the scope of protection of the present invention.

[0042] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0043] It should be noted that, unless otherwise specified, the various features in the embodiments of this invention can be combined with each other, all of which are within the protection scope of this application. Furthermore, although functional modules are divided in the device schematic diagram and a logical order is shown in the flowchart, in some cases, the steps shown or described may be executed differently from the module division in the device or the order shown in the flowchart.

[0044] Unless otherwise defined, all technical and scientific terms used in this specification have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. The terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the invention. Furthermore, the technical features involved in the various embodiments of the invention described below can be combined with each other as long as they do not conflict with each other.

[0045] The present invention will now be described in detail with reference to the accompanying drawings and embodiments.

[0046] Example 1:

[0047] To address data comparison solutions after massive amounts of data have been migrated between databases, embodiments of this invention provide a method for grouping and comparing database data, such as... Figure 1 As shown, the specific steps are as follows.

[0048] Step 100: Construct the ROWID mapping table between the left and right tables. This step uses ROWID as an additional query item in the source database to retrieve data and send it to the target database. The target database executes the query sequentially according to the data retrieval order, realizing the data migration from the source database to the target database. During this process, every X rows, the target database ROWID of that row is retrieved and the mapping relationship is saved. After the data migration is complete, K source and target ROWID mapping relationships will be obtained.

[0049] Step 200: Calculate the number of groups based on the number of rows in the left and right tables and the available memory in the current system. In this step, to prevent the limited memory from being exhausted and causing performance degradation during the comparison, and to consider using grouped queries to improve the data retrieval rate from the database, the number of groups N needs to be calculated before the comparison.

[0050] Step 300: Obtain the ROWID mapping relationship for each group from the ROWID mapping relationship table using the number of rows and groups in the left table, and construct the data query conditions for each group based on the ROWID mapping relationship. In this step, the left table is split into N groups using the number of rows L in the left table, and the starting row number of each group is obtained. The nearest left table row number is found in the ROWID mapping relationship table using the starting row number, thereby obtaining the ROWID mapping relationship corresponding to each group. In addition, each group can be combined with its adjacent groups above and below to construct the conditions for the group query range. Groups without adjacent groups above and below are not constructed with closed intervals, and only an open interval is used.

[0051] Step 400: Based on the data query conditions for grouping, extract a preset number of grouped data in parallel for comparison. During this step, identical data collisions are eliminated, while differences are retained in memory.

[0052] Step 500: After the data comparison of a certain group is completed, extract the data of the next group in sequence for comparison, until all groups are compared. Generate a comparison report based on the data remaining in memory. This step completes the data comparison after all groups are compared, and the final comparison report generated based on the data remaining in memory is the required comparison report.

[0053] This embodiment integrates the data migration and data comparison processes using the above method. During data migration, ROWID is appended as a query term in the source database. Furthermore, every X rows, the target database's ROWID for that row is retrieved and the mapping is saved. After data migration is complete, K source and target ROWID mappings are obtained. During data comparison, it's no longer necessary to query the data size from the source and target databases; comparisons are performed directly based on the mappings saved during the migration, grouping data as needed. This mapping-based grouping ensures maximum collision elimination within each group of the left and right tables, reducing memory resource requirements during comparison and minimizing or avoiding data page and disk swapping due to insufficient memory, thus improving comparison performance.

[0054] refer to Figure 2 As shown, in a specific implementation, the construction of the ROWID mapping relationship table between the left and right tables in step 100 specifically includes the following steps.

[0055] Step 101: Use ROWID as an additional query term to retrieve data from the source database, and send the source database data carrying ROWID to the target database for data migration; where the table in the source database is the left table and the table in the target database is the right table.

[0056] Step 102: For every X rows of data, construct a ROWID mapping relationship using the left table row number currently extracted from the left table, the left table ROWID corresponding to that row number, and the right table ROWID returned by the target database for that row of data.

[0057] Step 103: After the data migration is completed, K ROWID mapping relationships between the left and right tables are formed, that is, the ROWID mapping relationship table between the left and right tables is constructed.

[0058] In the above process, assuming the source database table is the left table and the target database table is the right table, the target database needs to count the number of rows of migrated data during execution. Every X rows, a ROWID mapping relationship is constructed using the currently extracted row number from the left table, the ROWID of that row, and the ROWID of that row returned in the right table from the target database. After the data migration is complete, K ROWID mapping relationships between the left and right tables will be formed. An example of a ROWID mapping relationship table is shown below:

[0059] Left table row number Left table ROWID Right table ROWID N1 L_RID1 R_RID1 N2 L_RID2 R_RID2 N3 L_RID3 R_RID3

[0060] During the data migration process, various errors may occur when the target database is being migrated (data truncation, type conversion failure, target database failure, etc.), causing the data in the left table to fail to be migrated to the right table. Therefore, during the migration, it is necessary to count the number of rows migrated out of the left table and the number of rows successfully migrated into the right table. The number of rows migrated out of the left table is the number of rows in the left table L, and the number of rows successfully migrated into the right table is the number of rows in the right table R.

[0061] The number of rows X at intervals needs to be set in conjunction with the available memory size M. It is necessary to ensure that memory M can accommodate twice the number of X rows of records (left and right tables). Since the ROWID of the data to be imported into the target database needs to be obtained, too close an interval will affect the performance of data migration. By default, X is set to one-thousandth of the number of rows that M can accommodate. It should be noted that this example is for comparison of massive data. One-thousandth of massive data is not a small number. X can be a dynamic value, and this value should be evaluated based on the amount of massive data.

[0062] In one specific implementation, calculating the number of groups in step 200 using the number of rows in the left and right tables and the available memory of the current system specifically includes: Number of groups N = ((L+R)*Z) / ((M-(LR)*Z) / G); where the number of groups N is rounded up, L is the number of rows in the left table, R is the number of rows in the right table, M is the memory size used for comparison (i.e., the available memory size), G is the preset number of groups that can be extracted in parallel simultaneously, and Z is the memory size occupied by a single row of data in the comparison algorithm. In one specific implementation, the comparison is terminated when the value of (LR)*Z is close to, equal to, or exceeds the value of M. When the value of (LR)*Z is close to (for example, reaching more than 90% of the value of M), equal to, or exceeds the value of M, it indicates that a large number of inconsistent rows have occurred. In this case, the current scheme cannot be used for comparison, and other schemes should be used. This embodiment of the invention will not describe other schemes for now.

[0063] For the above calculation formula, if an error occurs during the migration process, the number of rows R in the right table will definitely be less than the number of rows L in the left table. This difference in data will definitely remain in memory during the comparison and cannot be eliminated by collision. Therefore, this part of the memory should be reserved when calculating the group. When the amount of memory to be reserved (LR)*Z is close to or exceeds the memory M used for comparison, the comparison is terminated.

[0064] The main function of the above formula is to divide massive amounts of data into N groups, and simultaneously extract G groups of data for comparison based on the available memory. This ensures maximum memory utilization and prevents performance loss due to memory exhaustion. It also uses parallel querying to improve the speed of data retrieval from the database, ensuring that the comparison of each group of data is completed in memory as much as possible, reducing the frequency of data exchange between memory and disk during comparison, thereby improving performance.

[0065] refer to Figure 3 As shown, in a specific implementation, the steps for obtaining the ROWID mapping relationship of each group from the ROWID mapping relationship table by means of the number of rows and the number of groups in the left table in step 300, and constructing the data query conditions for each group based on the ROWID mapping relationship, specifically include the following steps.

[0066] Step 301: Divide the number of rows L in the left table into N groups based on the number of groups N.

[0067] Step 302: Obtain the starting row number of each group.

[0068] Step 303: Find the closest left table row number in the ROWID mapping table by starting row number, and then obtain the ROWID mapping relationship corresponding to each group.

[0069] Step 304: For each group, combine its own ROWID mapping relationship with the ROWID mapping relationships of the adjacent groups above and below to construct the left table conditions and right table conditions for the data query conditions of each group.

[0070] In one specific implementation, when a group has only downward adjacent groups, the left-table condition of the data query for that group is: ROWID is less than or equal to the left-table ROWID in the ROWID mapping relationship of its own group; the right-table condition of the data query for that group is: ROWID is less than or equal to the right-table ROWID in the ROWID mapping relationship of its own group. When a group has both upward and downward adjacent groups, the left-table condition of the data query for that group is: ROWID is greater than the left-table ROWID in the ROWID mapping relationship of the upward adjacent group, and ROWID is less than or equal to the ROWID of its own group. The left table condition for the ROWID mapping relationship is: ROWID is greater than the right table condition for the ROWID mapping relationship of the adjacent group above, and ROWID is less than or equal to the right table condition for the ROWID mapping relationship of its own group. When a group only has an adjacent group above, the left table condition for the data query relationship of that group is: ROWID is greater than the left table condition for the ROWID mapping relationship of the adjacent group above; the right table condition for the data query relationship of that group is: ROWID is greater than the right table condition for the ROWID mapping relationship of the adjacent group above.

[0071] For the above process, the left table is split into N groups using the number of rows L in the left table, and the starting row number of each group is obtained. Then, the closest matching row number from the left table is found in the ROWID mapping table using this starting row number, thus obtaining the ROWID mapping relationship for each group. An example is as follows:

[0072] Grouping Left table row number Left table ROWID Right table ROWID 1 N1 L_RID1 R_RID1 X N2 L_RID2 R_RID2 N N3 L_RID3 R_RID3

[0073] Each group, combined with its adjacent groups above and below, constructs the conditions for the group's query range. Groups without adjacent groups above or below are not assigned closed intervals; only an open interval is used. Examples are as follows:

[0074] Grouping Conditions on the left table Conditions in the right table 1 ROWID<=L_RID1 ROWID<=R_RID1 X L_RID1 <ROWID<=L_RID2 R_RID1<ROWID<=R_RID2 N L_RID2 <ROWID R_RID2<ROWID

[0075] In one specific implementation, the step 400, which involves parallel extraction and comparison of a preset number of grouped data based on the grouped data query conditions, specifically includes: simultaneously creating G groups of connections based on the left and right table conditions of the grouped data query conditions; extracting and comparing the data from the left and right tables of the G groups in group order; eliminating collisions of identical data; and retaining different data in memory; where G is the preset number of groups that can be extracted in parallel. In one specific implementation, for each of the G groups of connections, each group includes two connections, one to the source database and the other to the target database for data extraction. Due to speed differences, the group is marked as completed when both connections have finished extracting data. In this embodiment, the preset value of G needs to be determined based on the current hardware configuration; it is a dynamic value. If there are many CPU cores and sufficient memory, G can be set larger; otherwise, it should be set smaller.

[0076] In one specific implementation, for step 500, after the comparison of a certain group is completed, the comparison of the next group begins sequentially until all group comparisons are completed. Finally, a comparison report is generated based on the data remaining in memory, thus completing the data comparison.

[0077] The above solution primarily addresses data consistency verification after massive data migration between two databases, identifying discrepancies. Traditional methods separate data migration and data comparison into two independent steps. This leads to a lack of basis for data grouping during comparison, insufficient mapping between the left and right tables, and a significant amount of residual data remaining in memory after each comparison, eventually causing memory exhaustion and impacting overall comparison performance. The steps described in this embodiment can be explained as follows:

[0078] First, the migration and comparison of massive amounts of data between the two databases should be carried out in combination. When migrating data, a one-to-one ROWID mapping relationship needs to be generated between the left and right tables every X rows. This provides a basis for the subsequent data comparison and grouping, so that after each comparison is completed, the same data rows can be eliminated as much as possible. This avoids the same data that exists on both sides remaining in the limited memory for a long time, which will directly exhaust the memory and cause a decrease in comparison performance.

[0079] Secondly, when calculating the number of groups N for data comparison, the amount of memory required for known inconsistent data needs to be reserved in advance. Then, the remaining memory is split into G pairs of parallel groups, so that the data comparison can be performed by using multiple connections to extract data concurrently, thereby speeding up the comparison performance. Finally, the number of groups N is calculated based on the total amount of data in the left and right tables. With this scheme, the collision elimination of inconsistent data can be maximized during each group comparison, freeing up enough memory for the next group comparison. This avoids data swapping between memory and disk due to insufficient memory during comparison, thereby improving the overall performance of data comparison.

[0080] Example 2:

[0081] Based on the method for grouping and comparing database data provided in Embodiment 1, this Embodiment 2 will provide a more detailed description of the present invention through a specific application scenario.

[0082] For ease of illustration, the data size of the left table has been simplified. The above scheme is illustrated in the following example:

[0083] The source database has a table T1 (ID INT) with 4 rows of data, distributed as follows:

[0084]

[0085] The data migration process is as follows:

[0086] Move T1 from the left table to the right table T1. Let X be 1. A pair of ROWID mapping relationships needs to be generated for every row.

[0087] When inserting data from the left table into the right table, you can append the syntax RETURN ROWID INTO:RID to the INSERT statement to obtain the ROWID value of the inserted row in the target database. Combining the row number in the left table and the ROWID in the left table, the following mapping relationship can be generated, where B* represents the ROWID of the target database.

[0088] Left table row number Left table ROWID Right table ROWID 1 A1 B1

[0089] During the data migration process, it is necessary to count the number of rows extracted from the left table (L) and the number of rows successfully imported from the right table (R). Suppose that when migrating the 4th row of data from the left table, the target database fails and restarts, causing the migration of that row to fail (failed rows do not have mappings; only successful rows that are exactly X rows away have mappings). After the data migration is completed, the number of rows extracted from the left table (L) is 4, and the number of rows successfully imported from the right table (R) is 3, which is one row short. The final ROWID relationship mapping table is as follows.

[0090]

[0091]

[0092] Assume the current memory M is 6 bytes, each record requires 1 byte of memory for comparison, and the number of parallel comparison groups G is set to 2.

[0093] The number of groups can be obtained using the formula N = ((L+R)*Z) / ((M-(LR)*Z) / G):

[0094] N=((4+3)*1) / ((6-(4-3)*1) / 2)=3.

[0095] Based on the calculated number of comparison groups N(3) and the number of rows in the left table L(4), the corresponding ROWID mapping relationships are located in the ROWID relationship mapping table.

[0096] The query conditions for each group are constructed by mapping adjacent groups; if there are no adjacent groups, no corresponding closed interval is constructed.

[0097] Grouping Conditions on the left table Conditions in the right table 1 ROWID<=A1 ROWID<=B1 2 A1 <ROWID<=A2 B1 <ROWID<=B2 3 A2 <ROWID B2 <ROWID

[0098] Simultaneously running queries for the first and second groups to compare data, 6 bytes of memory can meet the memory requirements of the comparison algorithm, since the amount of data to be compared between the two groups requires 4 bytes of memory.

[0099] When the first group has been compared and the second group is still in progress, the data that matches in the first group will be eliminated by collision, freeing up 2 bytes of memory for that group. In addition to the 2 bytes needed by the currently running second group, there are 4 bytes available for the next group. At this point, the memory is sufficient to start the comparison of the third group.

[0100] After all group comparisons are completed, the record with ROWID A4 from the left table will remain in memory. This record will be generated into the comparison report to complete the comparison.

[0101] In summary, this embodiment provides a method for comparing database data in groups. It compares data after massive amounts of data have been migrated between databases to verify the correctness of the data migration or identify data discrepancies caused by failures during the migration process. Using this approach, each group comparison can eliminate identical data through collision elimination, freeing up memory for the next group comparison. This prevents the situation in traditional methods where data from the previous group remains in memory after comparison, requiring the next group to eliminate it through collision elimination. It also avoids the drawback of memory exhaustion and performance degradation caused by extreme cases.

[0102] In traditional solutions, data migration and data comparison are two independent processes. This leads to a lack of grouping decision-making conditions when using grouped comparison optimization techniques to improve comparison performance. The left and right tables are independently divided into N groups based on their estimated data volume, and there is no one-to-one mapping relationship between the data in the N groups of the two tables. For example, some data in the first group of the left table may not appear in the first group of the right table. After the data extraction and comparison of the first groups of both sides are completed, a large amount of data from the first group of the left table remains in memory. This data needs to be compared and eliminated by querying the second or third group of the right table. This phenomenon results in a large amount of inconsistent data being generated during the comparison process, occupying limited memory resources. Subsequent comparison processes may run out of memory, requiring data pages to be swapped with disk to free up memory for continued comparison, thus reducing the performance of data comparison.

[0103] This invention addresses the aforementioned problems by integrating data migration and data comparison processes. During data migration, ROWID is appended as a query term in the source database. Furthermore, every X rows, the target database's ROWID for that row is retrieved and the mapping is saved. After data migration is complete, K source and target ROWID mappings are obtained. Data comparison no longer requires querying the data size from the source and target databases; comparisons are performed directly based on the mappings saved during migration, grouping data as needed. This mapping-based grouping ensures maximum collision elimination within each group of the left and right tables, reducing memory resource requirements during comparison and minimizing or avoiding data page and disk swapping due to insufficient memory, thus improving comparison performance.

[0104] Example 3:

[0105] Based on the methods for grouping and comparing database data provided in Embodiments 1 and 2 above, the present invention also provides a device for implementing the above methods for grouping and comparing database data, such as... Figure 4 The diagram shown is a schematic representation of the device architecture according to an embodiment of the present invention. The device for grouping and comparing database data in this embodiment includes one or more processors 21 and a memory 22. Figure 4 Take a processor 21 as an example.

[0106] Processor 21 and memory 22 can be connected via a bus or other means. Figure 4 Taking the example of a connection between China and Israel via a bus.

[0107] The memory 22, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the method and system for grouped comparison database data in Embodiments 1 and 2. The processor 21 executes various functional applications and data processing of the device for grouped comparison database data by running the non-volatile software programs, instructions, and modules stored in the memory 22, thereby implementing the method for grouped comparison database data in Embodiments 1 and 2.

[0108] Memory 22 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, memory 22 may optionally include memory remotely located relative to processor 21, which can be connected to processor 21 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

[0109] The program instructions / modules are stored in memory 22. When executed by one or more processors 21, they perform the grouping and comparison database data method described in Embodiments 1 and 2 above, for example, performing the above-described method. Figures 1 to 3 The steps shown.

[0110] The above-described product can perform the methods provided in the embodiments of this application, and has the corresponding functional modules and beneficial effects for performing the methods. Technical details not described in detail in this embodiment can be found in the methods provided in the embodiments of this application.

[0111] It should be noted that the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.

[0112] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented using software and a general-purpose hardware platform, or of course, using hardware. Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM), etc.

[0113] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; under the concept of the present invention, the technical features of the above embodiments or different embodiments can also be combined, the steps can be implemented in any order, and there are many other variations of different aspects of the present invention as described above. For the sake of brevity, they are not provided in detail; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for comparing data in a grouped database, characterized in that, include: Construct a mapping table between the ROWID values ​​of the left and right tables; The number of groups is calculated using the number of rows in the left and right tables and the available memory in the current system; including: number of groups N = ((L+R)×Z) / ((M-(LR)×Z) / G); where the number of groups N is rounded up, L is the number of rows in the left table, R is the number of rows in the right table, M is the memory size used for comparison, G is the preset number of groups that can be extracted in parallel at the same time, and Z is the memory size occupied by a single row of data in the comparison algorithm; The ROWID mapping relationship for each group is obtained from the ROWID mapping relationship table by the number of rows and the number of groups in the left table, and the data query conditions for each group are constructed based on the ROWID mapping relationship. Based on the data query conditions of the groups, a preset number of grouped data are extracted in parallel for comparison; this includes: based on the left table conditions and right table conditions of the data query conditions of the groups, G group connections are created simultaneously, and data from the left and right tables of the G groups are extracted in the order of the groups for comparison. Collisions of identical data are eliminated, while different data is retained in memory; where G is the preset number of groups that can be extracted in parallel at the same time. Once the data comparison for a certain group is completed, the data for the next group is extracted sequentially for comparison, until all groups are compared. A comparison report is then generated based on the data remaining in memory.

2. The method for grouping and comparing database data according to claim 1, characterized in that, The construction of the ROWID mapping relationship table between the left and right tables specifically includes: The source database data is retrieved by using ROWID as an additional query item, and the source database data carrying ROWID is sent to the target database for data migration; the table in the source database is the left table, and the table in the target database is the right table. For every X rows of data, a ROWID mapping relationship is constructed using the left table row number currently extracted from the left table, the left table ROWID corresponding to that row number, and the right table ROWID returned by the target database for that row of data. After the data migration is completed, K ROWID mapping relationships between left and right tables are formed, that is, the ROWID mapping relationship table between left and right tables is constructed.

3. The method for grouping and comparing database data according to claim 2, characterized in that, When performing data migration, the number of rows migrated out of the left table and the number of rows successfully entered into the right table are counted. The number of rows migrated out of the left table is the same as the number of rows in the left table, and the number of rows successfully entered into the right table is the same as the number of rows in the right table.

4. The method for grouping and comparing database data according to claim 1, characterized in that, The comparison is terminated when the value of (LR)×Z is close to, equal to or exceeds the value of M.

5. The method for grouping and comparing database data according to claim 1, characterized in that, The process of obtaining the ROWID mapping relationship for each group from the ROWID mapping relationship table by using the number of rows and the number of groups in the left table, and constructing the data query conditions for each group based on the ROWID mapping relationship, specifically includes: The number of rows L in the left table is divided into N groups based on the number of groups N; Get the starting row number of each group; By finding the closest left table row number in the ROWID mapping table using the starting row number, the ROWID mapping relationship corresponding to each group can be obtained. Each group combines its own ROWID mapping relationship with the ROWID mapping relationships of the adjacent groups above and below to construct the left table conditions and right table conditions for the data query conditions of each group.

6. The method for grouping and comparing database data according to claim 4, characterized in that, Each group, combining its own ROWID mapping relationship and the ROWID mapping relationships of adjacent groups above and below, constructs the left and right table conditions for the data query conditions of each group, specifically including: When a group has only downward adjacent groups, the left table condition for the data query of that group is: ROWID is less than or equal to the left table ROWID in the ROWID mapping relationship of its own group; the right table condition for the data query of that group is: ROWID is less than or equal to the right table ROWID in the ROWID mapping relationship of its own group. When a group has an upward adjacent group and a downward adjacent group, the left table condition for the data query of this group is: ROWID is greater than the left table ROWID in the ROWID mapping relationship of the upward adjacent group, and ROWID is less than or equal to the left table ROWID in the ROWID mapping relationship of its own group; the right table condition for the data query of this group is: ROWID is greater than the right table ROWID in the ROWID mapping relationship of the upward adjacent group, and ROWID is less than or equal to the right table ROWID in the ROWID mapping relationship of its own group. When a group has only an upwardly adjacent group, the left table condition for the data query of that group is: ROWID is greater than the left table ROWID in the ROWID mapping relationship of the upwardly adjacent group; the right table condition for the data query of that group is: ROWID is greater than the right table ROWID in the ROWID mapping relationship of the upwardly adjacent group.

7. The method for grouping and comparing database data according to claim 6, characterized in that, For the created G-group connections, each group consists of two connections, one to the source database and one to the target database, for data extraction. When both connections have completed data extraction, the group is marked as comparison complete.

8. A device for grouping and comparing database data, characterized in that: The method includes at least one processor and a memory, which are connected via a data bus. The memory stores instructions that can be executed by the at least one processor, which, when executed by the processor, are used to perform a method for comparing grouped database data as described in any one of claims 1-7.