A method, apparatus and storage medium for realizing data comparison
By using a two-level comparison method with dynamically configured comparison rules, the problem of index failure caused by the resource bottleneck of a monolithic database is solved, achieving flexible data comparison and accurate reconciliation results, which is applicable to scenarios such as real estate transaction systems.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- 北京理房通支付科技有限公司
- Filing Date
- 2022-03-23
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies, when performing data comparisons, are limited by the resource bottleneck of a single database, leading to index failure, impacting performance, and hindering the expansion of business rules. In particular, when the data structure changes, reconciliation rules need to be urgently repaired.
The method of dynamically configuring comparison rules is adopted. By constructing a primary comparison set and a secondary comparison set, two-level comparison is performed using the first key field and the second key field. The source data is automatically loaded and the fields participating in the comparison are extracted, thereby realizing the dynamic configuration of data comparison rules.
It enables flexible reconciliation when data structures change, improves the efficiency and accuracy of data comparison, reduces the need for emergency repairs of reconciliation code, and supports distributed storage and processing.
Figure CN114661716B_ABST
Description
Technical Field
[0001] This application relates to the field of data statistics, and in particular to a method, apparatus, computer-readable storage medium, and electronic device for data comparison.
Background Technology
[0002] In many scenarios, such as financial reconciliation, data in different tables must be compared. Each record in a table may contain different fields, so the fields must be compared one by one. Currently, most data comparisons use the INTERSECT and MINUS operators of an Oracle database to compute the intersection and difference of different tables. However, this approach is limited by the resource constraints of a monolithic database. Furthermore, applying functions to fields of different tables during the comparison can prevent indexes from being used (index failure), which degrades performance and hinders the expansion of business rules.
[0003] The real estate transaction system also involves data comparison between different subsystems, such as the transaction subsystem and the channel subsystem. Currently, data from the transaction and channel subsystems is sent to the reconciliation platform via a message queue, and the reconciliation platform implements the data comparison logic one by one according to product requirements. However, due to changes in requirements, the channel data structure may be adjusted, and the fields to be compared may also change. This could cause the reconciliation rules set for the previous data structure to become invalid, necessitating an urgent fix to the reconciliation code.
Summary of the Invention
[0004] In view of the above-mentioned prior art, this application discloses a method for implementing data comparison, which can dynamically configure comparison rules and conveniently implement data comparison.
[0005] A method for implementing data comparison includes:
[0006] Each of the two sets of source data to be compared is written into the first data structure of the database as a source data form; wherein, each set of source data includes several data entries, and each data entry includes field values corresponding to several fields;
[0007] For each set of source data forms, determine and construct the first data using the first key field, which serves as the first comparison data for the corresponding set of source data; for each first comparison data, extract all the values of the first key field to form a set, which serves as the first comparison set for the corresponding set of source data;
[0008] For the two sets of source data, the two first comparison sets are compared to obtain the balanced set and the questionable set;
[0009] For each set of source data forms, identify and construct second data using the second key field, which serves as the secondary comparison set for the corresponding set of source data;
[0010] Extract the field values of the second key field from the questionable set to form the secondary comparison key field values of the questionable set;
[0011] The key field values of the secondary comparison are compared with the secondary comparison set, and normal data forms and abnormal data forms are constructed based on the comparison results.
[0012] Optionally, the first key field consists of all pre-defined fields that participate in the first comparison, and the field values of all the first key fields in a single data entry of each source data form are concatenated together as the first key field value of the first comparison data.
[0013] And / or,
[0014] The second key field is all the pre-defined fields that participate in the secondary comparison. All the second key fields in a single data entry in each set of source data forms are concatenated together to form the second key field value of the secondary comparison set.
[0015] Optionally, the balanced set includes the first key field values that are identical in the first comparison sets of the two sets of source data; the questionable set includes a multi-account set and a missing-account set; the multi-account set includes the first key field values that are included in the first comparison set of the first set of source data but not in the first comparison set of the second set of source data, and the missing-account set includes the first key field values that are not included in the first comparison set of the first set of source data but are included in the first comparison set of the second set of source data.
[0016] Optionally, comparing the secondary comparison key field values with the secondary comparison set, and constructing normal data forms and abnormal data forms based on the comparison results, includes:
[0017] The key field values of the secondary comparison of the multi-account set and the key field values of the secondary comparison of the missing account set are compared with the secondary comparison sets of the two sets of source data respectively to determine the mismatched key field values included in the multi-account set and the missing account set, and the mismatched key field values are deleted from the multi-account set and the missing account set to form a mismatch set.
[0018] The system iterates through the sets of multiple accounts, missing accounts, balanced accounts, and mismatched accounts, finds the values corresponding to each key field value in a single comparison of the two sets of source data, and constructs corresponding multiple account forms, missing account forms, balanced account forms, and mismatched forms as data comparison results. The key field values and their corresponding values are stored in pairs. The normal data form includes the balanced account form, and the abnormal data form includes the multiple account form, the missing account form, and the mismatched form.
[0019] Optionally, determining the mismatched key field values included in the multiple-account set and the missing-account set includes:
[0020] In the secondary comparison set of the second set of source data, find the key field value that overlaps with a secondary comparison key field value of the multi-account set, and take the key field value in the multi-account set corresponding to the found key field value as a mismatched key field value;
[0021] In the second comparison set of the first set of source data, find the key field value that overlaps with the key field value of the second comparison of the missing accounts set, and use the key field value in the missing accounts set corresponding to the found key field value as the mismatched key field value.
[0022] Optionally, the second key field is included in the first key field but is not exactly the same as the first key field;
[0023] And / or,
[0024] The first data and the second data are constructed from two different data structures in the database.
[0025] An apparatus for data comparison includes a data loading unit, a primary comparison set construction unit, a primary comparison unit, a secondary comparison set construction unit, and a secondary comparison unit.
[0026] The data loading unit is used to write each of the two sets of source data to be compared into the first data structure of the database as a source data form; wherein, each set of source data includes several data entries, and each data entry includes field values corresponding to several fields;
[0027] The primary comparison set construction unit is used to determine and construct first data using the first key field for each set of source data forms, as the first comparison data of the corresponding set of source data; and, for each first comparison data, to extract all the values of the first key field to form a set, as the first comparison set of the corresponding set of source data;
[0028] The primary comparison unit is used to compare the two first comparison sets of the two sets of source data to obtain a balanced set and a questionable set;
[0029] The secondary comparison set construction unit is used to determine and construct second data using the second key field, as the secondary comparison set of the corresponding group source data;
[0030] The secondary comparison unit is used to extract the field values of the second key field from the questionable set to form the secondary comparison key field values of the questionable set; compare the secondary comparison key field values with the secondary comparison set, and construct normal data forms and abnormal data forms based on the comparison results.
[0031] Optionally, the first key field consists of all pre-defined fields that participate in the first comparison. In the primary comparison set construction unit, the field values of all the first key fields in a single data entry of each source data form are concatenated together as the first key field value of the first comparison data.
[0032] And / or,
[0033] The second key field is all the fields that participate in the secondary comparison in a pre-defined manner. In the secondary comparison set construction unit, all the second key fields in a single data entry in each group of source data forms are concatenated together as the second key field value of the secondary comparison set.
[0034] Optionally, the balanced set includes the first key field values that are identical in the first comparison sets of the two sets of source data; the questionable set includes a multi-account set and a missing-account set; the multi-account set includes the first key field values that are included in the first comparison set of the first set of source data but not in the first comparison set of the second set of source data, and the missing-account set includes the first key field values that are not included in the first comparison set of the first set of source data but are included in the first comparison set of the second set of source data.
[0035] Optionally, in the secondary comparison unit, comparing the secondary comparison key field values with the secondary comparison set, and constructing normal data forms and abnormal data forms based on the comparison results, includes:
[0036] The key field values of the secondary comparison of the multi-account set and the key field values of the secondary comparison of the missing account set are compared with the secondary comparison sets of the two sets of source data respectively to determine the mismatched key field values included in the multi-account set and the missing account set, and the mismatched key field values are deleted from the multi-account set and the missing account set to form a mismatch set.
[0037] The system iterates through the sets of multiple accounts, missing accounts, balanced accounts, and mismatched accounts, finds the values corresponding to each key field value in a single comparison of the two sets of source data, and constructs corresponding multiple account forms, missing account forms, balanced account forms, and mismatched forms as data comparison results. The key field values and their corresponding values are stored in pairs. The normal data form includes the balanced account form, and the abnormal data form includes the multiple account form, the missing account form, and the mismatched form.
[0038] Optionally, in the secondary comparison unit, the determination of the mismatched key field values included in the sets of multiple accounts and the set of missing accounts includes:
[0039] In the secondary comparison set of the second set of source data, find the key field value that overlaps with a secondary comparison key field value of the multi-account set, and take the key field value in the multi-account set corresponding to the found key field value as a mismatched key field value;
[0040] In the second comparison set of the first set of source data, find the key field value that overlaps with the key field value of the second comparison of the missing accounts set, and use the key field value in the missing accounts set corresponding to the found key field value as the mismatched key field value.
[0041] Optionally, the second key field is included in the first key field but is not exactly the same as the first key field;
[0042] And / or,
[0043] The first data and the second data are constructed from two different data structures in the database.
[0044] Optionally, the apparatus can be implemented using a distributed Redis cluster.
[0045] A computer program product includes a computer program / instructions that, when executed by a processor, implement the method for data comparison as described above.
[0046] A computer-readable storage medium having computer instructions stored thereon, which, when executed by a processor, implement the data comparison method described above.
[0047] An electronic device includes at least a computer-readable storage medium as described above, and also includes a processor;
[0048] The processor is configured to read the executable instructions from the computer-readable storage medium and execute the instructions to implement the above-described method for data comparison.
[0049] In the above technical solution, the first data structure of the database is used to load two sets of source data forms for comparison. Then, the field values and field names of the first key field are extracted to construct primary comparison data and a primary comparison set. The primary comparison sets corresponding to the two sets of source data are compared to obtain a balanced set and a questionable set. Next, a second key field is extracted from the source data forms to construct a secondary comparison set. The field values of the second key field are extracted from the questionable set to form the secondary comparison key field values for the questionable set. The secondary comparison key field values are compared with the secondary comparison set, and normal data forms and abnormal data forms are constructed based on the comparison results. Through this process, source data can be automatically loaded, and the fields participating in the comparison can be automatically extracted according to the comparison rules. Data comparison is completed in two levels, thereby achieving dynamic configuration of the comparison rules and facilitating data comparison.
Attached Figure Description
[0050] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0051] Figure 1 is a schematic diagram of the basic process of the data comparison method in this application;
[0052] Figure 2 is a flowchart illustrating the specific implementation of the data comparison method in the embodiments of this application;
[0053] Figure 3 is a schematic diagram illustrating the classification of data results obtained after applying the data comparison method of this application;
[0054] Figure 4 is a schematic diagram illustrating the user interface and functionalities of the data comparison method implemented in this application;
[0055] Figure 5 is a schematic diagram of the basic structure of the data comparison apparatus in this application;
[0056] Figure 6 is a schematic diagram of the basic structure of the electronic device provided in this application.
Detailed Implementation
[0057] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0058] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that such data can be interchanged where appropriate so that embodiments of the invention described herein can be implemented, for example, in orders other than those illustrated or described herein. Furthermore, the terms “comprising” and “having,” and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0059] The technical solution of the present invention will be described in detail below with reference to specific embodiments. The following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments.
[0060] Figure 1 is a schematic diagram illustrating the basic process of the data comparison method in this application. As shown in Figure 1, the method includes:
[0061] Step 101: Write each of the two sets of source data to be compared into the first data structure of the database as a source data form.
[0062] Each set of source data includes several data entries, and each data entry includes field values corresponding to several fields. The database can be any existing database, such as Redis, or other databases containing List, Map, and Set data structures. The first data structure can be a form-based data structure like a List.
[0063] Step 102: For each set of source data forms, determine and construct the first data using the first key field, as the first comparison data of the corresponding set of source data; for each first comparison data, extract all the values of the first key field to form a set, as the first comparison set of the corresponding set of source data.
[0064] In the data form, each field has a field name and a field value. For example, a field named "date" might have the value "2021-06-01". The first key field refers to all pre-defined fields participating in the primary comparison. The first data can be built as a specified data structure, such as a Map, whose entries consist of keys and values.
[0065] For the primary comparison data, all the values of the first key field can be extracted as follows: concatenate the field values of all the first key fields in a single data entry of each source data form to obtain the first key field value of the primary comparison data.
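The construction described in step 102 can be sketched in plain Python, with a dict standing in for the Map and a set for the first comparison set. This is a minimal illustration, not the patented implementation: the function name `build_primary_map`, the field names `trade_no`, `amount`, `date`, `note`, and the "-" separator are all assumptions made for the example.

```python
from typing import Dict, List


def build_primary_map(rows: List[dict], key_fields: List[str], sep: str = "-") -> Dict[str, dict]:
    """Return {concatenated first-key-field value: compared fields of that entry}."""
    primary: Dict[str, dict] = {}
    for row in rows:
        # Concatenate the values of all first key fields, in rule order.
        key = sep.join(str(row[name]) for name in key_fields)
        # Keep the names and values of the fields that take part in the comparison.
        primary[key] = {name: row[name] for name in key_fields}
    return primary


# Hypothetical entries and field names, for illustration only.
rows = [
    {"trade_no": "T1", "amount": 100, "date": "2021-06-01", "note": "x"},
    {"trade_no": "T2", "amount": 250, "date": "2021-06-01", "note": "y"},
]
primary_map = build_primary_map(rows, ["trade_no", "amount", "date"])
primary_set = set(primary_map)  # the first comparison set is the set of keys
print(sorted(primary_set))
```

In this sketch, two entries that produce the same concatenated key overwrite each other, and a separator that can also occur inside a field value may make keys ambiguous; a real rule set would need to account for both.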
[0066] Step 103: For the two sets of source data, compare the two primary comparison sets to obtain the balanced set and the questionable set.
[0067] The "balanced set" includes the first key field values that are identical in the primary comparison sets of the two sets of source data. The "questionable set" refers to the data that may be inconsistent after the primary comparison, and specifically includes the "multi-account set" and the "missing-account set." The "multi-account set" includes the first key field values that are included in the primary comparison set of the first set of source data but not in the primary comparison set of the second set of source data. The "missing-account set" includes the first key field values that are not included in the primary comparison set of the first set of source data but are included in the primary comparison set of the second set of source data.
[0068] This application uses the terms "balanced," "multi-account" (excess), and "missing-account" (insufficient) to describe the situations in which compared data is consistent or inconsistent. These terms are not limited to reconciliation applications; they can be used in any scenario where data comparison is conducted.
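The three sets defined in step 103 correspond to ordinary set intersection and difference. The following sketch uses Python sets as a stand-in for the database Set structure; the names `PrimaryResult` and `primary_compare` and the placeholder keys `k1` to `k4` are hypothetical.

```python
from typing import NamedTuple, Set


class PrimaryResult(NamedTuple):
    balanced: Set[str]  # keys present in both primary comparison sets
    multi: Set[str]     # keys only in the first source (multi-account set)
    missing: Set[str]   # keys only in the second source (missing-account set)


def primary_compare(set1: Set[str], set2: Set[str]) -> PrimaryResult:
    """Split the keys of two primary comparison sets into the three result sets."""
    return PrimaryResult(
        balanced=set1 & set2,
        multi=set1 - set2,
        missing=set2 - set1,
    )


# Placeholder keys, for illustration only.
result = primary_compare({"k1", "k2", "k3"}, {"k1", "k4"})
print(result)
```

The questionable set is then the union of `result.multi` and `result.missing`.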
[0069] Step 104: For each set of source data forms, determine and construct second data using the second key field, as the secondary comparison set of the corresponding set of source data.
[0070] The second key field can be any pre-defined field participating in the secondary comparison. It can be included in the first key field but is not identical to it. The second data can be a specified data structure, such as a Set data structure. The first and second data are constructed from two different data structures in the database. All the second key fields from a single data entry in each source data form are concatenated together to form the second key field value of the secondary comparison set.
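As a rough illustration of step 104, the secondary comparison set can be built by concatenating only the second key fields of each entry. The sketch below assumes a Python set in place of the database Set; `build_secondary_set` and the field names are hypothetical.

```python
from typing import List, Set


def build_secondary_set(rows: List[dict], second_key_fields: List[str], sep: str = "-") -> Set[str]:
    """Concatenate the second key fields of every entry and collect the results in a set."""
    return {sep.join(str(row[name]) for name in second_key_fields) for row in rows}


# Hypothetical entries: "trade_no" alone identifies a record here,
# so it is used as the only second key field.
rows = [
    {"trade_no": "T1", "amount": 100, "date": "2021-06-01"},
    {"trade_no": "T2", "amount": 250, "date": "2021-06-01"},
]
secondary_set = build_secondary_set(rows, ["trade_no"])
print(sorted(secondary_set))
```

Because the second key fields are a proper subset of the first key fields in this example, the secondary key is coarser than the primary key: it says which record an entry is, not whether all its compared fields agree.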
[0071] Step 105: Extract the field values of the second key field from the questionable set to form the secondary comparison key field values of the questionable set.
[0072] Step 106: Compare the key field values of the secondary comparison with the secondary comparison set, and construct normal data forms and abnormal data forms based on the comparison results.
[0073] The process of constructing normal and abnormal data forms may include:
[0074] The key field values of the secondary comparison of the multi-account set and the key field values of the secondary comparison of the missing account set are compared with the secondary comparison sets of the two sets of source data respectively to determine the mismatched key field values included in the multi-account set and the missing account set. The mismatched key field values are then deleted from the multi-account set and the missing account set to form a mismatch set.
[0075] Iterate through the multi-account set, the missing-account set, the balanced set, and the mismatch set. For each key field value, find the corresponding value in the primary comparison data of the two sets of source data, and construct the corresponding multi-account form, missing-account form, balanced form, and mismatch form as the data comparison results. Key field values are stored in pairs with their corresponding values. The normal data forms include the balanced form, while the abnormal data forms include the multi-account form, the missing-account form, and the mismatch form.
[0076] Here, the primary comparison data (i.e., the first data) consists of two parts: the key field value and the corresponding value. For example, the first data can be Map data with a key and a value, where the key is the key field value and the value is the corresponding value.
[0077] At this point, the process shown in Figure 1 is complete. The following detailed embodiment illustrates the processing of this application. Figure 2 is a schematic diagram of the specific process of the data comparison method in this embodiment, in which the data structures of the Redis database (List, Map, Set, etc.) are used as an example.
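Building the result forms amounts to looking up each key of a result set in the primary comparison data and storing the key together with its value. The sketch below assumes the four result sets are already known and uses dicts for the Maps; `build_form`, the shortened keys, and the choice to read balanced entries from the first source are illustrative assumptions, not details given in the text.

```python
from typing import Dict, Iterable


def build_form(keys: Iterable[str], map1: Dict[str, dict], map2: Dict[str, dict]) -> Dict[str, dict]:
    """Pair each key with its value, taken from map1 if present, otherwise from map2."""
    form: Dict[str, dict] = {}
    for key in sorted(keys):
        form[key] = map1[key] if key in map1 else map2[key]
    return form


# Hypothetical primary comparison data for two sources (keys shortened for readability).
map1 = {"a1-100": {"id": "a1", "amount": 100},
        "a2-100": {"id": "a2", "amount": 100},
        "a3-102": {"id": "a3", "amount": 102}}
map2 = {"a1-100": {"id": "a1", "amount": 100},
        "a2-101": {"id": "a2", "amount": 101},
        "a4-102": {"id": "a4", "amount": 102}}

result_sets = {
    "balanced": {"a1-100"},
    "multi": {"a3-102"},
    "missing": {"a4-102"},
    "mismatch": {"a2-100", "a2-101"},
}
forms = {name: build_form(keys, map1, map2) for name, keys in result_sets.items()}
normal_forms = {"balanced": forms["balanced"]}
abnormal_forms = {name: forms[name] for name in ("multi", "missing", "mismatch")}
print(abnormal_forms["mismatch"])
```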
[0078] Step 201: Obtain the two sets of source data to be compared and write them into the Redis data structure List as the source data List.
[0079] In this embodiment, a Redis database is used. The data source form is represented by a List data structure.
[0080] First, two data sources are loaded, such as a database table, a CSV file, or a batch data query interface. Then, two sets of source data to be compared are obtained. Each set of source data includes several data entries, and each data entry includes field values corresponding to several fields. In this embodiment, to distinguish between the two sets of source data, they are referred to as the first set of source data and the second set of source data, respectively.
[0081] Each set of source data is written into a Redis List as the source data List. To distinguish between the two source data Lists, the List into which the first set of source data is written is called the first source data List, and the List into which the second set of source data is written is called the second source data List.
[0082] Below is a simple example. Assume the first source data List is Source1: [{key1=a1,amount=100,time='2021-06-01'},{key1=a2,amount=100,time='2021-06-01'},{key1=a3,amount=102,time='2021-06-01'}], and the second source data List is Source2: [{key2=a1,amount=100,date='2021-06-01'},{key2=a2,amount=101,date='2021-06-01'},{key2=a4,amount=102,date='2021-06-01'}]. Here, "key1", "key2", "amount", "time", and "date" are field names, and "a1", "100", "2021-06-01", etc., are field values.
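As an illustration of step 201 for the CSV case, the sketch below reads Source1 from CSV text into a Python list of dicts, which stands in for the source data List. The helper `load_csv_rows` and the JSON serialization are assumptions for the example; the text does not say how entries are encoded when written to the Redis List.

```python
import csv
import io
import json
from typing import Dict, List


def load_csv_rows(text: str) -> List[Dict[str, str]]:
    """Parse CSV text with a header row into one dict per data entry."""
    return list(csv.DictReader(io.StringIO(text)))


# Source1 from the example above, written as CSV (all values are read as strings).
csv_text = (
    "key1,amount,time\n"
    "a1,100,2021-06-01\n"
    "a2,100,2021-06-01\n"
    "a3,102,2021-06-01\n"
)
source1 = load_csv_rows(csv_text)

# One possible encoding for pushing each entry onto a List in a key-value store.
serialized = [json.dumps(row, sort_keys=True) for row in source1]
print(serialized[0])
```

A database table or a batch query interface would plug in the same way, as long as it yields one field-name-to-value mapping per entry.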
[0083] Step 202: For each set of source data, according to the pre-defined fields participating in the comparison, construct Map data from the field names and field values of all participating fields of each data entry in the corresponding source data List, using the Map data structure in Redis, as the primary comparison Map for the corresponding set of source data.
[0084] This application uses a two-level comparison to compare the two sets of source data. This step prepares data for the primary (first-level) comparison.
[0085] Specifically, through the aforementioned step 201, the data to be compared has been constructed into a Redis List structure. Next, for each group of source data in the List structure, it is parsed according to a comparison rule, and the first data in Redis is constructed using the field names and values of all fields involved in the comparison. In this embodiment, the first data adopts a Map data structure. The processing of the first source data List and the second source data List is the same, and the following explanation uses the first source data List as an example.
[0086] The primary comparison rule specifies the fields that are compared during the primary comparison. In fact, the fields participating in the primary comparison are all the fields pre-defined for comparison in the entire data comparison. Specifically, the field values of all the pre-defined fields participating in the comparison (e.g., system source, transaction number, transaction date, transaction amount) in a single data entry of the first source data List are extracted and concatenated together as the Key of the Map. The field names and field values of the corresponding fields form the value corresponding to that Key. For each data entry, the Key and value of the Map are extracted in this manner to form the primary comparison Map.
Taking the aforementioned first source data List as an example, Source1: [{key1=a1,amount=100,time='2021-06-01'},{key1=a2,amount=100,time='2021-06-01'},{key1=a3,amount=102,time='2021-06-01'}]. The corresponding primary comparison Map is Map1, with the structure <key1+amount+time : value>. Map1 takes the values {"a1-100-2021-06-01":{key1=a1,amount=100,time='2021-06-01'}, "a2-100-2021-06-01":{key1=a2,amount=100,time='2021-06-01'}, "a3-102-2021-06-01":{key1=a3,amount=102,time='2021-06-01'}}.
The second source data List is Source2: [{key2=a1,amount=100,date='2021-06-01'},{key2=a2,amount=101,date='2021-06-01'},{key2=a4,amount=102,date='2021-06-01'}]. The corresponding primary comparison Map is Map2, with the structure <key2+amount+date : value>. Map2 takes the values {"a1-100-2021-06-01":{key2=a1,amount=100,date='2021-06-01'}, "a2-101-2021-06-01":{key2=a2,amount=101,date='2021-06-01'}, "a4-102-2021-06-01":{key2=a4,amount=102,date='2021-06-01'}}.
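One way to read "dynamically configured comparison rules" is that the rule is plain data: a per-source list of field names, so a renamed or added field only changes the rule, not the comparison code. The sketch below applies such a rule to Source1 and Source2 from the example; the `rule` layout and the `build_map` helper are assumptions for illustration.

```python
from typing import Dict, List

# Hypothetical rule format: for each source, the ordered fields that form the Key.
rule: Dict[str, List[str]] = {
    "source1": ["key1", "amount", "time"],
    "source2": ["key2", "amount", "date"],
}


def build_map(rows: List[dict], fields: List[str], sep: str = "-") -> Dict[str, dict]:
    """Build {concatenated field values: {field name: field value}} for one source."""
    return {
        sep.join(str(row[f]) for f in fields): {f: row[f] for f in fields}
        for row in rows
    }


source1 = [
    {"key1": "a1", "amount": 100, "time": "2021-06-01"},
    {"key1": "a2", "amount": 100, "time": "2021-06-01"},
    {"key1": "a3", "amount": 102, "time": "2021-06-01"},
]
source2 = [
    {"key2": "a1", "amount": 100, "date": "2021-06-01"},
    {"key2": "a2", "amount": 101, "date": "2021-06-01"},
    {"key2": "a4", "amount": 102, "date": "2021-06-01"},
]

map1 = build_map(source1, rule["source1"])
map2 = build_map(source2, rule["source2"])
print(sorted(map1))
print(sorted(map2))
```

Note that the two sources use different field names (`key1`/`key2`, `time`/`date`) but produce keys in the same positional layout, which is what makes the Keys directly comparable.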
[0087] Step 203: For each comparison Map, extract all Key values to form a Set, which serves as the first comparison Set of the corresponding group of source data.
[0088] For each primary comparison Map formed in step 202, all Keys are extracted to form a comparison set. In this embodiment, the comparison set uses the Redis data structure Set and is called the primary comparison Set of the corresponding set of source data.
[0089] Continuing with the previous example, the primary comparison Map of the first set of source data is Map1: {"a1-100-2021-06-01":{key1=a1,amount=100,time='2021-06-01'}, "a2-100-2021-06-01":{key1=a2,amount=100,time='2021-06-01'}, "a3-102-2021-06-01":{key1=a3,amount=102,time='2021-06-01'}}. Therefore, the primary comparison Set of the first set of source data is Set1, with the structure <key1+amount+time>. Set1 takes the values {"a1-100-2021-06-01", "a2-100-2021-06-01", "a3-102-2021-06-01"}.
The primary comparison Map of the second set of source data is Map2: {"a1-100-2021-06-01":{key2=a1,amount=100,date='2021-06-01'}, "a2-101-2021-06-01":{key2=a2,amount=101,date='2021-06-01'}, "a4-102-2021-06-01":{key2=a4,amount=102,date='2021-06-01'}}. Therefore, the primary comparison Set of the second set of source data is Set2, with the structure <key2+amount+date>. Set2 takes the values {"a1-100-2021-06-01", "a2-101-2021-06-01", "a4-102-2021-06-01"}.
[0090] Step 204: Compare the primary comparison Sets of the two sets of source data to determine the multi-account set, the missing-account set, and the balanced set.
[0091] The balanced set includes the Keys that are identical in the primary comparison Sets of the two sets of source data. The multi-account set includes the Keys that are included in the primary comparison Set of the first set of source data but not in the primary comparison Set of the second set of source data. The missing-account set includes the Keys that are not included in the primary comparison Set of the first set of source data but are included in the primary comparison Set of the second set of source data.
[0092] Since step 203 yields Redis Sets, Redis operations can be used directly to compare the two Sets. In this comparison, the Keys of Map1 and Map2 are compared as a whole. Each Key includes the values of all fields involved in the comparison for a single data entry. Therefore, if any field value differs between two Keys, they are considered different, and the Key is added to the multi-account set or the missing-account set.
[0093] Optionally, the Redis commands `sinterstore` and `sdiffstore` can be used to obtain the multi-account set, the missing-account set, and the balanced set, taking the first set of source data as the reference. Here, the multi-account set contains the entries that the first set of source data has in addition to the second set of source data; the missing-account set contains the entries that the first set of source data lacks relative to the second set of source data. For example, based on the previous example, the primary comparison yields the balanced set {"a1-100-2021-06-01"}, the multi-account set {"a2-100-2021-06-01", "a3-102-2021-06-01"}, and the missing-account set {"a2-101-2021-06-01", "a4-102-2021-06-01"}.
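To show the shape of the store-based comparison without requiring a running server, the sketch below uses a small in-memory stand-in whose methods are named after the Redis commands SADD, SINTERSTORE, and SDIFFSTORE. The class `InMemorySetStore` and the key names `set1`, `set2`, `balanced`, `multi`, and `missing` are hypothetical; this is not a Redis client.

```python
from typing import Dict, List, Set


class InMemorySetStore:
    """Tiny stand-in for the Set commands used in step 204 (not a Redis client)."""

    def __init__(self) -> None:
        self.data: Dict[str, Set[str]] = {}

    def sadd(self, name: str, *values: str) -> int:
        """Add members to the set at `name`; return how many were new."""
        target = self.data.setdefault(name, set())
        before = len(target)
        target.update(values)
        return len(target) - before

    def sinterstore(self, dest: str, keys: List[str]) -> int:
        """Store the intersection of the sets at `keys` into `dest`; return its size."""
        sets = [self.data.get(k, set()) for k in keys]
        self.data[dest] = set.intersection(*sets) if sets else set()
        return len(self.data[dest])

    def sdiffstore(self, dest: str, keys: List[str]) -> int:
        """Store (first set minus all following sets) into `dest`; return its size."""
        first, *rest = [self.data.get(k, set()) for k in keys]
        self.data[dest] = first.difference(*rest)
        return len(self.data[dest])


store = InMemorySetStore()
store.sadd("set1", "a1-100-2021-06-01", "a2-100-2021-06-01", "a3-102-2021-06-01")
store.sadd("set2", "a1-100-2021-06-01", "a2-101-2021-06-01", "a4-102-2021-06-01")

store.sinterstore("balanced", ["set1", "set2"])  # Keys in both Sets
store.sdiffstore("multi", ["set1", "set2"])      # Keys only in the first Set
store.sdiffstore("missing", ["set2", "set1"])    # Keys only in the second Set
print({name: sorted(store.data[name]) for name in ("balanced", "multi", "missing")})
```

The order of the keys passed to `sdiffstore` matters: swapping them turns the multi-account computation into the missing-account computation.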
[0094] This step allows us to initially determine the differences between the first and second sets of source data. Next, a second comparison is used to obtain the precise differences between the two sets of data.
[0095] Specifically, the first comparison treats each key value of Map1 and Map2 as a whole, and each key value contains the values of all fields involved in the comparison for a single data entry. If even one field value differs between two key values, they are considered different and are added to the multi-account Set or the missing-account Set. However, an entry in one set of source data and an entry in the other may refer to the same data entry while some field values differ because one set of source data contains an error. Such a pair is actually a mismatch, not an extra or missing entry. To handle this, key fields are specified that identify a data entry: if the key field values are the same, the two entries represent the same data entry. The purpose of the second comparison is to find, in the multi-account Set and the missing-account Set obtained from the first comparison, the data entries whose key field values are identical, and to classify them as mismatched data entries.
[0096] Step 205: For each set of source data, corresponding to the pre-defined key fields, construct a Set using the field values of all key fields of each data in the corresponding data source list, according to the Set data structure in Redis, as a secondary comparison Set for the corresponding set of source data.
[0097] This step prepares data for the second comparison, so it only needs to be completed before the second comparison begins. It can be performed in parallel with, or in any order relative to, the data preparation and comparison operations of the first comparison (i.e., steps 202-204).
[0098] Specifically, through the aforementioned step 201, the data to be compared has been constructed into a Redis List structure. Next, for each group of source data in the List structure, it is parsed according to the secondary comparison rules, and the second data is constructed using the field values of key fields. In this embodiment, the second data adopts the Redis data structure Set. The processing of the first source data List and the second source data List is the same. The following explanation uses the first source data List as an example.
[0099] The secondary comparison rule specifies the key fields of a single data entry. Specifically, the field values of the pre-defined key fields (such as system source and transaction number) in a single data entry of the first source data List are extracted and concatenated to form a key value of the Set. The key field values of every data entry are extracted in the same way to form the secondary comparison Set. Taking the aforementioned first source data List as an example, Source1: [{key1=a1,amount=100,time='2021-06-01'},{key1=a2,amount=100,time='2021-06-01'},{key1=a3,amount=102,time='2021-06-01'}]; assuming the key field is key1, the corresponding secondary comparison Set is Set1', with the structure <key1> and the values {"a1", "a2", "a3"}. The second source data List is Source2: [{key2=a1,amount=100,date='2021-06-01'},{key2=a2,amount=101,date='2021-06-01'},{key2=a4,amount=102,date='2021-06-01'}]. Assuming the key field is key2, the corresponding secondary comparison Set is Set2', with the structure <key2> and the values {"a1", "a2", "a4"}.
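A minimal sketch of building the secondary comparison Set follows, again using a Python set in place of the Redis Set. The helper name `build_secondary_set` and the separator used when several key fields are concatenated are assumptions for illustration.

```python
SOURCE1 = [
    {"key1": "a1", "amount": 100, "time": "2021-06-01"},
    {"key1": "a2", "amount": 100, "time": "2021-06-01"},
    {"key1": "a3", "amount": 102, "time": "2021-06-01"},
]
SOURCE2 = [
    {"key2": "a1", "amount": 100, "date": "2021-06-01"},
    {"key2": "a2", "amount": 101, "date": "2021-06-01"},
    {"key2": "a4", "amount": 102, "date": "2021-06-01"},
]


def build_secondary_set(rows, key_fields, sep="-"):
    """Concatenate the key field values of each entry into one Set member.

    Hypothetical helper: with a single key field the member is just that
    field's value; with several, the values are joined with `sep`.
    """
    return {sep.join(str(row[f]) for f in key_fields) for row in rows}


secondary1 = build_secondary_set(SOURCE1, ["key1"])
secondary2 = build_secondary_set(SOURCE2, ["key2"])
print(sorted(secondary1))  # ['a1', 'a2', 'a3']
print(sorted(secondary2))  # ['a1', 'a2', 'a4']
```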
[0100] Step 206: Extract key field values from each key value in the multi-account Set and the few-account Set respectively to form the secondary comparison key for the multi-account Set and the few-account Set.
[0101] The comparison in step 204 above yields the suspicious sets, namely the set with multiple accounts and the set with fewer accounts. These two sets represent the general differences between the two sets of source data and require a second comparison for finer distinction. Therefore, this step requires further processing of the set with multiple accounts and the set with fewer accounts for the second comparison.
[0102] Specifically, the multi-account set and the missing-account set include the field values of each comparison field of the corresponding data that distinguishes the first set of source data from the second set of source data, including the field values of key fields used for secondary comparison. This step involves extracting the field values of key fields from the multi-account set to form the secondary comparison key for the multi-account set, and extracting the field values of key fields from the missing-account set to form the secondary comparison key for the missing-account set. For example, the first comparison in step 204 above yields a multi-account Set: {"a2-100-2021-06-01", "a3-102-2021-06-01"}. The field values of the key fields extracted from this set constitute the second comparison key (a2, a3) for the multi-account Set. The first comparison in step 204 above yields a few-account Set: {"a4-102-2021-06-01", "a2-101-2021-06-01"}. The field values of the key fields extracted from this set constitute the second comparison key (a4, a2) for the few-account Set.
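The extraction of secondary comparison keys can be sketched as follows. The sketch assumes that the key fields occupy the leading positions of the concatenated key value and that the "-" separator does not occur inside a key field value; neither assumption is stated in this application, and `secondary_keys` is a hypothetical helper name.

```python
def secondary_keys(composite_keys, n_key_fields=1, sep="-"):
    """Map each secondary comparison key to the composite key it came from.

    Assumes the first `n_key_fields` components of the composite key are the
    key field values (an assumption made for this illustration).
    """
    result = {}
    for composite in composite_keys:
        parts = composite.split(sep)
        result[sep.join(parts[:n_key_fields])] = composite
    return result


multi_account = {"a2-100-2021-06-01", "a3-102-2021-06-01"}
missing_account = {"a4-102-2021-06-01", "a2-101-2021-06-01"}

multi_keys = secondary_keys(multi_account)
missing_keys = secondary_keys(missing_account)
print(sorted(multi_keys))    # ['a2', 'a3']
print(sorted(missing_keys))  # ['a2', 'a4']
```

Returning a mapping rather than a bare list keeps the link from each key field value back to its full key value, which the second comparison needs when it moves an entry into the mismatch set. If the key field values could contain the separator, looking the key fields up in the first comparison Map would be a safer alternative to parsing the string.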
[0103] Step 207: Compare the secondary comparison key of the multi-account set and the secondary comparison key of the short-account set with the secondary comparison sets of the two sets of source data, determine the mismatched key values included in the multi-account set and the short-account set, delete the mismatched key values from the multi-account set and the short-account set, and form a mismatched set from the mismatched key values.
[0104] This step involves a second comparison. As mentioned earlier, the second comparison is used to find individual data entries that do not match in the multi-account set and the missing-account set, which means searching based on the values of key fields.
[0105] Specifically, in the secondary comparison set of the second set of source data, the key values that overlap with the secondary comparison key of the multi-account set are found. The key values found corresponding to the key values in the multi-account set are designated as mismatched key values. In other words, for those data entries that were deemed not to exist in the second set of source data in the first comparison, the field values of the key fields are searched in the second set of source data. If data entries with the same field values of the key fields are found in the second set of source data, it means that the data entries are not excluded from the second set of source data, but rather that some field values are inconsistent. This data entry is identified as mismatched data, and the corresponding mismatched key value is deleted from the multi-account set and added to the mismatched set. For example, the secondary comparison key (a2, a3) of the multi-account set is searched for existence in the secondary comparison set of the second set of source data (i.e., {"a1", "a2", "a4"}). If a2 exists, then a2 corresponds to the key value "a2-100-2021-06-01" in the multi-account set, which is a mismatched key value. This key value is deleted from the multi-account set and added to the mismatched set.
[0106] In the secondary comparison Set of the first set of source data, find the key values that overlap with the secondary comparison key of the missing-account Set, and treat the corresponding key values in the missing-account Set as mismatched key values. That is, for the data entries that the first comparison deemed absent from the first set of source data, the field values of the key fields are searched for in the first set of source data. If a data entry with the same key field values is found there, the entry is not absent from the first set of source data; rather, some of its field values are inconsistent. The entry is identified as mismatched data, and the corresponding mismatched key value is deleted from the missing-account Set and added to the mismatch set. For example, the secondary comparison key (a4, a2) of the missing-account Set is looked up in the secondary comparison Set of the first set of source data (i.e., {"a1", "a2", "a3"}), and a2 is found. Therefore, the key value "a2-101-2021-06-01" in the missing-account Set is a mismatched key value and is deleted from the missing-account Set. Because the data entry identified by a2 is already recorded in the mismatch set (as "a2-100-2021-06-01"), it does not need to be added again.
[0107] The above processing updates the multi-account set and the short-account set, and adds a mismatch set. Continuing with the previous example, the updated multi-account set is {"a3-102-2021-06-01"}, the updated short-account set is {"a4-102-2021-06-01"}, and the newly added mismatch set is {"a2-100-2021-06-01"}. This result represents the precise difference between the two sets of source data.
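The second comparison described in step 207 can be sketched in Python as below. It assumes the key field is the first component of each key value and that "-" is the separator; the function name `second_comparison` and the way an already recorded entry is skipped are illustrative choices, not details confirmed by this application.

```python
def second_comparison(multi, missing, secondary1, secondary2, sep="-"):
    """Split out mismatched entries from the multi/missing-account sets.

    `secondary1` / `secondary2` are the secondary comparison Sets of the
    first and second source. Returns (multi, missing, mismatch) as new sets.
    """
    def key_of(composite):
        # Assumption: the key field value is the leading component.
        return composite.split(sep, 1)[0]

    mismatch = set()
    recorded = set()  # key field values already placed in the mismatch set

    # Entries "only in source 1" whose key field exists in source 2.
    for composite in multi:
        if key_of(composite) in secondary2:
            mismatch.add(composite)
            recorded.add(key_of(composite))
    new_multi = multi - mismatch

    # Entries "only in source 2" whose key field exists in source 1.
    new_missing = set()
    for composite in missing:
        key = key_of(composite)
        if key in secondary1:
            if key not in recorded:  # avoid recording the same entry twice
                mismatch.add(composite)
                recorded.add(key)
        else:
            new_missing.add(composite)
    return new_multi, new_missing, mismatch


multi, missing, mismatch = second_comparison(
    {"a2-100-2021-06-01", "a3-102-2021-06-01"},
    {"a2-101-2021-06-01", "a4-102-2021-06-01"},
    {"a1", "a2", "a3"},
    {"a1", "a2", "a4"},
)
print(sorted(multi))     # ['a3-102-2021-06-01']
print(sorted(missing))   # ['a4-102-2021-06-01']
print(sorted(mismatch))  # ['a2-100-2021-06-01']
```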
[0108] Step 208: Iterate through the Set of Multiple Accounts, Set of Missing Accounts, Set of Balanced Accounts, and Set of Mismatches. Find the value corresponding to each key value in the Map of the first comparison of the two sets of source data, and construct the corresponding List of Multiple Accounts, List of Missing Accounts, List of Balanced Accounts, and List of Mismatches as the data comparison results.
[0109] Through the aforementioned steps 201-207, the differences between the two sets of source data have been determined, but they are presented as Sets whose key values are composed of the field values of the fields involved in the comparison. In this step, each Set is resolved against the first comparison Maps of the two sets of source data, which provide the correspondence between field names and field values, and a corresponding List is constructed.
[0110] Specifically, based on the multi-account Set, the value corresponding to each key value is found in the first comparison Map1 of the first set of source data, that is, the value of each field and its corresponding field name, and a multi-account List is constructed. Specifically, the multi-account List can include only the fields involved in the first comparison Map1, that is, all the fields of the first set of source data that participated in the comparison, or it can include all the fields of the first set of source data.
[0111] Based on the missing accounts Set, find the value corresponding to each key value in the first comparison Map2 of the second set of source data, that is, the value of each field and its corresponding field name, and construct the missing accounts List. Specifically, the missing accounts List can include only the fields involved in the first comparison Map2, that is, all the fields of the second set of source data that participated in the comparison, or it can include all the fields of the second set of source data.
[0112] Based on the balance set, find the value corresponding to each key value in either Map1 (the first comparison Map of the first set of source data) or Map2 (the first comparison Map of the second set of source data), that is, the value of each field and its corresponding field name, and construct the balance list. Specifically, the balance list can include only the fields involved in the first comparison Map1 or Map2, that is, all the fields involved in the comparison of the first or second set of source data, or it can include all the fields of the first or second set of source data.
[0113] Based on the mismatch set, find the value corresponding to each key value in either Map1 (the first comparison Map of the first set of source data) or Map2 (the first comparison Map of the second set of source data), that is, the value of each field and its corresponding field name, and construct a mismatch list. Specifically, the mismatch list can include only the fields involved in the first comparison Map1 or Map2, that is, all the fields involved in the comparison of the first or second set of source data, or it can include all the fields of the first or second set of source data.
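Step 208 can be sketched as a lookup of each key value in the first comparison Maps. The helper name `build_list` is hypothetical, and the sketch returns only the fields that took part in the first comparison, which is one of the two options described above.

```python
MAP1 = {
    "a1-100-2021-06-01": {"key1": "a1", "amount": 100, "time": "2021-06-01"},
    "a2-100-2021-06-01": {"key1": "a2", "amount": 100, "time": "2021-06-01"},
    "a3-102-2021-06-01": {"key1": "a3", "amount": 102, "time": "2021-06-01"},
}
MAP2 = {
    "a1-100-2021-06-01": {"key2": "a1", "amount": 100, "date": "2021-06-01"},
    "a2-101-2021-06-01": {"key2": "a2", "amount": 101, "date": "2021-06-01"},
    "a4-102-2021-06-01": {"key2": "a4", "amount": 102, "date": "2021-06-01"},
}


def build_list(key_values, *maps):
    """Resolve each key value to its field-name/field-value record.

    The maps are searched in the order given; the first hit is used.
    Key values are sorted only to make the output order deterministic.
    """
    result = []
    for key in sorted(key_values):
        for comparison_map in maps:
            if key in comparison_map:
                result.append(comparison_map[key])
                break
    return result


multi_list = build_list({"a3-102-2021-06-01"}, MAP1)
missing_list = build_list({"a4-102-2021-06-01"}, MAP2)
balanced_list = build_list({"a1-100-2021-06-01"}, MAP1, MAP2)
mismatch_list = build_list({"a2-100-2021-06-01"}, MAP1, MAP2)
print(multi_list)
print(missing_list)
print(balanced_list)
print(mismatch_list)
```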
[0114] At this point, the differences between the two sets of source data have been identified, together with the relevant fields and their values. The process shown in Figure 2 ends here.
[0115] The method described in this application provides a general data comparison approach that allows reconciliation rules to be set flexibly. When the rules change, only the acquisition of the field values of the fields involved in the first comparison and of the key fields in the second comparison needs to be modified accordingly. This facilitates data comparison and the generation of data result classifications, as shown in Figure 3. Furthermore, the above method can be implemented using a Redis cluster, allowing data to be stored and processed in a distributed manner.
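One way to picture the dynamic configuration of comparison rules is a rule object that names the compared fields and the key fields for each source, so that a rule change requires no change to the comparison code. The sketch below is an assumption-laden illustration: the rule layout (`compare_fields`, `key_fields`), the function `reconcile`, and the use of Python sets instead of Redis are all hypothetical. Unlike string parsing, it records the key field value of each entry when the key value is built.

```python
def reconcile(source1, source2, rule, sep="-"):
    """Two-level comparison driven entirely by `rule` (hypothetical layout)."""
    def join(row, fields):
        return sep.join(str(row[f]) for f in fields)

    r1, r2 = rule["source1"], rule["source2"]
    # Composite key value -> key field value, for each source.
    first1 = {join(r, r1["compare_fields"]): join(r, r1["key_fields"]) for r in source1}
    first2 = {join(r, r2["compare_fields"]): join(r, r2["key_fields"]) for r in source2}

    set1, set2 = set(first1), set(first2)
    balanced, multi, missing = set1 & set2, set1 - set2, set2 - set1

    keys1, keys2 = set(first1.values()), set(first2.values())
    # Same key field on both sides but different field values: mismatch.
    mismatch = {c for c in multi if first1[c] in keys2}
    multi = multi - mismatch
    missing = {c for c in missing if first2[c] not in keys1}
    return {"balanced": balanced, "multi": multi, "missing": missing, "mismatch": mismatch}


source1 = [
    {"key1": "a1", "amount": 100, "time": "2021-06-01"},
    {"key1": "a2", "amount": 100, "time": "2021-06-01"},
    {"key1": "a3", "amount": 102, "time": "2021-06-01"},
]
source2 = [
    {"key2": "a1", "amount": 100, "date": "2021-06-01"},
    {"key2": "a2", "amount": 101, "date": "2021-06-01"},
    {"key2": "a4", "amount": 102, "date": "2021-06-01"},
]

full_rule = {
    "source1": {"compare_fields": ["key1", "amount", "time"], "key_fields": ["key1"]},
    "source2": {"compare_fields": ["key2", "amount", "date"], "key_fields": ["key2"]},
}
# Changing the rule (here: ignoring the amount) needs no code change.
relaxed_rule = {
    "source1": {"compare_fields": ["key1", "time"], "key_fields": ["key1"]},
    "source2": {"compare_fields": ["key2", "date"], "key_fields": ["key2"]},
}

full_result = reconcile(source1, source2, full_rule)
relaxed_result = reconcile(source1, source2, relaxed_rule)
print(full_result)
print(relaxed_result)
```

With the full rule the entry a2 is reported as a mismatch; with the relaxed rule, which no longer compares the amount, it is reported as balanced.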
[0116] In addition, to facilitate user interaction, a user interface can be provided to allow users to configure data comparison settings, such as selecting, loading, and acquiring source data. Further processing of the data comparison results can also be performed, such as generating reconciliation reports and postponing mismatched data. Specific user interfaces and implemented functions can be as shown in Figure 4.
[0117] This application also provides an apparatus for implementing data comparison, which can be used to implement the above-described data comparison method. Figure 5 is a schematic diagram of the basic structure of the device. As shown in Figure 5, the device includes a data loading unit, a primary comparison set construction unit, a primary comparison unit, a secondary comparison set construction unit, and a secondary comparison unit.
[0118] The data loading unit is used to write each of the two sets of source data to be compared into the first data structure of the database as a source data form; each set of source data includes several data entries, and each data entry includes field values corresponding to several fields.
[0119] A primary comparison set construction unit is used to determine and construct primary data for each set of source data forms using the first key field, serving as the primary comparison data for that set of source data. For each primary comparison data set, all values of the first key field are extracted to form a set, which serves as the primary comparison set for that set of source data. A primary comparison unit is used to compare two primary comparison sets for two sets of source data to obtain a balanced set and a questionable set.
[0120] The secondary comparison set construction unit is used to determine and construct the second data using the second key field, which serves as the secondary comparison set for the corresponding group of source data. The secondary comparison unit is used to extract the field values of the second key field from the suspicious set, forming the secondary comparison key field values for the suspicious set; it then compares the secondary comparison key field values with the secondary comparison set, and constructs normal data forms and abnormal data forms based on the comparison results.
[0121] Optionally, the first key field can be all of the pre-defined fields that participate in the primary comparison. In the primary comparison set construction unit, the field values of all the first key fields in a single data entry of each source data form are concatenated to form the first key field value of the primary comparison data.
[0122] And / or,
[0123] The second key field can be all of the pre-defined fields that participate in the secondary comparison. In the secondary comparison set construction unit, the field values of all the second key fields in a single data entry of each source data form are concatenated to form the second key field value of the secondary comparison set.
[0124] Optionally, the balanced set may include the first key field values that are identical in the primary comparison sets of the two sets of source data. The questionable set may include a multi-account set and a missing-account set, wherein the multi-account set may include the first key field values that are in the primary comparison set of the first set of source data but not in the primary comparison set of the second set of source data, and the missing-account set may include the first key field values that are in the primary comparison set of the second set of source data but not in the primary comparison set of the first set of source data.
[0125] Optionally, in the secondary comparison unit, the process of comparing the key field values of the secondary comparison with the secondary comparison set and constructing normal and abnormal data forms based on the comparison results may specifically include:
[0126] The key field values for the secondary comparison of the multi-account set and the short-account set are compared with the secondary comparison sets of the two sets of source data, respectively. The mismatched key field values included in the multi-account set and the short-account set are identified and removed from these sets, forming a mismatch set. The multi-account set, short-account set, balanced set, and mismatch set are traversed, and the values corresponding to each key field value are found in the primary comparison data of the two sets of source data. Corresponding multi-account form, short-account form, balanced form, and mismatch form are constructed as the data comparison results. Key field values are stored in pairs with their corresponding values. Normal data forms include balanced forms, and abnormal data forms include multi-account forms, short-account forms, and mismatch forms.
[0127] Optionally, in the secondary comparison unit, the mismatched key field values included in the multi-account set and the missing-account set may be determined as follows:
[0128] In the secondary comparison set of the second set of source data, find the key field values that overlap with the secondary comparison key field values of the multi-account set, and take the key field values in the multi-account set that correspond to the found key field values as the mismatched key field values;
[0129] In the secondary comparison set of the first set of source data, find the key field values that overlap with the secondary comparison key field values of the missing-account set, and take the key field values in the missing-account set that correspond to the found key field values as the mismatched key field values.
[0130] Optionally, the second key field is contained within the first key field but is not exactly the same as the first key field; and / or, the first data and the second data are constructed from two different data structures in the database.
[0131] Optionally, this device can be implemented using a distributed Redis cluster.
[0132] This application also provides a computer program product, including a computer program / instruction, characterized in that the computer program / instruction, when executed by a processor, implements the above-mentioned data comparison method.
[0133] This application also provides a computer-readable storage medium that stores instructions. When executed by a processor, these instructions can perform the steps of the data comparison method described above. In practical applications, the computer-readable medium may be included in the devices / apparatus / systems described above, or it may exist independently without being assembled into the device / apparatus / system.
[0134] According to the embodiments disclosed in this application, the computer-readable storage medium can be a non-volatile computer-readable storage medium, such as including but not limited to: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof, but not intended to limit the scope of protection of this application. In the embodiments disclosed in this application, the computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
[0135] As shown in Figure 6, embodiments of the present invention also provide an electronic device. Figure 6 is a structural schematic diagram of the electronic device involved in an embodiment of the present invention, specifically:
[0136] The electronic device may include a processor 601 with one or more processing cores, a memory 602 with one or more computer-readable storage media, and a computer program stored in the memory and executable on the processor. When the program in the memory 602 is executed, a data comparison method can be implemented.
[0137] Specifically, in practical applications, this electronic device may also include components such as a power supply 603 and an input / output unit 604. Those skilled in the art will understand that the structure of the electronic device shown in Figure 6 does not constitute a limitation on the electronic device, which may include more or fewer components than shown, combine certain components, or have a different component arrangement. Wherein:
[0138] The processor 601 is the control center of the electronic device. It connects various parts of the electronic device through various interfaces and lines. By running or executing software programs and / or modules stored in the memory 602, and calling data stored in the memory 602, it performs various functions of the server and processes data, thereby monitoring the electronic device as a whole.
[0139] Memory 602 can be used to store software programs and modules, i.e., the aforementioned computer-readable storage medium. Processor 601 executes various functional applications and data processing by running the software programs and modules stored in memory 602. Memory 602 may primarily include a program storage area and a data storage area, wherein the program storage area may store the operating system, at least one application program required for a function, etc.; the data storage area may store data created according to the use of the server, etc. In addition, memory 602 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, memory 602 may also include a memory controller to provide processor 601 with access to memory 602.
[0140] The electronic device also includes a power supply 603 that supplies power to the various components. This power supply can be logically connected to the processor 601 via a power management system, enabling functions such as charging, discharging, and power consumption management. The power supply 603 may also include one or more DC or AC power supplies, a recharging system, a power fault detection circuit, a power converter or inverter, a power status indicator, or any other components.
[0141] The electronic device may also include an input / output unit 604, which can be used to receive input digital or character information and generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. The input / output unit 604 can also be used to display information input by the user or information provided to the user, as well as various graphical user interfaces, which can be composed of graphics, text, icons, video, and any combination thereof.
[0142] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments disclosed in this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those shown in the drawings. For example, two blocks shown connectedly may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.
[0143] Those skilled in the art will understand that the features described in the various embodiments and / or claims of this disclosure can be combined and / or integrated in various ways, even if such combinations or integrations are not explicitly described in this application. In particular, without departing from the spirit and teachings of this application, the features described in the various embodiments and / or claims of this application can be combined and / or integrated in various ways, and all such combinations and / or integrations fall within the scope of this application.
[0144] This document uses specific embodiments to illustrate the principles and implementation methods of the present invention. The descriptions of these embodiments are merely illustrative of the method and core concepts of the present invention and are not intended to limit this application. Those skilled in the art can make changes to the specific implementation methods and application scope based on the ideas, spirit, and principles of the present invention. Any modifications, equivalent substitutions, or improvements made should be included within the scope of protection of this application.
Claims
1. A method for data comparison, characterized in that, The method includes: Each of the two sets of source data to be compared is written into the first data structure of the database as a source data form; each set of source data includes several data entries, and each data entry includes field values corresponding to several fields; For each set of source data forms, the first key field is determined and used to construct the first data, which serves as the first comparison data for the corresponding set of source data. For each first comparison data, all the values of the first key field are extracted to form a set, which serves as the first comparison set for the corresponding set of source data. The first key field is all the fields that participate in the first comparison in a pre-defined manner. The field values of all the first key fields in a single data in each set of source data forms are concatenated together to serve as the first key field value of the first comparison data. For two sets of source data, the two comparison sets are compared to obtain a balanced set and a questionable set. The balanced set includes the same first key field value in the comparison sets of the two sets of source data. The questionable set includes a set of multiple accounts and a set of missing accounts. For each set of source data forms, a second key field is determined and used to construct a second data set as the secondary comparison set of the corresponding set of source data. The second key field is all the fields that participate in the secondary comparison in a pre-defined manner. All the second key fields in a single data entry in each set of source data forms are concatenated together as the second key field value of the secondary comparison set. The second key field is included in the first key field but is not completely identical to the first key field. 
Extract the field values of the second key field from the suspicious set to form the secondary comparison key field values of the suspicious set; The key field values of the secondary comparison are compared with the secondary comparison set, and normal data forms and abnormal data forms are constructed based on the comparison results, including: The key field values of the secondary comparison of the multi-account set and the key field values of the secondary comparison of the missing account set are compared with the secondary comparison sets of the two sets of source data respectively to determine the mismatched key field values included in the multi-account set and the missing account set, and the mismatched key field values are deleted from the multi-account set and the missing account set to form a mismatch set. The system iterates through the sets of multiple accounts, missing accounts, balanced accounts, and mismatched accounts, finds the values corresponding to each key field value in a single comparison of the two sets of source data, and constructs corresponding multiple account forms, missing account forms, balanced account forms, and mismatched forms as data comparison results. The key field values and their corresponding values are stored in pairs. The normal data form includes the balanced account form, and the abnormal data form includes the multiple account form, the missing account form, and the mismatched form.
2. The method according to claim 1, characterized in that, The multi-account set includes a first key field value that is included in a comparison set of the first set of source data and not included in a comparison set of the second set of source data. The missing-account set includes a first key field value that is not included in a comparison set of the first set of source data and is included in a comparison set of the second set of source data.
3. The method according to claim 1, characterized in that, The key mismatch field values included in determining the multiple-account set and the missing-account set include: In the second set of source data in the two sets of source data, find the key field value that overlaps with the key field value of the second comparison of the multi-account set, and take the key field value in the multi-account set corresponding to the found key field value as the mismatched key field value; In the secondary comparison set of the first set of source data in the two sets of source data, find the key field value that overlaps with the key field value of the secondary comparison of the missing accounts set, and take the key field value in the missing accounts set corresponding to the found key field value as the mismatched key field value.
4. The method according to claim 1, characterized in that, The first data and the second data are constructed from two different data structures in the database.
5. An apparatus for performing data comparison, characterized in that the apparatus includes a data loading unit, a primary comparison set construction unit, a primary comparison unit, a secondary comparison set construction unit, and a secondary comparison unit;
the data loading unit is used to write each of the two groups of source data to be compared into the first data structure of the database as a source data form; each group of source data includes several data entries, and each data entry includes field values corresponding to several fields;
the primary comparison set construction unit is used, for each group's source data form, to determine and construct first data using the first key fields, as the primary comparison data of the corresponding group of source data, and, for each primary comparison data, to extract all first key field values to form a set, as the primary comparison set of the corresponding group of source data; wherein the first key fields are all the preset fields that participate in the primary comparison, and the field values of all first key fields in a single data entry of each group's source data form are concatenated together as the first key field value of the primary comparison data;
the primary comparison unit is used to compare the two groups of source data to obtain a balanced set and a suspicious set; the balanced set includes the first key field values that are the same in the primary comparison sets of the two groups of source data; the suspicious set includes a multi-account set and a missing-account set;
the secondary comparison set construction unit is used to determine and construct second data using the second key fields, as the secondary comparison set of the corresponding group of source data; the second key fields are all the preset fields that participate in the secondary comparison; the field values of all second key fields in a single data entry of each group's source data form are concatenated together as the second key field value of the secondary comparison set; the second key fields are included in the first key fields and are not completely the same as the first key fields;
the secondary comparison unit is used to extract the field values of the second key fields from the suspicious set to form the secondary comparison key field values of the suspicious set, to compare the secondary comparison key field values with the secondary comparison sets, and to construct normal data forms and abnormal data forms based on the comparison results, including: comparing the secondary comparison key field values of the multi-account set and of the missing-account set with the secondary comparison sets of the two groups of source data, respectively, to determine the mismatched key field values included in the multi-account set and the missing-account set, and deleting the mismatched key field values from the multi-account set and the missing-account set to form a mismatch set; traversing the multi-account set, the missing-account set, the balanced set, and the mismatch set, finding the value corresponding to each key field value in the primary comparison data of the two groups of source data, and constructing the corresponding multi-account form, missing-account form, balanced form, and mismatch form as the data comparison results; wherein the key field values and their corresponding values are stored as key-value pairs, the normal data forms include the balanced form, and the abnormal data forms include the multi-account form, the missing-account form, and the mismatch form.
6. A computer program product comprising a computer program/instructions, characterized in that, when executed by a processor, the computer program/instructions implement the data comparison method according to any one of claims 1 to 4.
7. A computer-readable storage medium storing computer instructions, characterized in that, when executed by a processor, the instructions implement the data comparison method according to any one of claims 1 to 4.
8. An electronic device, characterized in that the electronic device includes at least the computer-readable storage medium according to claim 7, and further includes a processor; the processor is configured to read the computer instructions from the computer-readable storage medium and execute them to implement the data comparison method according to any one of claims 1 to 4.
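For illustration only, the two-level comparison recited in claim 5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names (`concat_key`, `two_level_compare`), the in-memory dictionaries standing in for the database data structures, the `|` separator used for concatenation, and the reading of "multi-account" as entries present only in the first group and "missing-account" as entries present only in the second group are all hypothetical choices made for the example.

```python
from typing import Dict, List

Entry = Dict[str, str]
SEP = "|"  # assumed separator; the claims only say the field values are concatenated


def concat_key(entry: Entry, fields: List[str]) -> str:
    """Concatenate the values of the given key fields into one key field value."""
    return SEP.join(str(entry[f]) for f in fields)


def two_level_compare(group_a: List[Entry], group_b: List[Entry],
                      first_keys: List[str], second_keys: List[str]) -> Dict[str, dict]:
    # Primary comparison data: first key field value -> data entry (key-value pairs).
    data_a = {concat_key(e, first_keys): e for e in group_a}
    data_b = {concat_key(e, first_keys): e for e in group_b}

    # Primary comparison sets and primary comparison.
    set_a, set_b = set(data_a), set(data_b)
    balanced = set_a & set_b
    multi = set_a - set_b      # assumed: present only in group A
    missing = set_b - set_a    # assumed: present only in group B

    # Secondary comparison sets built from the second key fields.
    sec_a = {concat_key(e, second_keys) for e in group_a}
    sec_b = {concat_key(e, second_keys) for e in group_b}

    # A suspicious entry whose second key field value exists on the other
    # side is treated as a mismatch rather than a surplus or missing entry.
    mismatch_a = {k for k in multi if concat_key(data_a[k], second_keys) in sec_b}
    mismatch_b = {k for k in missing if concat_key(data_b[k], second_keys) in sec_a}
    multi -= mismatch_a
    missing -= mismatch_b

    # Build the result forms by looking each key up in the primary comparison data.
    mismatch = {k: data_a[k] for k in mismatch_a}
    mismatch.update({k: data_b[k] for k in mismatch_b})
    return {
        "balanced": {k: data_a[k] for k in balanced},
        "multi": {k: data_a[k] for k in multi},
        "missing": {k: data_b[k] for k in missing},
        "mismatch": mismatch,
    }
```

In this sketch, calling `two_level_compare` with first key fields `["order", "amount"]` and second key fields `["order"]` classifies an order that appears in both groups with different amounts as a mismatch, while an order that appears in only one group remains in the multi-account or missing-account form. A production version would need a separator (or an escaping scheme) that cannot occur inside field values, since plain concatenation can otherwise make distinct entries collide.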