Data quality checking method and device, electronic equipment and readable storage medium

By verifying the data quality of the data table to be verified during the data processing cycle of the data processing layer, the method addresses poor data quality control and allows data quality to be predicted and safeguarded before the data is applied.

CN117632928B · Active · Publication Date: 2026-06-12 · WEBANK (CHINA)

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patent (China)
Current Assignee / Owner
WEBANK (CHINA)
Filing Date
2023-11-27
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Poor data quality control in existing approaches means that data anomalies are often discovered only when the data is applied, which can compromise the accuracy of decision-making.

Method used

Data verification time points are set within the data processing cycle of the data processing layer, and data quality verification is performed on the data tables to be verified at those time points. This includes obtaining data verification information, detecting anomalies in data processing, verifying with daily fluctuation and day-on-day data volumes, and verifying when business data indicators are generated.

Benefits of technology

By verifying data quality in advance at the data processing layer and predicting data anomalies, the method keeps such anomalies from affecting large-scale data applications and improves the effectiveness of data quality control.

✦ Generated by Eureka AI based on patent content.

Figure CN117632928B_ABST

Abstract

The application discloses a data quality checking method and device, electronic equipment and a readable storage medium. The data quality checking method comprises the following steps: obtaining a to-be-checked data table at a data processing layer of a target data warehouse; determining a data checking time point of the to-be-checked data table according to the data processing layer, wherein the data checking time point is located within a data processing period of the data processing layer; and performing data quality checking on the to-be-checked data table at the data checking time point. The application solves the technical problem of ineffective data quality control.

Description

Technical Field

[0001] This application relates to the field of data warehouse technology, and in particular to a data quality verification method, apparatus, electronic device, and readable storage medium.

Background Technology

[0002] With the continuous development of technology, data warehouses have emerged to improve the interactivity of data. At the same time, as a new and important asset that Internet companies rely heavily on, the quality of data is directly related to its accuracy, making the control of data quality an indispensable part.

[0003] Currently, in data quality control, data quality verification is performed only when the data in the data tables processed by the data warehouse is applied, after which any abnormal data found is repaired. However, because data applications operate on large-scale data, anomalies discovered only at that stage can easily compromise the accuracy of decision-making. The current approach to data quality control is therefore ineffective.

Summary of the Invention

[0004] The main objective of this application is to provide a data quality verification method, apparatus, electronic device, and readable storage medium, aiming to solve the technical problem of poor data quality control in the prior art.

[0005] To achieve the above objectives, this application provides a data quality verification method, the data quality verification method comprising:

[0006] Obtain the data table to be verified at the data processing layer of the target data warehouse;

[0007] Based on the data processing layer, the data verification time point of the data table to be verified is determined, wherein the data verification time point is located within the data processing cycle of the data processing layer;

[0008] Data quality verification is performed on the data table to be verified at the specified data verification time point.

[0009] Optionally, the data verification time point is the first verification time point before the data processing layer performs data processing on the data table to be verified, and the step of performing data quality verification on the data table to be verified at the data verification time point includes:

[0010] At the first verification time point, obtain the data verification information of the data table to be verified;

[0011] Based on the data verification information, the data table to be verified is subjected to data quality verification.

[0012] Optionally, the data verification time point is the second verification time point after the data processing layer processes the data table to be verified, and the step of performing data quality verification on the data table to be verified at the data verification time point includes:

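[0013] At the second verification time point, acquire the data identification information and data verification information of the data table to be verified, and, based on the data identification information, detect whether there are any abnormalities in the data processing performed by the data processing layer on the data table to be verified;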

[0014] If so, obtain a verification result indicating that the data quality of the data table to be verified is abnormal;

[0015] If not, then perform data quality verification on the data table to be verified based on the data verification information.

[0016] Optionally, the data verification information includes daily fluctuation data volume and daily month-on-month data volume, and the step of performing data quality verification on the data table to be verified based on the data verification information includes:

[0017] Obtain the historical data table corresponding to the data table to be verified;

[0018] Based on the data volume relationship between the data table to be verified and the historical data table, detect the data table type of the data table to be verified;

[0019] If the data table to be verified is detected to be a first type of data table, then the data quality of the data table to be verified is verified according to the daily fluctuation data volume.

[0020] If the data table to be verified is detected to be a second type of data table, then the data quality of the data table to be verified is verified based on the daily month-on-month data volume.

[0021] Optionally, the data verification time point is the third verification time point when the data processing layer generates the business data indicators corresponding to the data table to be verified, and the step of performing data quality verification on the data table to be verified at the data verification time point includes:

[0022] At the third verification time point, the data indicator value of the business data indicator is obtained;

[0023] The data quality of the data table to be verified is verified based on the relationship between the data indicator value and the preset data indicator value.
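
The text does not say how the indicator value is compared with the preset value. Here is a minimal sketch, assuming a relative-deviation rule with a hypothetical 5% tolerance (`verify_indicator` and `tolerance` are illustrative names, not from the source):

```python
def verify_indicator(value: float, preset: float, tolerance: float = 0.05) -> str:
    """Compare a generated business indicator with its preset value.

    Hypothetical rule: the indicator is normal when its relative deviation
    from the preset value does not exceed `tolerance`.
    """
    if preset == 0:
        # Relative deviation is undefined here; require an exact match instead.
        return "normal" if value == 0 else "abnormal"
    deviation = abs(value - preset) / abs(preset)
    return "normal" if deviation <= tolerance else "abnormal"
```

A fixed absolute band or a historical range could replace the tolerance rule without changing the shape of the check.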

[0024] Optionally, the data quality verification method further includes:

[0025] After all the core data tables under at least one data mart corresponding to the target data warehouse have completed data processing, a basic data table set corresponding to each data mart is obtained, wherein the basic data table set includes at least one basic data table.

[0026] For any of the aforementioned basic data tables, a corresponding data quality verification rule is matched for the basic data table, and data quality verification is performed on the basic data table according to the data quality verification rule;

[0027] The data quality verification results of the basic data table set are displayed on the preset data quality control interface.
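
As a rough illustration of [0025] to [0027], the sketch below verifies a data mart's basic data tables only after every core table in that mart has finished processing. It then renders the results as plain text. The mart layout, the `core_done` and `verify` callbacks, and the tab-separated output that stands in for the data quality control interface are all hypothetical:

```python
from typing import Callable


def verify_marts(marts: dict[str, dict[str, list[str]]],
                 core_done: Callable[[str], bool],
                 verify: Callable[[str], str]) -> dict[str, dict[str, str]]:
    """Verify each mart's basic tables once all of its core tables are processed."""
    results: dict[str, dict[str, str]] = {}
    for mart, spec in marts.items():
        if not all(core_done(t) for t in spec["core_tables"]):
            continue  # a core table is still being processed; check again later
        results[mart] = {t: verify(t) for t in spec["basic_tables"]}
    return results


def render(results: dict[str, dict[str, str]]) -> str:
    """Plain-text stand-in for the preset data quality control interface."""
    return "\n".join(f"{mart}\t{table}\t{status}"
                     for mart, tables in sorted(results.items())
                     for table, status in sorted(tables.items()))
```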

[0028] Optionally, the step of matching the corresponding data quality verification rules for the basic data table includes:

[0029] Based on the field identifiers of the basic data table, extract at least one core field from the basic data table;

[0030] Based on each core field and its corresponding field weight, match a preset verification rule template for the basic data table;

[0031] The core fields are inserted into the preset verification rule template to obtain the data quality verification rules.
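
The three rule-matching steps can be sketched as follows. The text only states three things: core fields are extracted by field identifier, a template is matched using field weights, and the core fields are inserted into the template. The identifiers, weights, template names, and SQL texts below are therefore hypothetical. Scoring each template by the summed weight of the core fields it targets is one possible reading of the matching step:

```python
# Hypothetical weights per field identifier; identifiers not listed (e.g. "other")
# mark non-core fields.
FIELD_WEIGHTS = {"pk": 1.0, "amount": 0.8, "date": 0.5}

# Hypothetical preset templates: the identifiers each one targets, and its SQL text
# with {table} and {field} placeholders.
RULE_TEMPLATES = {
    "uniqueness": ({"pk"}, "SELECT COUNT(*) = COUNT(DISTINCT {field}) FROM {table}"),
    "non_negative": ({"amount"}, "SELECT COUNT(*) = 0 FROM {table} WHERE {field} < 0"),
    "not_null": ({"pk", "amount", "date"},
                 "SELECT COUNT(*) = 0 FROM {table} WHERE {field} IS NULL"),
}


def match_rules(table: str, field_ids: dict[str, str]) -> list[str]:
    # Step 1: extract core fields, i.e. fields whose identifier carries a weight.
    core = {name: ident for name, ident in field_ids.items() if ident in FIELD_WEIGHTS}
    if not core:
        return []

    # Step 2: score each template by the total weight of the core fields it targets;
    # max() keeps the first template listed when scores tie.
    def score(template_name: str) -> float:
        targets = RULE_TEMPLATES[template_name][0]
        return sum(FIELD_WEIGHTS[ident] for ident in core.values() if ident in targets)

    best = max(RULE_TEMPLATES, key=score)
    targets, sql = RULE_TEMPLATES[best]

    # Step 3: insert each targeted core field into the template text.
    return [sql.format(table=table, field=name)
            for name, ident in core.items() if ident in targets]
```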

[0032] To achieve the above objectives, this application also provides a data quality verification device, the data quality verification device comprising:

[0033] The acquisition module is used to obtain the data table to be verified at the data processing layer of the target data warehouse;

[0034] The determination module is used to determine, based on the data processing layer, the data verification time point of the data table to be verified, wherein the data verification time point is located within the data processing cycle of the data processing layer;

[0035] The verification module is used to perform data quality verification on the data table to be verified at the data verification time point.

[0036] Optionally, the data verification time point is the first verification time point before the data processing layer performs data processing on the data table to be verified, and the verification module is further used for:

[0037] At the first verification time point, obtain the data verification information of the data table to be verified;

[0038] Based on the data verification information, the data table to be verified is subjected to data quality verification.

[0039] Optionally, the data verification time point is the second verification time point after the data processing layer processes the data table to be verified, and the verification module is further used for:

[0040] At the second verification time point, acquire the data identification information and data verification information of the data table to be verified;

[0041] Based on the data identification information, detect whether there are any abnormalities in the data processing performed by the data processing layer on the data table to be verified;

[0042] If so, obtain a verification result indicating that the data quality of the data table to be verified is abnormal;

[0043] If not, then perform data quality verification on the data table to be verified based on the data verification information.

[0044] Optionally, the data verification information includes daily fluctuation data volume and daily month-on-month data volume, and the verification module is further used for:

[0045] Obtain the historical data table corresponding to the data table to be verified;

[0046] Based on the data volume relationship between the data table to be verified and the historical data table, detect the data table type of the data table to be verified;

[0047] If the data table to be verified is detected to be a first type of data table, then the data quality of the data table to be verified is verified according to the daily fluctuation data volume.

[0048] If the data table to be verified is detected to be a second type of data table, then the data quality of the data table to be verified is verified based on the daily month-on-month data volume.

[0049] Optionally, the data verification time point is the third verification time point when the data processing layer generates the business data indicators corresponding to the data table to be verified, and the verification module is further used for:

[0050] At the third verification time point, the data indicator value of the business data indicator is obtained;

[0051] The data quality of the data table to be verified is verified based on the relationship between the data indicator value and the preset data indicator value.

[0052] Optionally, the data quality verification device is further used for:

[0053] After all the core data tables under at least one data mart corresponding to the target data warehouse have completed data processing, a basic data table set corresponding to each data mart is obtained, wherein the basic data table set includes at least one basic data table.

[0054] For any of the aforementioned basic data tables, a corresponding data quality verification rule is matched for the basic data table, and data quality verification is performed on the basic data table according to the data quality verification rule;

[0055] The data quality verification results of the basic data table set are displayed on the preset data quality control interface.

[0056] Optionally, the data quality verification device is further used for:

[0057] Based on the field identifiers of the basic data table, extract at least one core field from the basic data table;

[0058] Based on each core field and its corresponding field weight, match a preset verification rule template for the basic data table;

[0059] The core fields are inserted into the preset verification rule template to obtain the data quality verification rules.

[0060] This application also provides an electronic device, the electronic device comprising: at least one processor and a memory communicatively connected to the at least one processor, the memory storing instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the steps of the data quality verification method described above.

[0061] This application also provides a computer-readable storage medium storing a program implementing a data quality verification method, wherein when the program is executed by a processor, it implements the steps of the data quality verification method as described above.

[0062] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the data quality verification method described above.

[0063] This application provides a data quality verification method, apparatus, electronic device, and readable storage medium. Specifically, it involves obtaining a data table to be verified at the data processing layer of a target data warehouse; determining a data verification time point for the data table to be verified based on the data processing layer, wherein the data verification time point is located within the data processing cycle of the data processing layer; and performing data quality verification on the data table to be verified at the data verification time point.

[0064] When performing data quality verification on a data table to be verified, this application first obtains the data table to be verified directly from the data processing layer of the target data warehouse. It then determines the data verification time point of the data table based on the data processing layer. Finally, it performs data quality verification on the data table at that time point, which falls within the data processing cycle of the data processing layer. Because the verification time point lies within the processing cycle, the data table to be verified can be verified inside the data processing layer; that is, data quality verification is carried out within the target data warehouse itself.

[0065] In other words, because verification takes place in the data processing layer, the data indicators of the target data warehouse are verified before the data is applied. This makes it possible to predict whether the application data contains data quality anomalies.

[0066] Based on this, this application performs data quality verification on the data table to be verified during the data processing cycle of the data processing layer, thereby determining the data quality status of the data table in advance. It verifies the data indicators of the target data warehouse before the data is applied, rather than discovering data quality anomalies only when the data table is used. This overcomes the technical shortcoming that anomalies surfacing in large-scale data applications can easily compromise the accuracy of decision-making, and thus improves the effectiveness of data quality control.

Attached Figure Description

[0067] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0068] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0069] Figure 1 is a flowchart of the data quality verification method provided in Embodiment 1 of this application;

[0070] Figure 2 is a flowchart of the data quality verification method provided in Embodiment 2 of this application;

[0071] Figure 3 is a schematic flowchart of the data quality verification process of the data quality verification method provided in Embodiment 1 of this application;

[0072] Figure 4 is a schematic diagram of the data quality verification device provided in Embodiment 3 of this application;

[0073] Figure 5 is a schematic structural diagram of the electronic device provided in Embodiment 4 of this application.

[0074] The purpose, features, and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0075] To make the above-mentioned objects, features, and advantages of the present invention more apparent and understandable, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0076] Embodiment 1

[0077] First, it should be understood that data warehousing is widely used across various industries for data processing. However, during the data processing process, data quality issues in data tables may arise due to the following reasons: 1) Logical changes occur in upstream data tables without timely notification to downstream data table developers; 2) Data table developers fail to consider all aspects during logical adjustments, leading to anomalies in batch processing; 3) Temporary anomalies occur during daily scheduled data output (e.g., upstream data tables are not ready, or a large data batch processing tool malfunctions). These data anomalies are typically only discovered when the data tables are applied, which can easily affect the accuracy of decision-making and increase data warehouse costs. Therefore, there is an urgent need for a method that can predict data quality to improve the effectiveness of data quality control.

[0078] This application provides a data quality verification method. In the first embodiment of the data quality verification method of this application, referring to Figure 1, the data quality verification method includes:

[0079] Step S10: Obtain the data table to be verified from the data processing layer of the target data warehouse;

[0080] Step S20: Determine the data verification time point of the data table to be verified according to the data processing layer, wherein the data verification time point is located within the data processing cycle of the data processing layer;

[0081] Step S30: Perform data quality verification on the data table to be verified at the data verification time point.

[0082] In this embodiment, it should be noted that although Figure 1 shows a logical order, in some cases the steps shown or described may be performed in a different order. The data quality verification method is applied to a data quality verification system, which can specifically be a database management system deployed on an electronic device such as a computer (for example, a personal computer). The target data warehouse is the data warehouse on which data quality verification is performed; overall, the data processing in the data warehouse can be abstracted into inputs and outputs. The data processing layers are the layers into which the warehouse is divided according to how data is processed. Specifically, they are the ODS (Operational Data Store, data operation layer), DWD (Data Warehouse Detail, data detail layer), DWM (Data Warehouse Middle, data middle layer), and DWS (Data Warehouse Service, data service layer) layers, among others. Each data processing layer performs its corresponding processing on the data table to be verified. The data operation layer aggregates source data from different business systems, the data detail layer processes the source data on top of the data operation layer, and the data middle layer performs light aggregation on common dimensions. The data table to be verified is a data table awaiting data quality verification. Specifically, it can be a core business data table, full data table, snapshot data table, incremental data table, chained data table, dimension table, entity table, fact table, and so on.

[0083] Additionally, it should be noted that the data verification time point specifies exactly when data quality verification is performed, with a precision down to the second. The data processing cycle refers to the entire period from receiving the data, through processing it, to sending it out. The specific data quality verification methods used at different data processing layers can follow existing techniques; for example, verification can rely on SQL scripts. They are not elaborated in this embodiment.

[0084] As an example, steps S10 to S30 include: obtaining the data processing layer of the target data warehouse, and using the core business data table in the data processing layer as the data table to be verified; determining the data verification time point of the data table to be verified according to the data processing layer, wherein the data verification time point is located within the data processing cycle of the data processing layer; when the current time point is the data verification time point, performing data quality verification on the data table to be verified according to the preset SQL script corresponding to the data table to be verified.
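
Steps S10 to S30 can be pictured with a small Python sketch. It is not the patent's implementation. The per-layer schedule in `VERIFICATION_TIME_BY_LAYER`, the table name `dwd_orders`, and the convention that a check returns 1 for pass are hypothetical, and an in-memory SQLite database stands in for the target data warehouse:

```python
import sqlite3
from datetime import datetime
from typing import Optional

# Hypothetical schedule: time of day (to the second) at which tables in each
# data processing layer are verified. Real time points would come from the
# warehouse's own scheduling configuration.
VERIFICATION_TIME_BY_LAYER = {
    "ODS": "02:00:00",
    "DWD": "03:30:00",
    "DWS": "05:00:00",
}


def verification_time_point(layer: str, day: str) -> datetime:
    """Step S20: derive the verification time point for a table from its layer."""
    clock = VERIFICATION_TIME_BY_LAYER[layer]
    return datetime.strptime(f"{day} {clock}", "%Y-%m-%d %H:%M:%S")


def run_check_if_due(conn: sqlite3.Connection, layer: str, day: str,
                     now: datetime, check_sql: str) -> Optional[bool]:
    """Step S30: once `now` reaches the verification time point, run the preset
    SQL check. The check is assumed to return a single 1 (pass) or 0 (fail)."""
    if now < verification_time_point(layer, day):
        return None  # not due yet; a scheduler would call again later
    (passed,) = conn.execute(check_sql).fetchone()
    return bool(passed)


if __name__ == "__main__":
    # Step S10 stand-in: an in-memory table plays the DWD-layer table to be verified.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dwd_orders (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO dwd_orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
    check = "SELECT COUNT(*) > 0 FROM dwd_orders"
    print(run_check_if_due(conn, "DWD", "2024-01-02", datetime(2024, 1, 2, 4, 0, 0), check))
```

Running the file prints `True`: at 04:00 the 03:30 DWD time point has passed and the table is non-empty. A real deployment would more likely register the check with the warehouse scheduler than compare timestamps by hand.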

[0085] This embodiment sets different data verification time points for data at different data processing layers of the target data warehouse. When the current time reaches the data verification time point of a data processing layer, the corresponding data quality verification method is used to verify the data table to be verified. Data quality verification is therefore performed while the data table is being processed in the data processing layer of the target data warehouse, rather than only at the data usage stage. In other words, the data indicators of the target data warehouse are verified before application, instead of anomalies being discovered only when the data table is used. This overcomes the technical defect that large-scale data applications can easily affect the accuracy of decision-making, thereby improving the effectiveness of data quality control.

[0086] The data verification time point is the first verification time point before the data processing layer processes the data table to be verified. The step of performing data quality verification on the data table to be verified at the data verification time point includes:

[0087] Step A10: Obtain the data verification information of the data table to be verified at the first verification time point;

[0088] Step A20: Perform data quality verification on the data table to be verified based on the data verification information.

[0089] In this embodiment, it should be noted that different data processing layers process data in different ways and sit at different stages of the data flow of the data table to be verified in the target data warehouse. Different data verification time points therefore need to be set for different data processing layers. Here, the data verification time point is the first verification time point, before the data processing layer processes the data table to be verified. Verifying at the first verification time point effectively detects whether the source data input to the data processing layer contains anomalies, which makes the data quality verification of the data table to be verified more targeted. For example, in one feasible approach, if the data processing layer is the ODS layer, verifying at the first verification time point checks the quality of the business data synchronized from the business systems. The data verification information comprises the parameters used to judge data quality; specifically, it can be data volume, data fluctuation, and data growth.

[0090] As an example, steps A10 to A20 include: acquiring the data volume of the data table to be verified at the first verification time point, and performing data quality verification on the data table to be verified based on that data volume. Because the first verification time point precedes the processing of the data table by the data processing layer, verifying at this point checks the quality of the source data that is about to enter the data processing layer. If anomalies were detected only after the data processing layer had processed the data, it would be difficult to determine their specific cause. Verifying at this point therefore makes the data quality verification of the data table to be verified more targeted.
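
Here is a minimal sketch of steps A10 to A20. It assumes the data verification information is just the row count and that an expected range (`min_rows`, `max_rows`) is configured per table. The function name and the bounds are hypothetical:

```python
import sqlite3


def first_time_point_check(conn: sqlite3.Connection, table: str,
                           min_rows: int, max_rows: int) -> tuple[int, str]:
    """Steps A10-A20 sketch: read the data volume of the input table before the
    data processing layer processes it, then compare it with an expected range.

    `min_rows`/`max_rows` are hypothetical bounds; `table` is assumed to come from
    trusted configuration, since it is interpolated into the SQL text.
    """
    (row_count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    status = "normal" if min_rows <= row_count <= max_rows else "abnormal"
    return row_count, status
```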

[0091] The data verification time point is the second verification time point after the data processing layer processes the data table to be verified. The step of performing data quality verification on the data table to be verified at the data verification time point includes:

[0092] Step B10: At the second verification time point, obtain the data identification information and data verification information of the data table to be verified;

[0093] Step B20: Based on the data identification information, detect whether there is any abnormality in the data processing performed by the data processing layer on the data table to be verified;

[0094] Step B30: If yes, obtain a verification result indicating that the data quality of the data table to be verified is abnormal;

[0095] Step B40: If not, perform data quality verification on the data table to be verified based on the data verification information.

[0096] In this embodiment, it should be noted that verification at the first verification time point checks the input of the data processing layer, but the data processing layer may itself introduce data anomalies while processing the data table to be verified. A second verification time point can therefore be set at which the output data of the data processing layer is verified. This effectively detects whether that output is abnormal and makes the data quality verification more targeted. The second verification time point is the time point after the data processing layer processes the data table to be verified. The data identification information identifies each data record; specifically, it is the primary key. For example, in one feasible approach, the data table to be verified has become abnormal during data processing if multiple data records share the same primary key, or if an anomaly is found when its primary key is checked. In addition, the data processing layer that performs the processing is effectively an intermediate layer of the target data warehouse, and its output feeds the layers downstream. Therefore, besides checking the processing itself, data quality verification must also be performed on the data table to be verified based on the data verification information.

[0097] As an example, steps B10 to B40 include:
- acquiring the primary key and the data verification information of the data table to be verified at the second verification time point;
- detecting, based on the primary key, whether the data processing performed by the data processing layer on the data table to be verified is abnormal;
- if an anomaly is detected in the primary key, determining that this data processing is abnormal and obtaining a verification result indicating that the data quality of the data table to be verified is abnormal;
- if no anomaly is detected in the primary key, performing data quality verification on the data table to be verified using the data verification information.
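
Steps B10 to B40 might be sketched as follows, with records as dictionaries and the primary key named by `key`. Two choices here are assumptions made for illustration: a missing key counts as an anomaly, and the volume-based check is passed in as `volume_check`.

```python
from collections import Counter
from typing import Any, Callable, Mapping, Sequence


def second_time_point_check(rows: Sequence[Mapping[str, Any]], key: str,
                            volume_check: Callable[[int], str]) -> str:
    """Steps B10-B40 sketch for the second verification time point.

    The primary key is treated as abnormal if any record lacks it (an
    assumption) or if several records share it (the case named in the text).
    """
    keys = [row.get(key) for row in rows]
    counts = Counter(keys)
    has_missing = None in counts
    has_duplicates = any(n > 1 for k, n in counts.items() if k is not None)
    if has_missing or has_duplicates:
        return "abnormal"  # Step B30: processing anomaly, no further checks
    return volume_check(len(keys))  # Step B40: fall back to volume-based checks
```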

[0098] The data verification information includes daily fluctuation data volume and daily month-on-month data volume. The step of performing data quality verification on the data table to be verified based on the data verification information includes:

[0099] Step C10: Obtain the historical data table corresponding to the data table to be verified;

[0100] Step C20: Based on the data volume relationship between the data table to be verified and the historical data table, detect the data table type of the data table to be verified;

[0101] Step C30: If the data table to be verified is detected to be a first type of data table, then the data quality of the data table to be verified is verified according to the daily fluctuation data volume.

[0102] Step C40: If the data table to be verified is detected to be a second type of data table, then the data quality of the data table to be verified is verified according to the daily month-on-month data volume.

[0103] In this embodiment, it should be noted that collecting specific data records for one-to-one comparison would reduce the efficiency of data quality verification. This embodiment therefore uses the daily fluctuation data volume and the daily month-on-month data volume as data verification information. The daily fluctuation data volume is the day-to-day change in data volume. The daily month-on-month data volume is the ratio of the data volume in the current period to that in the previous period, which, with a period of one day, is a day-over-day ratio. Different types of data tables show characteristic changes in data volume. The data volume of a full data table grows over time, whereas that of an incremental data table changes dynamically and may increase or decrease. Data tables can therefore be divided into several types; the first type can be an incremental data table, and the second type can be a full data table. Incremental and full data tables can be distinguished manually or by the table type label field of the data table to be verified. Because their data volumes differ in magnitude, a historical data table can also be used to tell them apart. The historical data table represents the data collected within a single period; specifically, it can be the data table to be verified as of one day earlier. For example, in one feasible approach, if the historical data table has 100 records and the data table to be verified has 10,000 records, the data table to be verified is a full data table.
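
The type detection in the example above can be expressed as a ratio test. The text gives no concrete rule. The cutoff `full_ratio = 10.0` is a hypothetical value, chosen so that the 100-versus-10,000 example is classified as a full data table:

```python
def detect_table_type(current_rows: int, historical_rows: int,
                      full_ratio: float = 10.0) -> str:
    """Steps C10-C20 sketch: classify the data table to be verified by comparing
    its volume with the historical (single-period) table.

    `full_ratio` is a hypothetical cutoff: a table holding at least this many
    times the single-period volume is treated as a full data table, otherwise
    as an incremental one.
    """
    if historical_rows <= 0:
        raise ValueError("the historical data table must contain at least one record")
    ratio = current_rows / historical_rows
    return "full" if ratio >= full_ratio else "incremental"
```

With the figures from the example, `detect_table_type(10_000, 100)` returns `"full"`. A table only slightly larger than one day's data, such as `detect_table_type(120, 100)`, is treated as incremental.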

[0104] As an example, steps C10 to C40 include: obtaining the historical data table corresponding to the data table to be verified; detecting the type of the data table to be verified from the difference in data volume between the data table to be verified and the historical data table; if the data table to be verified is detected as an incremental data table, verifying the data quality of the incremental data table by determining whether its daily fluctuation data volume exceeds a preset daily fluctuation data volume threshold; and if the data table to be verified is detected as a full data table, verifying the data quality of the full data table by determining whether its daily month-on-month data volume exceeds a preset daily month-on-month data volume threshold.

[0105] The specific steps for verifying the data quality of the incremental data table by determining whether its daily fluctuation data volume exceeds the preset daily fluctuation data volume threshold are as follows:

[0106] If the daily fluctuation data volume of the incremental data table exceeds the preset daily fluctuation data volume threshold, the verification result is that the data quality of the incremental data table is abnormal. If the daily fluctuation data volume does not exceed the preset daily fluctuation data volume threshold, the verification result is that the data quality of the incremental data table is normal.
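The incremental-table rule above can be sketched as a single comparison. This is a minimal Python sketch, assuming (the text does not say so) that the daily fluctuation data volume is the absolute change in row count from the previous day; `verify_incremental_table` is a hypothetical name.

```python
def verify_incremental_table(today_rows: int, yesterday_rows: int,
                             fluctuation_threshold: int) -> str:
    """Return 'normal' or 'abnormal' for an incremental data table."""
    # Assumption: daily fluctuation = absolute day-to-day change in row count,
    # since an incremental table may legitimately grow or shrink.
    daily_fluctuation = abs(today_rows - yesterday_rows)
    # Exceeding the preset threshold marks the table as abnormal;
    # a fluctuation equal to the threshold is still normal.
    if daily_fluctuation > fluctuation_threshold:
        return "abnormal"
    return "normal"
```

For example, with a threshold of 300 rows, a move from 1,000 to 1,500 rows is flagged, while a drop from 1,000 to 700 rows sits exactly at the threshold and passes.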

[0107] The specific steps for performing data quality verification on the full data table by determining whether the daily month-on-month data volume exceeds a preset daily month-on-month data volume threshold are as follows:

[0108] With the preset daily month-on-month data volume threshold set to 1: if the daily month-on-month data volume of the full data table is greater than 1, the full data table is in an increasing state, and the verification result is that its data quality is normal. If the daily month-on-month data volume is less than or equal to 1, the full data table is not in an increasing state, and the verification result is that its data quality is abnormal.
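The full-table rule above reduces to a ratio test. This minimal Python sketch uses the threshold of 1 from the text as the default; the function name and the handling of an empty previous snapshot are assumptions added so the division is always defined.

```python
def verify_full_table(current_rows: int, previous_rows: int,
                      ratio_threshold: float = 1.0) -> str:
    """Return 'normal' or 'abnormal' for a full data table."""
    if previous_rows <= 0:
        # Assumption: with no previous data, any non-empty snapshot is growth.
        return "normal" if current_rows > 0 else "abnormal"
    # Daily month-on-month data volume: current period / previous period.
    ratio = current_rows / previous_rows
    # A full table must keep growing, so only a ratio above the threshold passes.
    return "normal" if ratio > ratio_threshold else "abnormal"
```

A table that stays flat (ratio exactly 1) is reported as abnormal, which matches the "less than or equal to 1" case in the text.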

[0109] The data verification time point is the third verification time point at which the data processing layer generates the business data indicators corresponding to the data table to be verified. The step of performing data quality verification on the data table to be verified at the data verification time point includes:

[0110] Step D10: Obtain the data indicator value of the business data indicator at the third verification time point;

[0111] Step D20: Perform data quality verification on the data table to be verified based on the relationship between the data indicator value and the preset data indicator value.

[0112] In this embodiment, it should be noted that before the target data warehouse outputs the business data indicators, the fluctuation of each business data indicator across different time periods can be monitored to confirm whether it stays within the normal fluctuation range, which in turn verifies whether the data table to be verified has abnormal data quality. In other words, the third verification time point is the time point at which the data processing layer generates the business data indicators, and performing data verification at this time point effectively verifies the data quality of the data table to be verified. For example, the business data indicators may include the order rate, the order completion rate, and the order return rate.

[0113] As an example, steps D10 to D20 include: obtaining the data indicator value of the business data indicator at the third verification time point; if the data indicator value is greater than the preset data indicator value, determining that the data quality of the data table to be verified is abnormal; and if the data indicator value is less than or equal to the preset data indicator value, determining that the data quality of the data table to be verified is normal.
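Steps D10 to D20 can be sketched as a per-indicator comparison. This is a minimal Python sketch, assuming each business data indicator has its own preset value held in a dictionary; the indicator keys used below (such as `order_return_rate`) are hypothetical.

```python
from typing import Dict


def verify_indicators(indicator_values: Dict[str, float],
                      preset_values: Dict[str, float]) -> Dict[str, str]:
    """Compare each business data indicator with its preset value."""
    results = {}
    for name, value in indicator_values.items():
        limit = preset_values[name]
        # Rule from the text: above the preset value -> abnormal,
        # at or below it -> normal.
        results[name] = "abnormal" if value > limit else "normal"
    return results


results = verify_indicators(
    {"order_return_rate": 0.08, "order_completion_rate": 0.95},
    {"order_return_rate": 0.05, "order_completion_rate": 1.0},
)
```

Here the return rate of 0.08 exceeds its preset value of 0.05 and is flagged, while the completion rate stays within its limit.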

[0114] This application provides a data quality verification method, which involves obtaining a data table to be verified at the data processing layer of a target data warehouse; determining a data verification time point for the data table to be verified based on the data processing layer, wherein the data verification time point is located within the data processing cycle of the data processing layer; and performing data quality verification on the data table to be verified at the data verification time point.

[0115] In this embodiment of the application, when performing data quality verification on a data table to be verified, the data table to be verified is first directly obtained from the data processing layer of the target data warehouse. Then, the data verification time point of the data table to be verified is determined through the data processing layer. Finally, the data quality verification of the data table to be verified is performed at the data verification time point that is within the data processing cycle of the data processing layer. Since the data verification time point is within the data processing cycle of the data processing layer, the data quality verification of the data table to be verified can be performed in the data processing layer. That is, the purpose of performing data quality verification on the data table to be verified in the target data warehouse is achieved.

[0116] Since the data verification time point occurs within the data processing cycle of the data processing layer, the data quality verification of the data in the data table to be verified can be achieved in the data processing layer. In other words, the purpose of verifying the data indicators of the target data warehouse before application is achieved, thereby enabling the prediction of whether there are data quality anomalies in the application data.

[0117] Based on this, the embodiments of this application perform data quality verification on the data table to be verified during the data processing cycle of the data processing layer, so that the data quality status of the data table is determined in advance. In other words, the data indicators of the target data warehouse are verified before they are applied, rather than data quality anomalies being discovered only when the data table is used. This overcomes the technical shortcoming that anomalies surface only after the data has been applied at scale, where they can easily impair the accuracy of decision-making, and thus improves the effectiveness of data quality control.

[0118] Example 2

[0119] Further, referring to Figure 2, in another embodiment of this application, content that is the same as or similar to that of Embodiment 1 can be found in the description above and is not repeated here. On this basis, the data quality verification method further includes:

[0120] Step E10: After the core data tables under at least one data mart corresponding to the target data warehouse have completed data processing, obtain the basic data table set that is common to each data mart, wherein the basic data table set includes at least one basic data table.

[0121] Step E20: For any of the basic data tables, match the corresponding data quality verification rules for the basic data tables, and perform data quality verification on the basic data tables according to the data quality verification rules;

[0122] Step E30: Display the data quality verification results of the basic data table set on the preset data quality control interface.

[0123] In this embodiment, it should be noted that because the target data warehouse contains a large number of tables, setting separate data quality verification rules for every table in every data processing layer would create a very large verification workload. Therefore, for some basic data tables, a large-scale batch data quality monitoring tool can be used: data quality verification is configured for a batch of basic data tables under a data mart through batch text commands, with each data quality verification rule corresponding to one command. Operation and maintenance personnel can set the data quality verification rules with big data tools according to actual needs; for example, in one feasible approach, the rules are set using an Excel tool. A data quality verification rule includes a rule number, verification template, database name, table name, primary key, verification type, table row count threshold, filtering conditions, and alarm recipients. The preset data quality control interface is the visual interface of the data quality verification system. Core data tables and basic data tables can be distinguished by whether the data in the table is business data.
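A rule sheet with the fields listed above could be loaded as follows. This minimal Python sketch assumes the Excel sheet has been exported to CSV with one rule per row; the column names, the `QualityRule` class, the sample values, and the semicolon-separated recipient cell are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class QualityRule:
    rule_no: str
    template: str
    database: str
    table: str
    primary_key: str
    check_type: str
    row_count_threshold: int
    filter_condition: str
    alarm_recipients: List[str]


def load_rules(csv_text: str) -> List[QualityRule]:
    """Parse one data quality verification rule per CSV row."""
    rules = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rules.append(QualityRule(
            rule_no=row["rule_no"],
            template=row["template"],
            database=row["database"],
            table=row["table"],
            primary_key=row["primary_key"],
            check_type=row["check_type"],
            row_count_threshold=int(row["row_count_threshold"]),
            filter_condition=row["filter_condition"],
            # Assumption: several recipients share one cell, separated by ';'.
            alarm_recipients=[r.strip() for r in row["alarm_recipients"].split(";")
                              if r.strip()],
        ))
    return rules


SAMPLE = (
    "rule_no,template,database,table,primary_key,check_type,"
    "row_count_threshold,filter_condition,alarm_recipients\n"
    "R001,row_count_min,dw_base,dim_customer,customer_id,row_count,"
    "1000,dt='2023-11-27',ops_a;ops_b\n"
)
rules = load_rules(SAMPLE)
```

Keeping each rule as one typed record makes it straightforward to emit one batch text command per rule, as the text describes.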

[0124] As an example, steps E10 to E30 include: after all core data tables under at least one data mart corresponding to the target data warehouse have completed data processing, obtaining a set of basic data tables common to all the data marts, wherein the set includes at least one basic data table; for any basic data table, querying a preset data quality verification rule table for the corresponding data quality verification rule, using the primary key of the basic data table as the index, and verifying the data quality of the basic data table according to that rule; and, after all basic data tables in the set have been verified, displaying the data quality verification results of the basic data table set on the preset data quality control interface.
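The batch loop in steps E10 to E30 can be sketched in Python as below. This is a minimal sketch: the rule table is modelled as a dictionary keyed by primary key, only a minimum row count check is implemented, and the result rows stand in for what the data quality control interface would display. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BasicTable:
    name: str
    primary_key: str
    row_count: int


@dataclass
class Rule:
    rule_no: str
    min_row_count: int  # assumed verification type: table row count threshold


def verify_basic_tables(tables: List[BasicTable],
                        rule_table: Dict[str, Rule]) -> List[dict]:
    """Verify every basic table in the set and collect rows for display."""
    results = []
    for table in tables:
        # The primary key of the basic table is the lookup index.
        rule: Optional[Rule] = rule_table.get(table.primary_key)
        if rule is None:
            status, rule_no = "no_rule", None
        else:
            status = "normal" if table.row_count >= rule.min_row_count else "abnormal"
            rule_no = rule.rule_no
        results.append({"table": table.name, "rule": rule_no, "status": status})
    return results


tables = [
    BasicTable("dim_customer", "customer_id", 1500),
    BasicTable("dim_branch", "branch_id", 3),
    BasicTable("dim_region", "region_id", 10),
]
rules = {"customer_id": Rule("R001", 1000), "branch_id": Rule("R002", 10)}
summary = verify_basic_tables(tables, rules)
```

Reporting tables without a matching rule as `no_rule`, instead of silently skipping them, is a design choice that keeps gaps in rule coverage visible on the interface.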

[0125] The step of matching the corresponding data quality verification rules for the basic data table includes:

[0126] Step F10: Extract at least one core field from the basic data table based on the field identifier of the basic data table;

[0127] Step F20: Match a preset verification rule template to the basic data table based on each core field and its corresponding field weight;

[0128] Step F30: Insert each of the core fields into the preset verification rule template to obtain the data quality verification rules.

[0129] In this embodiment, it should be noted that basic data tables of the same type differ only in some fields, but their data quality verification logic is common. Therefore, the basic data quality verification logic can be reused by changing the fields in the data quality verification rules. Specifically, the core fields can be fields related to business. When there are multiple fields related to business, the template bias of the basic data table can be determined by different field weights. The template bias is used to characterize the degree to which the basic data table tends to follow the preset verification rule template. Thus, after inserting the core fields into the preset verification rule template, the data quality verification rules of the basic data table can be obtained.

[0130] As an example, steps F10 to F30 include: extracting at least one core field from the basic data table based on the field identifier of the basic data table; calculating the template bias of the basic data table using each core field and its corresponding field weight; querying the preset verification rule template corresponding to the basic data table using the template bias as an index; and inserting each core field into the preset verification rule template to obtain the data quality verification rule.
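Steps F10 to F30 can be sketched in Python as follows. The text does not define how the template bias is computed, so this sketch assumes it is the average field weight of the core fields, that core fields are marked by a hypothetical `biz_` prefix in their field identifiers, and that templates are SQL strings selected by a bias lower bound. The templates, prefix, and names are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical templates, each keyed by the lower bound of its bias range.
TEMPLATES: List[Tuple[float, str]] = [
    (0.0, "SELECT COUNT(*) FROM {table} WHERE {f0} IS NULL"),
    (0.5, "SELECT {f0}, COUNT(*) FROM {table} GROUP BY {f0} HAVING COUNT(*) > 1"),
]


def extract_core_fields(columns: List[str], prefix: str = "biz_") -> List[str]:
    # Assumption: the field identifier marks business-related columns by a prefix.
    return [c for c in columns if c.startswith(prefix)]


def template_bias(core_fields: List[str], weights: Dict[str, float]) -> float:
    # Assumption: bias = mean weight of the core fields (unknown fields weigh 0).
    return sum(weights.get(f, 0.0) for f in core_fields) / len(core_fields)


def build_rule(table: str, columns: List[str], weights: Dict[str, float]) -> str:
    core = extract_core_fields(columns)
    if not core:
        raise ValueError(f"no core fields found in {table}")
    bias = template_bias(core, weights)
    # Use the template with the largest lower bound not exceeding the bias.
    candidates = [tpl for lower, tpl in TEMPLATES if lower <= bias]
    template = candidates[-1] if candidates else TEMPLATES[0][1]
    # Insert the core fields into the template as f0, f1, ...
    fields = {f"f{i}": name for i, name in enumerate(core)}
    return template.format(table=table, **fields)
```

Because the templates are shared and only the inserted fields change, the same verification logic is reused across basic tables of the same type, which is the point made in the paragraph above.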

[0131] It should be noted that data verification time points can be set at different data processing layers. Furthermore, when a data processing layer obtains a verification result indicating that the data quality of the data table to be verified is abnormal, it can report the verification status by executing a specific alarm operation. For example, referring to Figure 3, which illustrates the data quality verification process, data quality monitoring measures are set for different data processing layers and different data tables. When a data quality anomaly occurs in a given data processing layer or a given type of data table (core data table or basic data table), an alarm is sent to the operation and maintenance personnel by telephone.
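Checkpoints spread across layers, each with an alarm on failure, could be wired together as in this minimal Python sketch. The checkpoint names (ODS, DWD, and ADS are common warehouse layer names, used here only as hypothetical labels) and the `alarm` callback, which stands in for the telephone alert, are assumptions.

```python
from typing import Callable, Dict, List

Check = Callable[[], bool]  # returns True when data quality is normal


def run_checkpoints(checks: Dict[str, Check],
                    alarm: Callable[[str], None]) -> List[str]:
    """Run each checkpoint in order and raise an alarm for every failure."""
    failed = []
    for name, check in checks.items():
        if not check():
            failed.append(name)
            # Stand-in for the phone alert to operation and maintenance staff.
            alarm(f"Data quality anomaly at {name}")
    return failed


sent: List[str] = []
failed = run_checkpoints(
    {
        "ods.pre_process": lambda: True,    # first verification time point
        "dwd.post_process": lambda: False,  # second verification time point
        "ads.indicator": lambda: True,      # third verification time point
    },
    sent.append,
)
```

Passing the alarm as a callback keeps the checks independent of the notification channel, so a telephone, SMS, or chat integration can be swapped in without touching the checks.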

[0132] This application provides a method for displaying data quality verification results. Specifically, after all core data tables under at least one data mart corresponding to the target data warehouse have completed data processing, a set of basic data tables common to all the data marts is obtained. This set of basic data tables includes at least one basic data table. For any basic data table, a corresponding data quality verification rule is matched, and data quality verification is performed on the basic data table according to the data quality verification rule. The data quality verification results of the basic data table set are displayed on a preset data quality control interface. This application, when dealing with large-scale basic data tables under a data mart, pre-sets data quality verification rules for different basic data tables using big data tools. During the data quality verification process, the basic data tables can be verified using batch text commands, and the data quality verification results are finally displayed on the preset data quality control interface, thereby improving the efficiency of data quality verification under the data mart.

[0133] Example 3

[0134] This application also provides a data quality verification device. Referring to Figure 4, the data quality verification device includes:

[0135] The acquisition module 101 is used to acquire the data table to be verified at the data processing layer of the target data warehouse;

[0136] The determining module 102 is used to determine the data verification time point of the data table to be verified based on the data processing layer, wherein the data verification time point is located within the data processing cycle of the data processing layer;

[0137] The verification module 103 is used to perform data quality verification on the data table to be verified at the data verification time point.

[0138] Optionally, the data verification time point is the first verification time point before the data processing layer performs data processing on the data table to be verified, and the verification module 103 is further used for:

[0139] At the first verification time point, obtain the data verification information of the data table to be verified;

[0140] Based on the data verification information, the data table to be verified is subjected to data quality verification.

[0141] Optionally, the data verification time point is the second verification time point after the data processing layer processes the data table to be verified, and the verification module 103 is further used for:

[0142] At the second verification time point, acquire the data identification information and data verification information of the data table to be verified;

[0143] Based on the data identification information, detect whether there are any abnormalities in the data processing performed by the data processing layer on the data table to be verified;

[0144] If so, the verification result of the abnormal data quality of the data table to be verified is obtained;

[0145] If not, then perform data quality verification on the data table to be verified based on the data verification information.
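The two-stage check at the second verification time point can be sketched in Python as follows. The text does not specify the data identification information, so this sketch assumes it consists of the partition the job wrote and the job's final status. The volume check reuses a simple day-to-day fluctuation threshold. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProcessedTable:
    partition: str           # hypothetical identification info: partition written
    task_status: str         # hypothetical identification info: job status
    row_count: int
    previous_row_count: int


def verify_after_processing(table: ProcessedTable, expected_partition: str,
                            fluctuation_threshold: int) -> str:
    # Stage 1: use the identification info to detect a processing anomaly.
    if table.task_status != "SUCCESS" or table.partition != expected_partition:
        return "abnormal"
    # Stage 2: only when processing looks sound, verify with the data volume.
    change = abs(table.row_count - table.previous_row_count)
    return "abnormal" if change > fluctuation_threshold else "normal"
```

Checking the processing outcome first avoids running volume comparisons on a stale or half-written partition, which would otherwise produce misleading results.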

[0146] Optionally, the data verification information includes daily fluctuation data volume and daily month-on-month data volume, and the verification module 103 is further used for:

[0147] Obtain the historical data table corresponding to the data table to be verified;

[0148] Based on the data volume relationship between the data table to be verified and the historical data table, the type of the data table to be verified is detected;

[0149] If the data table to be verified is detected to be a first type of data table, then the data quality of the data table to be verified is verified according to the daily fluctuation data volume.

[0150] If the data table to be verified is detected to be a second type of data table, then the data quality of the data table to be verified is verified based on the daily month-on-month data volume.

[0151] Optionally, the data verification time point is the third verification time point when the data processing layer generates the business data indicators corresponding to the data table to be verified, and the verification module 103 is further used for:

[0152] At the third verification time point, the data indicator value of the business data indicator is obtained;

[0153] The data quality of the data table to be verified is verified based on the relationship between the data indicator value and the preset data indicator value.

[0154] Optionally, the data quality verification device is further used for:

[0155] After all the core data tables under at least one data mart corresponding to the target data warehouse have completed data processing, a basic data table set corresponding to each data mart is obtained, wherein the basic data table set includes at least one basic data table.

[0156] For any of the aforementioned basic data tables, a corresponding data quality verification rule is matched for the basic data table, and data quality verification is performed on the basic data table according to the data quality verification rule;

[0157] The data quality verification results of the basic data table set are displayed on the preset data quality control interface.

[0158] Optionally, the data quality verification device is further used for:

[0159] Based on the field identifiers of the basic data table, extract at least one core field from the basic data table;

[0160] Based on each core field and its corresponding field weight, a preset verification rule template is matched for the basic data table.

[0161] The core fields are inserted into the preset verification rule template to obtain the data quality verification rules.

[0162] The data quality verification device provided by this invention, employing the data quality verification method described in the above embodiments, solves the technical problem of poor data quality control effectiveness. Compared with the prior art, the beneficial effects of the data quality verification device provided by this invention are the same as those of the data quality verification method described in the above embodiments, and other technical features of this data quality verification device are the same as those disclosed in the methods of the above embodiments, and will not be repeated here.

[0163] Example 4

[0164] This invention provides an electronic device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, which are executed by the at least one processor to enable the at least one processor to perform the data quality verification method in Embodiment 1 above.

[0165] Reference is now made to Figure 5, which shows a structural schematic of an electronic device suitable for implementing embodiments of the present disclosure. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers. The electronic device shown in Figure 5 is merely an example and should not be construed as limiting the functionality and scope of the embodiments disclosed herein.

[0166] As shown in Figure 5, the electronic device may include a processing unit 1001 (e.g., a central processing unit, a graphics processor, etc.) that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1003 into a random access memory (RAM) 1004. The RAM 1004 also stores various programs and data required for the operation of the electronic device. The processing unit 1001, ROM 1002, and RAM 1004 are interconnected via a bus 1005. An input/output (I/O) interface 1006 is also connected to the bus 1005.

[0167] Typically, the following devices can be connected to the I/O interface 1006: input devices 1007 including, for example, touchscreens, touchpads, keyboards, mice, image sensors, microphones, accelerometers, and gyroscopes; output devices 1008 including, for example, liquid crystal displays (LCDs), speakers, and vibrators; storage devices 1003 including, for example, magnetic tapes and hard disks; and communication devices 1009. The communication devices allow the electronic device to communicate with other devices, wirelessly or by wire, to exchange data. Although the figure shows an electronic device with various devices, it should be understood that not all of the devices shown are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.

[0168] In particular, according to embodiments of this disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication device 1009, or installed from storage device 1003, or installed from ROM 1002. When the computer program is executed by processing device 1001, it performs the functions defined in the methods of embodiments of this disclosure.

[0169] The electronic device provided by this invention, employing the data quality verification method described in the above embodiments, solves the technical problem of poor data quality control effectiveness. Compared with the prior art, the beneficial effects of the electronic device provided by this invention are the same as those of the data quality verification method described in the above embodiments, and other technical features of this electronic device are the same as those disclosed in the methods of the above embodiments, and will not be repeated here.

[0170] It should be understood that various parts of this disclosure can be implemented using hardware, software, firmware, or a combination thereof. In the description of the above embodiments, specific features, structures, materials, or characteristics may be combined in any suitable manner in one or more embodiments or examples.

[0171] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

[0172] Example 5

[0173] This embodiment provides a computer-readable storage medium having computer-readable program instructions stored thereon, which are used to execute the data quality verification method described in the above embodiment.

[0174] The computer-readable storage medium provided in this embodiment of the invention may be, for example, a USB flash drive, or may be, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof. In this embodiment, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. The program code contained on the computer-readable storage medium may be transmitted using any suitable medium, including but not limited to wires, optical cables, RF (radio frequency), or any suitable combination thereof.

[0175] The aforementioned computer-readable storage medium may be included in an electronic device or may exist independently without being assembled into an electronic device.

[0176] The aforementioned computer-readable storage medium carries one or more programs that, when executed by an electronic device, cause the electronic device to: acquire a data table to be verified at the data processing layer of the target data warehouse; determine a data verification time point for the data table to be verified based on the data processing layer, wherein the data verification time point is located within the data processing cycle of the data processing layer; and perform data quality verification on the data table to be verified at the data verification time point.

[0177] Computer program code for performing the operations of this disclosure can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0178] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0179] The modules described in the embodiments of this disclosure can be implemented in software or hardware. The name of a module does not, in some cases, constitute a limitation on the module itself.

[0180] The computer-readable storage medium provided by this invention stores computer-readable program instructions for executing the above-described data quality verification method, thus solving the technical problem of poor data quality control effectiveness. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided in this invention are the same as those of the data quality verification method provided in the above-described embodiments, and will not be repeated here.

[0181] Example 6

[0182] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the data quality verification method described above.

[0183] The computer program product provided in this application solves the technical problem of poor data quality control. Compared with the prior art, the beneficial effects of the computer program product provided in this embodiment are the same as the beneficial effects of the data quality verification method provided in the above embodiments, and will not be repeated here.

[0184] The above are merely preferred embodiments of this application and do not limit the patent scope of this application. Any equivalent structural or procedural transformations made using the content of this application's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent scope of this application.

Claims

1. A data quality verification method, characterized in that, The data quality verification method includes: Obtain the data table to be verified at the data processing layer of the target data warehouse; Based on the data processing layer, the data verification time point of the data table to be verified is determined, wherein the data verification time point is located within the data processing cycle of the data processing layer; At the specified data verification time point, the data table to be verified is subjected to data quality verification. The data verification time point is the first verification time point before the data processing layer processes the data table to be verified. The step of performing data quality verification on the data table to be verified at the data verification time point includes: At the first verification time point, obtain the data verification information of the data table to be verified; Based on the data verification information, the data table to be verified is subjected to data quality verification. Furthermore, the data verification time point is the second verification time point after the data processing layer processes the data table to be verified, and the step of performing data quality verification on the data table to be verified at the data verification time point includes: At the second verification time point, acquire the data identification information and data verification information of the data table to be verified; Based on the data identification information, detect whether there are any abnormalities in the data processing performed by the data processing layer on the data table to be verified; If so, the verification result of the abnormal data quality of the data table to be verified is obtained; If not, then perform data quality verification on the data table to be verified based on the data verification information. 
Furthermore, the data verification time point is the third verification time point at which the data processing layer generates the business data indicators corresponding to the data table to be verified, and the step of performing data quality verification on the data table to be verified at the data verification time point includes: At the third verification time point, the data indicator value of the business data indicator is obtained; The data quality of the data table to be verified is verified based on the relationship between the data indicator value and the preset data indicator value.

2. The data quality verification method according to claim 1, characterized in that the data verification information comprises a daily fluctuation data volume and a day-on-day data volume, and the step of performing data quality verification on the data table to be verified based on the data verification information comprises: obtaining a historical data table corresponding to the data table to be verified; detecting the data table type of the data table to be verified based on the data volume relationship between the data table to be verified and the historical data table; if the data table to be verified is detected to be a first type of data table, verifying the data quality of the data table to be verified according to the daily fluctuation data volume; and if the data table to be verified is detected to be a second type of data table, verifying the data quality of the data table to be verified according to the day-on-day data volume.
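One possible reading of claim 2 is sketched below. The claim does not say how the data volume relationship maps to a table type, so the heuristic here is an assumption: a table whose row count never decreases across its history (for example, a cumulative full-snapshot table) is treated as the first type and checked against an absolute daily fluctuation, while any other table is treated as the second type and checked against a relative day-on-day change. `VerificationInfo`, `detect_table_type` and `verify_volume` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VerificationInfo:
    max_daily_fluctuation: int   # allowed absolute change vs. the previous day (rows)
    max_day_on_day_ratio: float  # allowed relative change vs. the previous day, e.g. 0.2 = 20%


def detect_table_type(current: int, history: List[int]) -> str:
    """Assumed heuristic: a never-decreasing volume series means a cumulative
    (first-type) table; anything else is treated as incremental (second type)."""
    series = history + [current]
    if all(a <= b for a, b in zip(series, series[1:])):
        return "first"
    return "second"


def verify_volume(current: int, history: List[int], info: VerificationInfo) -> bool:
    if not history:
        raise ValueError("at least one historical volume is required")
    previous = history[-1]
    if detect_table_type(current, history) == "first":
        # First type: bound the absolute daily fluctuation.
        return abs(current - previous) <= info.max_daily_fluctuation
    # Second type: bound the relative day-on-day change.
    if previous == 0:
        return current == 0
    return abs(current - previous) / previous <= info.max_day_on_day_ratio
```

The split is a design guess: for a cumulative table a ratio against yesterday's total stays close to 1 and hides problems, so an absolute bound is more sensitive, whereas an incremental table's daily volume is the whole signal and a ratio keeps the threshold independent of table size.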

3. The data quality verification method according to claim 1, characterized in that the data quality verification method further comprises: after all core data tables under at least one data mart corresponding to the target data warehouse have completed data processing, obtaining a basic data table set corresponding to each data mart, wherein the basic data table set comprises at least one basic data table; for any one of the basic data tables, matching a corresponding data quality verification rule to the basic data table and performing data quality verification on the basic data table according to the data quality verification rule; and displaying the data quality verification results of the basic data table set on a preset data quality control interface.
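Claim 3 can be illustrated with a small in-memory sketch. It assumes each data mart records whether its core tables have finished processing, reads the completion condition as a per-mart gate, represents each basic table only by its row count, and uses a plain-text listing as a stand-in for the data quality control interface. `DataMart`, `verify_marts`, `render` and the row-count rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Rule = Callable[[int], bool]  # hypothetical rule: takes a row count, returns pass/fail


@dataclass
class DataMart:
    name: str
    core_table_done: Dict[str, bool]  # core table -> has processing finished?
    basic_tables: Dict[str, int]      # basic table -> row count (stand-in payload)


def verify_marts(marts: List[DataMart],
                 match_rule: Callable[[str], Rule]) -> Dict[str, Dict[str, bool]]:
    results: Dict[str, Dict[str, bool]] = {}
    for mart in marts:
        # Assumed reading: a mart is verified only once all of its core tables are done.
        if not all(mart.core_table_done.values()):
            continue
        # Match a rule to each basic table, then apply it.
        results[mart.name] = {
            table: match_rule(table)(rows)
            for table, rows in mart.basic_tables.items()
        }
    return results


def render(results: Dict[str, Dict[str, bool]]) -> str:
    """Plain-text stand-in for the preset data quality control interface."""
    lines = []
    for mart, tables in sorted(results.items()):
        for table, ok in sorted(tables.items()):
            lines.append(f"{mart}\t{table}\t{'PASS' if ok else 'FAIL'}")
    return "\n".join(lines)
```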

4. The data quality verification method according to claim 3, characterized in that the step of matching a corresponding data quality verification rule to the basic data table comprises: extracting at least one core field from the basic data table based on the field identifiers of the basic data table; matching a preset verification rule template to the basic data table based on each core field and its corresponding field weight; and inserting the core fields into the preset verification rule template to obtain the data quality verification rule.
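The rule-matching steps of claim 4 can be sketched as a weighted template score. The field identifier tags (`PK`, `AMT`, `DT`), their weights, the scoring rule (the sum of the weights of the core fields a template is designed for) and the `unique(...)` / `not_null(...)` rule syntax are all assumptions for illustration; the claim defines none of them.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

# Assumed field weights, keyed by a hypothetical field identifier tag.
FIELD_WEIGHTS: Dict[str, float] = {"PK": 3.0, "AMT": 2.5, "DT": 1.0}


@dataclass
class Field:
    name: str
    identifier: str  # e.g. "PK", "AMT", "DT"; anything else marks an ordinary field


@dataclass
class RuleTemplate:
    name: str
    applies_to: FrozenSet[str]  # identifiers the template is designed for
    pattern: str                # hypothetical rule syntax with {table} and {fields}


def extract_core_fields(fields: List[Field]) -> List[Field]:
    """A field is treated as core if its identifier has a weight."""
    return [f for f in fields if f.identifier in FIELD_WEIGHTS]


def template_score(core: List[Field], template: RuleTemplate) -> float:
    return sum(FIELD_WEIGHTS[f.identifier] for f in core if f.identifier in template.applies_to)


def build_rule(table: str, fields: List[Field], templates: List[RuleTemplate]) -> str:
    core = extract_core_fields(fields)
    best = max(templates, key=lambda t: template_score(core, t))
    if template_score(core, best) == 0:
        raise ValueError(f"no template matches the core fields of {table}")
    # Insert only the core fields the chosen template is designed for.
    chosen = [f.name for f in core if f.identifier in best.applies_to]
    return best.pattern.format(table=table, fields=", ".join(chosen))
```

For an order table with a primary key, an amount and a business date, the completeness template scores 2.5 + 1.0 = 3.5 and beats the uniqueness template's 3.0, so the amount and date fields are inserted into it.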

5. A data quality verification device, characterized in that the data quality verification device comprises: an acquisition module configured to obtain a data table to be verified at a data processing layer of a target data warehouse; a determination module configured to determine, based on the data processing layer, a data verification time point for the data table to be verified, wherein the data verification time point is located within the data processing cycle of the data processing layer; and a verification module configured to perform data quality verification on the data table to be verified at the data verification time point. The data verification time point is a first verification time point before the data processing layer processes the data table to be verified, and the step of performing data quality verification on the data table to be verified at the data verification time point comprises: at the first verification time point, obtaining data verification information of the data table to be verified; and performing data quality verification on the data table to be verified based on the data verification information. Furthermore, the data verification time point is a second verification time point after the data processing layer processes the data table to be verified, and the step of performing data quality verification on the data table to be verified at the data verification time point comprises: at the second verification time point, acquiring data identification information and the data verification information of the data table to be verified; detecting, based on the data identification information, whether the data processing performed by the data processing layer on the data table to be verified is abnormal; if so, obtaining a verification result indicating that the data quality of the data table to be verified is abnormal; and if not, performing data quality verification on the data table to be verified based on the data verification information.
Furthermore, the data verification time point is a third verification time point at which the data processing layer generates business data indicators corresponding to the data table to be verified, and the step of performing data quality verification on the data table to be verified at the data verification time point comprises: at the third verification time point, obtaining the data indicator value of the business data indicator; and verifying the data quality of the data table to be verified based on the relationship between the data indicator value and a preset data indicator value.
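The device of claim 5 maps naturally onto three collaborating components. The sketch below wires hypothetical acquisition, determination and verification classes together; the layer names `ODS`, `DWD` and `ADS`, the layer-to-time-point mapping, the in-memory warehouse and all class names are assumptions, since the claims leave them unspecified.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Table:
    name: str
    layer: str      # hypothetical layer name, e.g. "ODS", "DWD", "ADS"
    row_count: int


class AcquisitionModule:
    """Obtains the tables to be verified at a given processing layer."""

    def __init__(self, warehouse: Dict[str, List[Table]]):
        self.warehouse = warehouse  # in-memory stand-in: layer -> tables

    def acquire(self, layer: str) -> List[Table]:
        return list(self.warehouse.get(layer, []))


class DeterminationModule:
    """Chooses a verification time point from the table's processing layer."""

    # Assumed mapping; the claims do not specify how layers map to time points.
    LAYER_TO_CHECKPOINT = {
        "ODS": "before_processing",
        "DWD": "after_processing",
        "ADS": "indicator_generated",
    }

    def determine(self, table: Table) -> str:
        return self.LAYER_TO_CHECKPOINT[table.layer]


class VerificationModule:
    """Runs the check registered for a verification time point."""

    def __init__(self, checks: Dict[str, Callable[[Table], bool]]):
        self.checks = checks

    def verify(self, table: Table, checkpoint: str) -> bool:
        return self.checks[checkpoint](table)


class DataQualityVerificationDevice:
    def __init__(self, acquisition: AcquisitionModule,
                 determination: DeterminationModule,
                 verification: VerificationModule):
        self.acquisition = acquisition
        self.determination = determination
        self.verification = verification

    def run(self, layer: str) -> Dict[str, Tuple[str, bool]]:
        """Return {table name: (time point, passed?)} for every table in the layer."""
        results: Dict[str, Tuple[str, bool]] = {}
        for table in self.acquisition.acquire(layer):
            checkpoint = self.determination.determine(table)
            results[table.name] = (checkpoint, self.verification.verify(table, checkpoint))
        return results
```

Keeping the three modules separate lets the time-point policy and the checks be swapped independently, which is one plausible reason a claim would split the device this way.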

6. An electronic device, characterized in that the electronic device comprises: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the steps of the data quality verification method according to any one of claims 1 to 4.

7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program implementing a data quality verification method, and the program, when executed by a processor, implements the steps of the data quality verification method according to any one of claims 1 to 4.