
Method and device for sampling and verifying data table in data set

A technique for data tables in a data set, applied in the computer field, that addresses the complexity, time cost and human resource cost of existing approaches and helps ensure accuracy.

Active Publication Date: 2020-08-25
INDUSTRIAL AND COMMERCIAL BANK OF CHINA
Cites: 4; Cited by: 3

AI Technical Summary

Problems solved by technology

[0003] In the course of developing the concept of the present disclosure, the inventors found at least the following problem in the prior art: a large organization's data lake can hold up to 10PB of data (1PB=1024TB), so verifying data consistency against the full volume of data is very complicated. The full data must be obtained from each source business application system database and then compared, table by table, with the data tables in the data lake. Although this approach checks all data entering the lake, it consumes a great deal of time and human resources, and the verified data tables may not even be used by current business. Full-volume consistency verification therefore generates a large amount of unnecessary work.



Examples


Detailed Description of Embodiments

[0029] Hereinafter, embodiments of the present disclosure will be described with reference to the drawings. It should be understood, however, that these descriptions are exemplary only, and are not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Also, in the following description, descriptions of well-known structures and techniques are omitted to avoid unnecessarily obscuring the concept of the present disclosure.

[0030] The terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the present disclosure. The terms "comprising", "including", etc. used herein indicate the presence of stated features, ...


Abstract

The invention provides a method and a device for sampling and verifying data tables in a data set. The data set includes a plurality of data tables that share a plurality of common indicators. The method for sampling the data tables in the data set comprises: obtaining, for each data table in the plurality of data tables, values of the plurality of common indicators, yielding a group of values for each data table; determining a sampling probability for each data table according to a predetermined model and the group of values for that data table; dividing the plurality of data tables according to the distribution of their sampling probabilities to obtain at least two data table groups, each comprising at least one data table; and determining, according to a preset rule, a sampling proportion for each data table group and sampling each data table group accordingly, the sampling proportion being the ratio of the number of data tables extracted from a data table group to the total number of samples.
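The steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The abstract does not specify the predetermined model, the indicators, the grouping boundaries or the preset rule, so the logistic model, the fixed probability thresholds, the per-group proportions and all names (`TableInfo`, `sampling_probability`, `group_by_probability`, `sample_tables`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class TableInfo:
    name: str
    indicators: list  # values of the common indicators (hypothetical features)


def sampling_probability(indicators, weights, bias):
    """Assumed model: logistic function of a weighted sum of indicator values."""
    z = bias + sum(w * x for w, x in zip(weights, indicators))
    return 1.0 / (1.0 + math.exp(-z))


def group_by_probability(probs, thresholds):
    """Split tables into len(thresholds) + 1 groups using ascending thresholds.

    Fixed thresholds are an assumption; the source only says the split
    follows the distribution of the sampling probabilities.
    """
    groups = [[] for _ in range(len(thresholds) + 1)]
    for name, p in probs.items():
        index = sum(p >= t for t in thresholds)  # number of thresholds reached
        groups[index].append(name)
    return groups


def sample_tables(groups, proportions, total_samples, rng):
    """Draw tables from each group.

    proportions[i] is the share of the total sample taken from group i,
    standing in for the unspecified preset rule.
    """
    selected = []
    for group, proportion in zip(groups, proportions):
        k = min(len(group), round(proportion * total_samples))
        selected.extend(rng.sample(group, k))
    return selected


if __name__ == "__main__":
    tables = [
        TableInfo("t_a", [1.0, 1.0]),
        TableInfo("t_b", [0.0, 0.0]),
        TableInfo("t_c", [0.5, 0.5]),
        TableInfo("t_d", [1.0, 0.0]),
    ]
    probs = {
        t.name: sampling_probability(t.indicators, weights=[2.0, 1.0], bias=-1.5)
        for t in tables
    }
    groups = group_by_probability(probs, thresholds=[0.4, 0.7])
    chosen = sample_tables(groups, [0.25, 0.25, 0.5], total_samples=4,
                           rng=random.Random(0))
    print(groups)
    print(chosen)
```

Giving the highest-probability group the largest proportion concentrates verification effort on the tables the model rates as most worth checking, while the other groups still receive some coverage.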

Description

Technical field

[0001] The present disclosure relates to the field of computer technology, and more specifically, to a method and device for sampling and verifying data tables in a data set.

Background

[0002] As the cache of the source business system database, the data lake stores data in its original format to avoid problems such as data inaccuracy or data structure distortion caused by processing the original data. The importance of high-quality data in the data lake is self-evident, and the data consistency between the massive structured data in the data lake and the source business system is an important part of data quality measurement.

[0003] In the course of developing the concept of the present disclosure, the inventors found at least the following problem in the prior art: a large organization's data lake can hold up to 10PB of data (1PB=1024TB), so verifying data consistency against the full volume of data is very complicated. It is...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215, G06F16/22
CPC: G06F16/215, G06F16/2282
Inventor: 高炘, 张世瑛, 赵吉昆, 梁晔华
Owner: INDUSTRIAL AND COMMERCIAL BANK OF CHINA