Data duplicate checking method and data duplicate checking device

A data duplicate checking technology, applied in the field of storage, that addresses the long computing time and wasted system time of existing duplicate checking methods. It improves the efficiency of data duplicate checking while keeping duplicate checking accuracy high.

Pending Publication Date: 2020-07-28
CRRC INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

For example, in the method described in "TP393.09F719.2 Research on Hotel Data Cleaning Method Based on Edit Distance and Conditional Function Dependence" in the Wanfang database, the edit distance algorithm improves the recognition of repeated strings, but it generally takes a long time to compute and is time-consuming.
That article does not address how to speed up the initial comparison, especially when the data set is relatively large, as in duplicate checking for large groups with millions of data records.
The incremental matching algorithm described in that article mainly compares the new data set to be added against the original data set. In data management, however, new data entering the system is generally reviewed by management personnel from several parties, and each step of that process also needs to check the data. If every step checks the record by comparing it with the whole original collection, a lot of system time is wasted.
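
To illustrate why edit distance is costly at this scale, here is a minimal sketch of the standard Levenshtein dynamic-programming recurrence. It is a textbook formulation, not code from the cited article or from this patent; the function names `edit_distance` and `pairwise_comparisons` are hypothetical. Each call does work proportional to the product of the two string lengths, and a naive first pass compares every pair of records.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using two rolling rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def pairwise_comparisons(record_count: int) -> int:
    """Number of record pairs a naive all-against-all check must compare."""
    return record_count * (record_count - 1) // 2
```

With one million records, `pairwise_comparisons` already reaches the order of 5 x 10^11 pairs, which is why repeating a full comparison at every review step is expensive.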

Examples

Embodiment Construction

[0047] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention belong to the protection scope of the present invention.

[0048] For a large group, data such as material data often runs to millions of records, the auxiliary data associated with it multiplies that volume, and the data grows at a certain rate every year.

[0049] If the data is regarded as a data set such as C (C1, C2, C3 ... Cm), where C1 to Cm are each a piece of data, and each piece of data can include key-value pairs of multiple attributes / attribute values (key / value), such as C1 (c11, c12, c13), this piece of data represents the first attr...
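
One possible way to picture the data set C described above is a list of records, each a mapping from attribute names to attribute values. This is only an illustrative sketch: the attribute names (`name`, `spec`, `unit`) and the helper `make_record` are assumptions, since the source paragraph is truncated before it names the attributes.

```python
from typing import Dict, List, Sequence

Record = Dict[str, str]  # one piece of data: attribute name -> attribute value

# Hypothetical attribute names; the source does not list them.
ATTRIBUTES: Sequence[str] = ("name", "spec", "unit")


def make_record(values: Sequence[str]) -> Record:
    """Pair each attribute value (c11, c12, c13, ...) with its attribute name."""
    if len(values) != len(ATTRIBUTES):
        raise ValueError("expected one value per attribute")
    return dict(zip(ATTRIBUTES, values))


# Data set C = (C1, C2, ... Cm), here with m = 2 made-up records.
C: List[Record] = [
    make_record(("bolt", "M8 x 40", "piece")),
    make_record(("washer", "M8", "piece")),
]
```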

Abstract

The invention discloses a data duplicate checking method and a data duplicate checking device. The data duplicate checking method is applied to a system comprising a source database, a client, a cache and a result storage database. The method comprises: at a first moment, obtaining a duplicate checking request for data to be checked, the data to be checked having a unique identifier; judging whether the corresponding unique identifier exists in the result storage database; when the unique identifier exists, obtaining the duplicate checking moment corresponding to the unique identifier; obtaining the data changed in the source database between that duplicate checking moment and the first moment, and comparing it against the data to be checked; and storing the comparison result in the result storage database. The method improves data duplicate checking efficiency while maintaining high duplicate checking accuracy, and it applies both to the first duplicate check and to the repeated duplicate checks that follow during the business process.
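
The flow in the abstract can be sketched roughly as follows. This is a simplified in-memory illustration under stated assumptions, not the patented implementation: the source database is a plain list, the result storage database is a dict keyed by unique identifier, the cache is omitted, and `is_duplicate` is a placeholder comparison. All names (`SourceRecord`, `CheckResult`, `check_duplicates`) are hypothetical. Two behaviours are also assumptions, because the abstract does not spell them out: a first check scans the whole source, and matches found in earlier checks are carried forward.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SourceRecord:
    record_id: str
    value: str
    changed_at: float  # moment the record was added or last modified


@dataclass
class CheckResult:
    checked_at: float  # duplicate checking moment
    duplicate_ids: List[str] = field(default_factory=list)


def is_duplicate(a: str, b: str) -> bool:
    # Placeholder rule (assumption): equal after lowercasing and removing whitespace.
    return "".join(a.lower().split()) == "".join(b.lower().split())


def check_duplicates(record_id: str, value: str, now: float,
                     source: List[SourceRecord],
                     results: Dict[str, CheckResult]) -> CheckResult:
    previous = results.get(record_id)  # does the unique identifier exist?
    # First check: no lower bound (assumed). Later checks: only newer changes.
    since = previous.checked_at if previous else float("-inf")
    changed = [r for r in source
               if since < r.changed_at <= now and r.record_id != record_id]
    found = [r.record_id for r in changed if is_duplicate(value, r.value)]
    carried = list(previous.duplicate_ids) if previous else []
    merged = carried + [i for i in found if i not in carried]
    results[record_id] = CheckResult(checked_at=now, duplicate_ids=merged)
    return results[record_id]
```

On a repeat check, only records changed since the stored duplicate checking moment are compared, so the cost scales with the size of the change set rather than with the whole source.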

Description

Technical field

[0001] The invention relates to the field of storage, in particular to a method and device for checking duplicate data.

Background technique

[0002] The master data of an enterprise is the data used to describe its core business entities, such as customers, partners, products and materials. It is data with high business value that can be reused across business departments within the enterprise, and it exists in multiple heterogeneous application systems.

[0003] Because data comes from multiple sources, the overlap of enterprise data across different systems is becoming more and more serious, and the identification and description of the same data in different systems are not uniform. Even within the same system, as the data scale keeps expanding, the same data is maintained as different instances, resulting in data redundancy, low data accuracy, and increased business error rates, which affect the management an...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/22, G06F16/2455
Inventor: 李丽
Owner: CRRC INFORMATION TECH CO LTD