Data deduplication method and device

A data deduplication technique, applied in the field of data processing, that addresses the low efficiency of deduplicating multi-row or multi-column data. It saves processing time, is simple to configure, and improves efficiency.

Status: Inactive
Publication date: 2015-03-25
龙信数据(北京)有限公司
Cites: 7. Cited by: 14.

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to provide a method and device for data deduplication that solve the problem of low efficiency when deduplicating multiple rows or columns of data in the prior art.


Embodiment Construction

[0045] The core of the present invention is to provide a method and device for data deduplication that can be used in a data fusion system to deduplicate data at a scale of more than tens of millions.

[0046] To enable those skilled in the art to better understand the solution of the present invention, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0047] A specific implementation of the data deduplication method provided by the present invention is shown in Figure 1. The method includes:

[0048] Step 101: Obtain the business pr...

Abstract

The invention discloses a data deduplication method and device. The method comprises: obtaining the business primary keys of the data to be processed, where a business primary key is a field that identifies a piece of data as unique according to business requirements; converting the business primary keys into a unified preset format to generate matching codes; sorting the generated matching codes in a preset order to generate verification codes; scanning the sorted verification codes and comparing each one with the first verification code that precedes it, and, when the two are identical, marking the distinguishing code of that verification code as the second distinguishing code; and deleting the data whose verification codes are marked with the second distinguishing code. When processing multiple rows or multiple columns of data at a scale above the ten-million level, the method is simple to configure, convenient to use, and highly operable. It deduplicates multiple rows or multiple columns at the same time, saves a large amount of processing time, and improves the efficiency of deduplication.
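The steps in the abstract can be illustrated with a minimal Python sketch. It is one possible reading, not the patented implementation: the function names, the `|` separator, and the choice of "trimmed, whitespace-collapsed, upper case" as the unified preset format are all assumptions made for illustration, and the sorted list of matching codes stands in for the verification codes.

```python
from typing import Dict, List, Sequence

FIRST, SECOND = 1, 2  # distinguishing codes: 1 = keep, 2 = duplicate


def matching_code(record: Dict[str, object], key_fields: Sequence[str]) -> str:
    """Build one matching code from the business primary key fields."""
    parts = []
    for field in key_fields:
        value = record.get(field)
        text = "" if value is None else str(value)
        # Assumed unified format: trim, collapse whitespace, upper-case.
        parts.append(" ".join(text.split()).upper())
    return "|".join(parts)


def deduplicate(records: List[Dict[str, object]],
                key_fields: Sequence[str]) -> List[Dict[str, object]]:
    """Drop records whose matching code equals that of an earlier record."""
    # Pair every matching code with the index of its record.
    coded = [(matching_code(r, key_fields), i) for i, r in enumerate(records)]
    # Sorting puts identical codes next to each other; the index breaks ties,
    # so the earliest record of each group comes first.
    coded.sort()

    marks = [FIRST] * len(records)
    previous = None
    for code, index in coded:
        if code == previous:
            marks[index] = SECOND  # same as the code in front: duplicate
        else:
            previous = code        # first occurrence of a new code

    # Delete everything marked with the second distinguishing code.
    return [r for r, m in zip(records, marks) if m == FIRST]
```

Because all key fields are folded into a single matching code, one sort and one linear scan handle any number of key columns at once.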

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a method and device for data deduplication.

Background

[0002] The current data deduplication method based on the data fusion system sorts one column or one row of the data to be deduplicated, identifies duplicate data with a distinguishing code (1 or 2), and deletes the data marked "2". Existing data deduplication methods are inefficient when deduplicating multiple rows or columns of data.

Summary of the invention

[0003] The object of the present invention is to provide a method and device for data deduplication that solve the problem of low efficiency when deduplicating multiple rows or columns of data in the prior art.

[0004] To solve the above technical problem, the present invention provides a data deduplication method, including:

[0005] Obtaining the business primary key of the data to be processed, the busi...
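For contrast, the existing single-column approach described in paragraph [0002] might look like the following Python sketch. The function names are hypothetical, and the rule that the earliest occurrence keeps code 1 is an assumption; the source only says that duplicates are marked and the data marked "2" is deleted.

```python
from typing import List, Sequence


def mark_single_column(values: Sequence[str]) -> List[int]:
    """Return a distinguishing code (1 or 2) for each value in one column."""
    # Sort positions by value; the position breaks ties so that the
    # earliest occurrence of each value comes first in its group.
    order = sorted(range(len(values)), key=lambda i: (values[i], i))
    codes = [1] * len(values)
    for prev, cur in zip(order, order[1:]):
        if values[cur] == values[prev]:
            codes[cur] = 2  # repeats the value just before it in sorted order
    return codes


def drop_marked(values: Sequence[str], codes: Sequence[int]) -> List[str]:
    """Delete the entries marked with code 2."""
    return [v for v, c in zip(values, codes) if c != 2]
```

This sketch looks at one column only, so comparing on several columns would take a separate pass for each one.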

Application Information

IPC(8): G06F17/30
CPC: G06F16/1748
Inventor: 马欣顾喜德
Owner: 龙信数据(北京)有限公司