
Data deduplication method and device based on MPP architecture database

A database and data processing technology, applied in the field of data processing, that achieves fast deduplication, higher computing efficiency, and reduced workload

Pending Publication Date: 2022-05-27
度小满科技(北京)有限公司

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to provide a data deduplication method, device, equipment and readable storage medium based on an MPP architecture database, so as to solve the problem of low deduplication efficiency under multiple indicators.



Examples


Embodiment Construction

[0043] The core of the present invention is to provide a data deduplication method based on an MPP architecture database, which can solve the problem of low deduplication efficiency under multiple indicators.

[0044] To enable those skilled in the art to better understand the present invention, it is described in further detail below with reference to the accompanying drawings and specific embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments, without creative effort, fall within the scope of protection of the present invention.

[0045] The prior art usually applies SQL count distinct directly: after a deduplication request is received, the count distinct operation is performed directly on the requested data. This meth...
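
As a rough illustration of the direct approach described in [0045], the sketch below applies count distinct to several columns in a single query. It uses Python's built-in sqlite3 module as a stand-in for an MPP database, and the `trades` table, its columns and its rows are hypothetical, not taken from the patent.

```python
import sqlite3

# Hypothetical sample data: one row per transaction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (day TEXT, user_id INTEGER, device_id TEXT)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [
        ("2022-05-01", 1, "a"),
        ("2022-05-01", 1, "a"),  # repeat operation by the same user
        ("2022-05-01", 2, "b"),
        ("2022-05-02", 1, "c"),
        ("2022-05-02", 3, "c"),
    ],
)

# Direct processing: every deduplication indicator is computed in one pass
# with COUNT(DISTINCT ...), with no preparation of the requested data.
row = conn.execute(
    "SELECT COUNT(DISTINCT user_id), COUNT(DISTINCT device_id) FROM trades"
).fetchone()
direct_counts = {"user_id": row[0], "device_id": row[1]}
print(direct_counts)
```

On a small table this is cheap; the cost concern raised in the text applies to large data volumes with many indicators.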


Abstract

The invention discloses a data deduplication method based on an MPP architecture database. The method comprises: determining the target data to be deduplicated in the original data according to statistical parameters; converting a multi-index deduplication operation into a plurality of single-index deduplication operations; and performing deduplication on each deduplication index of the target data one by one. Deduplicating one index at a time involves a markedly smaller workload than traditional simultaneous multi-index deduplication. It effectively avoids the low computing efficiency, and even outright failure to run, that can occur with direct count distinct, and it occupies few resources. The efficiency gain becomes more pronounced as the number of deduplication indexes increases, so fast multi-index deduplication can be achieved. The invention further discloses a data deduplication device and equipment based on the MPP architecture database and a readable storage medium, which have corresponding technical effects.
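
The excerpt does not give the exact steps of the claimed method, so the following is only one plausible reading of the abstract, sketched with Python's built-in sqlite3 module in place of an MPP database. The `trades` table, the date-range "statistical parameters", the `ALLOWED_INDEXES` set and the `dedup_one_by_one` helper are all hypothetical names introduced for illustration.

```python
import sqlite3

# Hypothetical allowlist of deduplication indexes (column names).
ALLOWED_INDEXES = {"user_id", "device_id"}

def dedup_one_by_one(conn, start_day, end_day, indexes):
    """Count distinct values per index, one single-index query at a time."""
    counts = {}
    for col in indexes:
        if col not in ALLOWED_INDEXES:
            raise ValueError(f"unknown deduplication index: {col}")
        # The WHERE clause restricts the original data to the target data
        # (here: a date range); the inner SELECT DISTINCT deduplicates a
        # single index, and the outer COUNT(*) counts what remains.
        sql = (
            "SELECT COUNT(*) FROM ("
            f"SELECT DISTINCT {col} FROM trades "
            f"WHERE day BETWEEN ? AND ? AND {col} IS NOT NULL)"
        )
        counts[col] = conn.execute(sql, (start_day, end_day)).fetchone()[0]
    return counts

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (day TEXT, user_id INTEGER, device_id TEXT)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [
        ("2022-05-01", 1, "a"),
        ("2022-05-01", 1, "a"),
        ("2022-05-01", 2, "b"),
        ("2022-05-02", 1, "c"),
        ("2022-05-02", 3, "c"),
    ],
)

one_day = dedup_one_by_one(conn, "2022-05-01", "2022-05-01", ["user_id", "device_id"])
both_days = dedup_one_by_one(conn, "2022-05-01", "2022-05-02", ["user_id", "device_id"])
print(one_day, both_days)
```

Each query touches only one index, so in this sketch the per-query working set stays small regardless of how many indexes are requested; whether the patented method stages intermediate results differently is not stated in this excerpt.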

Description

Technical field

[0001] The present invention relates to the field of data processing technology, in particular to a data deduplication method, device, equipment and readable storage medium based on an MPP architecture database.

Background

[0002] Deduplication is a very important tool in daily statistical analysis, for example when counting the number of transaction users in a day, the number of transaction users in a month, PV (Page View, visits), UV (Unique Visitor, unique visitors) and so on. Unlike common statistics that are accumulated directly, such as the number of transactions and transaction amounts, the core idea of deduplication is to count a user's multiple operations within a specified time range only once.

[0003] The prior art usually performs deduplication directly with SQL count distinct (a deduplication function in SQL), which meets the needs of data deduplication to a certain extent, but is only suitable for scenarios ...
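
To make the difference between direct accumulation and deduplicated counting concrete, here is a minimal sketch in plain Python. The event list is invented sample data; PV counts every visit, while UV counts each visitor once no matter how many times they appear.

```python
# Hypothetical page-visit log for one day: (user, page) pairs.
events = [
    ("alice", "/home"),
    ("alice", "/pay"),
    ("bob", "/home"),
    ("alice", "/home"),
    ("carol", "/pay"),
]

# PV (page views): direct accumulation, every event counts.
pv = len(events)

# UV (unique visitors): deduplicated, each user counts once in the window.
uv = len({user for user, _page in events})

print(pv, uv)
```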


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215, G06F16/21
CPC: G06F16/215, G06F16/211
Inventor: 李恒昌, 甘剑锋
Owner: 度小满科技(北京)有限公司