Method, device and system for deleting duplicated data

A data deduplication and data page technology, applied in the database field, that addresses the waste of external storage space and the reduced efficiency of backing up current data in in-memory databases, thereby saving storage space and improving backup efficiency.

Active Publication Date: 2014-10-01
SHENZHEN INSTITUTE OF INFORMATION TECHNOLOGY
4 Cites, 20 Cited by

AI Technical Summary

Problems solved by technology

[0004] The purpose of the embodiments of the present invention is to provide a data deduplication method. It aims to solve the problem that, during continuous data protection, an existing in-memory database backs up the duplicate data within the current data to external storage. This wastes external storage space on duplicates and reduces backup efficiency, making it impossible to complete the backup of the current data in the in-memory database in a very short time.



Examples


Embodiment 1

[0034] Refer to Figure 1, which is an implementation flowchart of a deduplication method provided by an embodiment of the present invention. The method is described in detail as follows:

[0035] In step S101, the in-memory database caches, in a cache area, the data pages to be written to external storage;

[0036] The external storage includes, but is not limited to, a magnetic disk, a floppy disk, a hard disk or an optical disk.

[0037] The data pages to be written to external storage are the data pages of the current data being backed up to external storage.

[0038] Caching the data pages to be written to external storage in the cache area is carried out specifically as follows:

[0039] The in-memory database caches the data pages to be written to external storage in the cache area, and stores external data request events in a cache queue so that processing of those events is suspended.

[0040] Among them, the memo...
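
The caching behaviour in step S101 can be pictured with a minimal sketch. It assumes that queued request events are replayed in arrival order once the backup ends, which the excerpt above does not state; the class name `BackupCache` and its methods are hypothetical and do not come from the patent.

```python
from collections import deque


class BackupCache:
    """Hypothetical stand-in for the cache area used in step S101."""

    def __init__(self):
        self.pages = []          # data pages waiting to be written to external storage
        self.pending = deque()   # external data request events held during the backup
        self.backing_up = False

    def begin_backup(self, pages):
        # Cache the pages to back up and start deferring external requests.
        self.pages = list(pages)
        self.backing_up = True

    def submit(self, request, handler):
        # During a backup, queue the request instead of processing it.
        if self.backing_up:
            self.pending.append(request)
            return None
        return handler(request)

    def end_backup(self, handler):
        # Assumption: after the backup, queued requests are processed in order.
        self.backing_up = False
        self.pages = []
        results = []
        while self.pending:
            results.append(handler(self.pending.popleft()))
        return results
```

Holding requests in a FIFO queue keeps the cached pages unchanged while they are being written out, at the cost of delaying external requests for the duration of the backup.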

Embodiment 2

[0059] This embodiment mainly describes the implementation process when the fingerprint value does not exist in the preset fingerprint index table. The details are as follows:

[0060] After the preset fingerprint index table has been searched for the fingerprint value, the method further includes:

[0061] If the fingerprint value does not exist in the preset fingerprint index table, judging that the data page is not a redundant page, and writing the data page into the data file;

[0062] Obtaining the data page offset of the data page in the data file;

[0063] Writing the fingerprint value corresponding to the data page, together with the data page offset of the data page in the data file, into the fingerprint index table, and writing the fingerprint value corresponding to the data page into the backup information file.

[0064] Wherein, if the fingerprint value does not exist in the preset fingerprint index table, it means that the fingerprint value corresponding to the dat...
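
The write path described in this embodiment can be sketched as follows. This is an illustration under assumptions rather than the patented implementation: SHA-256 is chosen because the source only says "SHA", the fingerprint index table is modelled as a Python dict, the backup information file as a list, and the names `backup_page` and `PAGE_SIZE` are hypothetical.

```python
import hashlib
import io

PAGE_SIZE = 16  # hypothetical page size, small for demonstration


def backup_page(page, data_file, index, backup_info):
    """Back up one page, writing its bytes only if the fingerprint is new."""
    # Assumption: SHA-256 stands in for the unspecified SHA variant.
    fingerprint = hashlib.sha256(page).hexdigest()
    if fingerprint not in index:
        # Not a redundant page: append it to the data file and record its offset.
        offset = data_file.seek(0, io.SEEK_END)
        data_file.write(page)
        index[fingerprint] = offset
    # In both cases the fingerprint is recorded for this backup point.
    backup_info.append(fingerprint)
    return fingerprint


# Usage: three pages, one of them a duplicate of the first.
data_file = io.BytesIO()
index, backup_info = {}, []
for page in (b"A" * PAGE_SIZE, b"B" * PAGE_SIZE, b"A" * PAGE_SIZE):
    backup_page(page, data_file, index, backup_info)
```

After the loop, the data file holds only two pages while the backup information list holds three fingerprints, which is where the storage saving comes from.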

Embodiment 3

[0067] This embodiment mainly describes the implementation process of restoring data pages in the in-memory database. The details are as follows:

[0068] Receive the selected backup point;

[0069] Read the backup information file according to the received backup point, and read the fingerprint values in the backup information file one by one;

[0070] Each time a fingerprint value is read, use it to look up the corresponding data page offset in the fingerprint index table;

[0071] According to the data page offset and the data page size, read the data of the data page from the data file, and load the read data into memory to restore the data page in the in-memory database.

[0072] In this embodiment, the above process is repeated until the restoration is completed, at which point the data in the in-memory database has been restored to the data state of the backup point.
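
The restore loop above can be sketched in a few lines. The sketch assumes the backup information file has already been parsed into an ordered list of fingerprint values and that the fingerprint index table is available as a dict from fingerprint to offset; `restore_pages` and the sample fingerprints `fp-a` and `fp-b` are hypothetical names used only for illustration.

```python
import io


def restore_pages(backup_info, index, data_file, page_size):
    """Rebuild the pages of one backup point from its fingerprint list."""
    pages = []
    for fingerprint in backup_info:
        # Look up where the page with this fingerprint lives in the data file.
        offset = index[fingerprint]
        data_file.seek(offset)
        page = data_file.read(page_size)
        if len(page) != page_size:
            raise ValueError("data file is shorter than the index expects")
        pages.append(page)
    return pages


# Usage with hypothetical data: two stored pages, three fingerprints.
data_file = io.BytesIO(b"AAAAAAAA" + b"BBBBBBBB")
index = {"fp-a": 0, "fp-b": 8}
restored = restore_pages(["fp-a", "fp-b", "fp-a"], index, data_file, page_size=8)
```

A duplicated page is stored once but restored as many times as its fingerprint appears in the backup information, so the restored state matches the original page sequence.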


Abstract

The invention belongs to the technical field of databases and provides a method, a device and a system for deleting duplicated data. The method comprises the following steps: an in-memory database caches, in a cache area, a data page to be written to external storage; when the cached data page is to be written to the external storage, the data page is retrieved; the data in the retrieved data page is compressed and mapped according to a Secure Hash Algorithm (SHA) to generate a fingerprint value corresponding to the data page; a preset fingerprint index table is searched for the fingerprint value; if the fingerprint value exists in the preset fingerprint index table, the data page is judged to be a redundant page, the data page is not written to the data file in the external storage, and the fingerprint value corresponding to the data page is written to a backup information file in the external storage. The benefits are twofold: the storage space of the external storage is saved, and the efficiency of backing up the current data in the in-memory database is improved.
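
As a small illustration of the fingerprinting step, the sketch below maps each page to a fixed-length digest and counts how many pages would be judged redundant. SHA-256 is an assumption, since the abstract names only "SHA", and the function names `fingerprint` and `count_redundant` are hypothetical.

```python
import hashlib


def fingerprint(page: bytes) -> str:
    # Compress a page of any size into a fixed-length fingerprint value.
    # Assumption: SHA-256 stands in for the unspecified SHA variant.
    return hashlib.sha256(page).hexdigest()


def count_redundant(pages):
    """Count pages whose fingerprint has already been seen."""
    seen = set()
    redundant = 0
    for page in pages:
        fp = fingerprint(page)
        if fp in seen:
            redundant += 1   # redundant page: its bytes need not be written again
        else:
            seen.add(fp)
    return redundant
```

Because identical page contents always yield the same digest, a lookup on the fingerprint is enough to detect a page that has already been stored, without comparing page contents byte by byte.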

Description

Technical field

[0001] The invention belongs to the technical field of databases, and in particular relates to a method, device and system for deduplicating data.

Background

[0002] An in-memory database is a new type of database that keeps all of its data in memory. Since all operations are completed in memory, an in-memory database has a large performance advantage over traditional disk-based databases and is well suited to applications that require extremely high performance. At the same time, the in-memory database uses continuous data protection to back up the current data to external storage automatically at regular intervals, so that data in memory is not lost through failures such as a power outage. Because the in-memory database must provide extremely high access performance to the outside world, the backup of its current data must be completed in a very short time.

[0003] However, ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/137, G06F16/152, G06F16/1748
Inventor: 王寅峰
Owner: SHENZHEN INSTITUTE OF INFORMATION TECHNOLOGY