Duplication eliminating method based on multidimensional lattice data spatial model

A technology of lattice data and spatial models, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses problems such as reduced efficiency, slow retrieval speed, and increased resource consumption, and achieves high deduplication efficiency without severe resource consumption.

Inactive Publication Date: 2012-10-03
苏州云端信息科技有限公司

AI Technical Summary

Problems solved by technology

If URL retrieval is slow, it becomes a bottleneck that seriously limits the collection speed and scalability of the entire system.
As the volume of data keeps growing, retrieval becomes slower and consumes more system resources. With algorithms such as hashing in particular, growth in data volume intensifies collisions (distinct data treated as duplicates), which raises the data error rate.
Data length is a factor that cannot be ignored for efficiency and resource consumption. If every item to be deduplicated is kept as raw data, the deduplication method suffers badly, because the length of the data cannot be predicted.
The data therefore has to be converted into a uniform form, such as a hash. Many deduplication methods use hashing, but because of the high collision (data duplication) rate they combine several different hash algorithms: each record is hashed into multiple values for deduplication filtering. This lowers the collision rate, but it also lowers efficiency, increases resource consumption geometrically, and makes saving and loading the deduplication cache more difficult.
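
To make the cost of the combined-hash approach concrete, here is a minimal Python sketch of it. The choice of algorithms (sha1, sha256, blake2b) and the function names are assumptions made for illustration; the source does not name specific hash algorithms.

```python
import hashlib

# Hypothetical choice of algorithms; the source does not name any.
ALGORITHMS = ("sha1", "sha256", "blake2b")


def fingerprints(record: str) -> tuple:
    """Return one fixed-length hex digest per algorithm for a record."""
    raw = record.encode("utf-8")
    return tuple(hashlib.new(name, raw).hexdigest() for name in ALGORITHMS)


def is_new(record: str, seen: set) -> bool:
    """Report whether the record is unseen, remembering it if so."""
    key = fingerprints(record)
    if key in seen:
        return False
    seen.add(key)
    return True


seen = set()
print(is_new("http://example.com/a", seen))  # first sighting
print(is_new("http://example.com/a", seen))  # repeat of the same record
```

Every record is hashed once per algorithm and every stored key holds all of the digests, so both the work per record and the size of the cache grow with the number of algorithms combined.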

Embodiment Construction

[0023] The object of the present invention is achieved by a method comprising the following steps:

[0024] ① Format the original data into a form that conforms to the multidimensional lattice data space model. The formatted data consists of the characters 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, and the binary length of each data point is limited to a multiple of 8. The original data is a string made up of various symbols, and string lengths are unlikely to be the same and can differ greatly. Matching, filtering, and deduplicating such strings with a database or cache inevitably becomes less efficient and consumes more resources as the data keeps growing. The length of the data string matters greatly for efficiency and resource consumption. If every item to be deduplicated is kept as raw data, the deduplication method suffers badly; the data therefore needs to be converted into a uniform form, the present inve...
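
The formatting step can be illustrated with a short Python sketch. The text above is truncated before it says how the conversion is actually performed, so this example assumes a plain hexadecimal encoding of the UTF-8 bytes; the function name `to_data_points`, the `bytes_per_point` parameter, and the zero padding of the last point are all hypothetical.

```python
def to_data_points(raw: str, bytes_per_point: int = 1) -> list:
    """Encode a string as uppercase hex and cut it into data points.

    Each point covers bytes_per_point bytes, so its binary length is a
    multiple of 8 bits and it contains only the characters 0-9 and A-F.
    """
    if bytes_per_point < 1:
        raise ValueError("bytes_per_point must be at least 1")
    hex_text = raw.encode("utf-8").hex().upper()
    width = 2 * bytes_per_point  # two hex characters per byte
    # Assumption: pad the last point with zeros so every point has full width.
    remainder = len(hex_text) % width
    if remainder:
        hex_text += "0" * (width - remainder)
    return [hex_text[i:i + width] for i in range(0, len(hex_text), width)]


print(to_data_points("Hi"))      # ['48', '69']
print(to_data_points("abc", 2))  # ['6162', '6300']
```

With one byte per point no padding is ever needed. With wider points the zero padding can make two different inputs produce the same points (for example a string and the same string followed by a NUL character), so a real implementation would need a rule for that case.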


Abstract

The invention discloses a deduplication method based on a multidimensional lattice data spatial model. The method includes the following steps: loading locally cached data and building the multidimensional lattice data spatial model; converting the data into a custom data format and cutting it into data points; searching the data items one by one, locating the coordinates of each data point on the corresponding dimension of the data model and searching each data point digit by digit from the first digit down; if a data point does not exist in the data model, marking that data point in the model and marking the data item as absent; after traversing all data points of a data item, outputting it to the cache if it is marked as absent, and then searching the next item until all the data has been searched. The method is suitable for filtering and deduplicating many kinds of data, has high deduplication efficiency, and has good application value in engineering. It also solves the problem of severe resource consumption caused by differences in data length.
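
The search-and-mark loop described in the abstract can be sketched in Python as a walk through nested dictionaries, where each nesting level stands in for one dimension of the model and each key is a data point. This is only one possible reading of the abstract, not the patented implementation: the nested-dict layout, the function names, and the `END` marker (added so that a record that is a prefix of an earlier record is not mistaken for a duplicate) are assumptions, and the sketch compares whole data points rather than going digit by digit.

```python
END = "$end"  # hypothetical terminal marker, not from the source


def add_if_absent(model: dict, points: list) -> bool:
    """Walk the points in order; return True if the record was absent."""
    node = model
    absent = False
    for point in points:
        if point not in node:
            node[point] = {}  # record the missing point on this dimension
            absent = True
        node = node[point]
    if END not in node:
        node[END] = True  # the record ends here; remember that
        absent = True
    return absent


def deduplicate(records, model=None):
    """Return the records not yet present in the model, in input order."""
    model = {} if model is None else model
    fresh = []
    for points in records:
        if add_if_absent(model, points):
            fresh.append(points)
    return fresh


print(deduplicate([["48", "69"], ["48", "6F"], ["48", "69"]]))
# [['48', '69'], ['48', '6F']]
```

In this sketch the lookup cost depends on the number of data points in a record rather than on how many records are already stored, and records that share leading points share storage.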

Description

[Technical field]

[0001] The invention relates to data deduplication, in particular to a deduplication method based on a multidimensional lattice data space model.

[Background technique]

[0002] There are many deduplication technologies. The most commonly used ones, such as hashing and Bloom filters, process the content to be deduplicated and then match the results one by one. This set-based approach is feasible, but it is cumbersome with massive amounts of data, and once the quantity reaches a certain level the data repetition rate rises rapidly, making deduplication meaningless. Deduplication also has to deal with related issues such as how to save the deduplicated data and how to load the cache. If the deduplicated data cannot be saved and cached, restarting the deduplication server starts the deduplication work from scratch and repeats work on data that has already been processed, which will reduce the accuracy of the data virtua...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 刘威庄敬伟
Owner: 苏州云端信息科技有限公司