Method and device of data deduplication

A data deduplication technology, applied in the field of data statistics, that improves data comparison efficiency and reduces data volume

Inactive Publication Date: 2018-07-13
CHINA ACADEMY OF INFORMATION & COMM
7 cites, 2 cited by

AI Technical Summary

Problems solved by technology

For other data deduplication applications, similar problems may also exist

Embodiment Construction

[0022] Embodiments of the technical solutions of the present invention will be described in detail below in conjunction with the accompanying drawings. The following examples are only used to illustrate the technical solution of the present invention more clearly, so they are only examples, and should not be used to limit the protection scope of the present invention.

[0023] It should be noted that, unless otherwise specified, the technical terms or scientific terms used in this application shall have the usual meanings understood by those skilled in the art to which the present invention belongs.

[0024] As shown in Figure 1, this embodiment provides a method for data deduplication, including:

[0025] Step S1, constructing the longest common substring table according to the acquired target data.

[0026] Here, the target data is in string format, and each piece of target data refers to one object. For example, if the object is an application, the target data may be ...
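
The basic operation behind step S1 is finding the longest common substring of two strings. The excerpt does not show how the table itself is built, so the following is only a minimal sketch of that one operation, using standard dynamic programming. The function name `longest_common_substring` is hypothetical and does not come from the patent text.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return one longest contiguous substring shared by a and b.

    Illustrative helper only; the name and signature are assumptions.
    """
    best_len = 0   # length of the best match found so far
    best_end = 0   # exclusive end index of that match in `a`
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


print(longest_common_substring("Tencent QQ", "QQ"))  # QQ
```

The two-row table keeps memory proportional to the length of the second string, at a time cost proportional to the product of the two lengths.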

Abstract

The invention belongs to the technical field of data statistics, and particularly relates to a method and a device for data deduplication. The method includes: constructing a longest-common-substring table according to acquired target data; extracting the longest common substring of two pieces of data on which a deduplication judgment is to be made, and comparing it with the substrings in the longest-common-substring table; and carrying out deduplication processing on the two pieces of data if no substring identical to the longest common substring exists in the table. With this method and device, the data in the table does not need to be updated frequently, the amount of stored data is reduced, and the efficiency of data comparison during deduplication is improved.
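
The judgment described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the table is modeled as a plain Python `set` of strings, the names `lcs` and `should_deduplicate` are hypothetical, and treating two strings with no common substring as non-duplicates is an added assumption. The longest common substring is obtained with the standard library's `difflib.SequenceMatcher.find_longest_match`.

```python
from difflib import SequenceMatcher


def lcs(a: str, b: str) -> str:
    """Longest contiguous substring shared by a and b (stdlib-based)."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


def should_deduplicate(a: str, b: str, lcs_table: set) -> bool:
    """Apply the rule from the abstract to two pieces of data.

    The pair is deduplicated only when its longest common substring
    is NOT present in the table.
    """
    common = lcs(a, b)
    if not common:
        # Assumption: nothing in common means the pair is not a duplicate.
        return False
    return common not in lcs_table


# Hypothetical table content, chosen only for illustration.
table = {" Music"}
print(should_deduplicate("Tencent QQ", "QQ", table))         # True
print(should_deduplicate("QQ Music", "Kugou Music", table))  # False
```

A set gives constant-time membership checks on average, which fits the stated goal of efficient comparison against the table.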

Description

Technical field

[0001] The invention relates to the technical field of data statistics, and in particular to a method and device for data deduplication.

Background technique

[0002] Applications in a mobile application store may be duplicated, so deduplication is needed; likewise, when analyzing data on applications across different mobile application stores, entries for the same application must be deduplicated.

[0003] The same app may have multiple names. For example, the same application may adopt different names in different time periods: video application software may add the name of the latest popular drama to the application name. As another example, the same app may use different names in different app stores, such as Tencent QQ and QQ. There are also other situations that cause the same app to have different names.

[0004] Different applications also have the problem of similar names (existing...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F17/30
CPC: G06F16/215
Inventor: 路博, 王跃, 方诗旭, 张育雄, 郭丽, 杨小燕, 刘艺
Owner: CHINA ACADEMY OF INFORMATION & COMM