Data deduplication method and device

A technology relating to data and data volume, applied in the Internet field, that addresses problems such as low efficiency

Status: Inactive. Publication date: 2019-07-30
BEIJING GRIDSUM TECH CO LTD
Cites: 0; Cited by: 3

AI Technical Summary

Problems solved by technology

[0004] The embodiments of the present invention provide a data deduplication method and device, to solve at least the technical problem in the related art that matching duplicate data in files with a large amount of data is inefficient.




Embodiment Construction

[0019] To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative work shall fall within the protection scope of the present invention.

[0020] It should be noted that the terms "first" and "second" in the specification, the claims, and the above-mentioned drawings of the present invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that the data used in this way can be interchanged under ap...


Abstract

The invention discloses a data deduplication method and a data deduplication device. The method comprises: sorting the multiple pieces of data in a first file and a second file according to a preset sorting condition, where the first file and the second file each have a corresponding pointer that indicates the sorted position of the row in which the data in that file is located; judging, according to the sorting result, whether first character-string data and second character-string data are identical; if they are judged to be identical, recording the position information of the identical character-string data in the corresponding file; and performing deduplication on the identical character-string data in the first file and the second file according to the recorded position information. The method and device solve the technical problem in the related art that matching duplicate data in files with relatively large data volumes is relatively inefficient.
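The sort-then-compare steps in the abstract can be sketched as a merge-style scan over two sorted lists. This is a minimal sketch under stated assumptions, not the patented implementation: each file is modeled as a Python list of strings (one row per element), the "preset sorting condition" is assumed to be plain lexicographic order, and the function names `find_common_rows` and `remove_from_second` are hypothetical. The policy of keeping the first file intact and dropping matching rows from the second file is also an assumption, since the abstract does not say which copy is removed.

```python
from typing import Dict, List, Tuple


def find_common_rows(first: List[str], second: List[str]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Return {value: (rows_in_first, rows_in_second)} for strings present in both files.

    Assumption: the "preset sorting condition" is plain lexicographic order.
    """
    # Sort (value, original_row) pairs so each entry remembers its original row.
    a = sorted((value, row) for row, value in enumerate(first))
    b = sorted((value, row) for row, value in enumerate(second))

    common: Dict[str, Tuple[List[int], List[int]]] = {}
    i = j = 0  # one pointer per sorted file; each only moves forward
    while i < len(a) and j < len(b):
        va, vb = a[i][0], b[j][0]
        if va < vb:
            i += 1  # value in first file is smaller, so it cannot match later rows
        elif va > vb:
            j += 1  # value in second file is smaller, advance that pointer
        else:
            # Identical strings: record every original row holding this value in both files.
            rows_a: List[int] = []
            rows_b: List[int] = []
            while i < len(a) and a[i][0] == va:
                rows_a.append(a[i][1])
                i += 1
            while j < len(b) and b[j][0] == va:
                rows_b.append(b[j][1])
                j += 1
            common[va] = (rows_a, rows_b)
    return common


def remove_from_second(first: List[str], second: List[str]) -> List[str]:
    """Hypothetical policy: keep the first file intact and drop rows of the
    second file whose string also occurs in the first file."""
    common = find_common_rows(first, second)
    drop = {row for _, rows_b in common.values() for row in rows_b}
    return [value for row, value in enumerate(second) if row not in drop]
```

Because both lists are sorted, each pointer only moves forward, so the comparison phase is linear in the total number of rows and the overall cost is dominated by the two sorts, O(n log n + m log m), rather than the O(n·m) of matching every row against every other row. With `first = ["b", "a", "c", "a"]` and `second = ["d", "a", "c", "e"]`, `find_common_rows` returns `{"a": ([1, 3], [1]), "c": ([2], [2])}` and `remove_from_second` returns `["d", "e"]`.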

Description

Technical field

[0001] The present invention relates to the field of Internet technology, and in particular to a data deduplication method and device.

Background

[0002] In the related art, the collection and processing stages for web-page data often need to filter out duplicate data so that clean, duplicate-free data is available for subsequent file processing. Current data filtering techniques generally either match each piece of data in one file against every piece of data in another file to find out whether it has a duplicate there, or use a dedicated filter that analyzes the data for common entries and matches them to obtain the duplicates. For files with a large amount of data, however, matching each piece of data against all the data in the other file is very slow and inefficient, while the dedicated-filter approach loses accuracy, and abnormal data in the file will ...
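For comparison, the pairwise matching described in the background can be sketched as a nested loop. This is an illustrative sketch under the same assumptions (each file is a list of strings; `naive_matches` is a hypothetical name), not code from the source. It counts comparisons to show why the cost grows as the product of the two file sizes.

```python
from typing import List, Tuple


def naive_matches(first: List[str], second: List[str]) -> Tuple[List[Tuple[int, int]], int]:
    """Compare every row of `first` with every row of `second`.

    Returns the (row_in_first, row_in_second) pairs that match and the number
    of string comparisons performed, which is always len(first) * len(second).
    """
    pairs: List[Tuple[int, int]] = []
    comparisons = 0
    for i, x in enumerate(first):
        for j, y in enumerate(second):
            comparisons += 1  # every pair is compared, duplicate or not
            if x == y:
                pairs.append((i, j))
    return pairs, comparisons
```

For `first = ["b", "a", "c", "a"]` and `second = ["d", "a", "c", "e"]`, `naive_matches` returns `[(1, 1), (2, 2), (3, 1)]` after 16 comparisons (4 × 4); doubling both files quadruples the comparison count.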


Application Information

Patent type & authority: Application (China)
IPC(8): G06F16/174; G06F16/903
CPC: G06F16/1748; G06F16/90344
Inventor: Liu Kai (刘凯)
Owner: BEIJING GRIDSUM TECH CO LTD