Data deduplication method, device, equipment and storage medium
A data deduplication technique, applied in the database field, that addresses the problems of high maintenance cost and low data deduplication efficiency, and achieves the effect of improving deduplication efficiency and shortening deduplication time.
Examples
Embodiment 1
[0029] Figure 1 is a flow chart of a data deduplication method provided by Embodiment 1 of the present invention. This embodiment is applicable to deduplicating data during file collection. The method can be executed by a data deduplication device, which can be implemented in software and/or hardware. The method specifically includes the following steps:
[0030] S110. Obtain the data to be processed in the file to be processed, and calculate a first hash value and a first MD5 value of the data to be processed.
[0031] By way of example, the file type of the file to be processed includes Excel (spreadsheet), txt (text document), csv (comma-separated values), and the like. The data format of the data to be processed depends on the file type of the file to be processed. For example, the data to be processed may be numbers, characters, symbols, etc.; neither the file to be processed nor the data to be processed is limited here.
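As an illustration of step S110 only, the following is a minimal Python sketch of computing a first hash value and a first MD5 value for one record. The excerpt does not say which hash function is used, so CRC32 (`zlib.crc32`) is an assumption chosen here because it is deterministic across runs; the function names `record_from_fields` and `fingerprint` and the field separator are likewise hypothetical.

```python
import hashlib
import zlib
from typing import Iterable, Tuple


def record_from_fields(fields: Iterable[object]) -> str:
    """Join the fields of one row into a single string.

    The unit-separator character is an assumed delimiter; it keeps
    ("ab", "c") and ("a", "bc") from producing the same string.
    """
    return "\x1f".join(str(f) for f in fields)


def fingerprint(record: str) -> Tuple[int, str]:
    """Return (first hash value, first MD5 value) for one record."""
    raw = record.encode("utf-8")
    first_hash = zlib.crc32(raw)              # assumed hash function
    first_md5 = hashlib.md5(raw).hexdigest()  # 32 hex characters
    return first_hash, first_md5


row = record_from_fields(["1001", "Alice", 42])
first_hash, first_md5 = fingerprint(row)
```

Identical rows always yield the same pair of values, which is what allows the later comparison steps to work on the hash and MD5 values rather than on the raw data.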
[0032] ...
Embodiment 2
[0048] Figure 2 is a flow chart of a data deduplication method provided in Embodiment 2 of the present invention; the technical solution of this embodiment further refines the foregoing embodiment. Optionally, the retrieval data also includes primary key data corresponding to each piece of stored data. Correspondingly, after obtaining the data to be processed in the file to be processed, the method further includes: determining whether preset primary key data exists in the data to be processed; if it exists, performing a duplication judgment on the data to be processed based on the preset primary key data.
[0049] S210. Obtain the data to be processed in the file to be processed.
[0050] S220. Determine whether there is preset primary key data in the data to be processed; if yes, execute S230; if not, execute S240.
[0051] The primary key data refers to at least one field in the data to be processed, and the primary key data can uniqu...
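The branch at S220 can be sketched as follows. This is a hedged illustration, not the claimed method: the excerpt does not show what S230 and S240 do, so the sketch assumes that a record with preset primary key data is judged by key lookup and that a record without it is handed back to the caller for some other comparison. The names `extract_primary_key`, `is_duplicate_by_key`, and `stored_keys` are hypothetical.

```python
from typing import Dict, Optional, Sequence, Set, Tuple

Key = Tuple[str, ...]


def extract_primary_key(record: Dict[str, str],
                        key_fields: Sequence[str]) -> Optional[Key]:
    """Return the primary key tuple, or None if no usable key is present."""
    if not key_fields:
        return None
    values = []
    for field in key_fields:
        value = record.get(field)
        if value is None or value == "":
            return None  # a missing key field means no preset primary key
        values.append(value)
    return tuple(values)


def is_duplicate_by_key(record: Dict[str, str],
                        key_fields: Sequence[str],
                        stored_keys: Set[Key]) -> Optional[bool]:
    """True/False when a primary key exists; None when the caller must fall back."""
    key = extract_primary_key(record, key_fields)
    if key is None:
        return None  # assumed fallback path (no preset primary key data)
    if key in stored_keys:
        return True
    stored_keys.add(key)  # remember the key for later records
    return False


stored: Set[Key] = set()
first = is_duplicate_by_key({"id": "1", "name": "a"}, ["id"], stored)
second = is_duplicate_by_key({"id": "1", "name": "b"}, ["id"], stored)
no_key = is_duplicate_by_key({"name": "c"}, ["id"], stored)
```

Returning `None` rather than `False` for the keyless case keeps "not a duplicate" distinct from "could not be judged by primary key".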
Embodiment 3
[0068] Figure 4 is a schematic diagram of a data deduplication device provided in Embodiment 3 of the present invention. This embodiment is applicable to deduplicating data when collecting files, and the device can be implemented in software and/or hardware. The data deduplication device includes a to-be-processed data acquisition module 310, a target hash partition determination module 320, a first MD5 value determination module 330, and a duplicate data determination module 340.
[0069] The to-be-processed data acquisition module 310 is configured to obtain the data to be processed in the file to be processed, and to calculate the first hash value and the first MD5 value of the data to be processed;
[0070] The target hash partition determination module 320 is configured to determine, according to the first hash value, the target hash partition for data comparison in the stored retrieval data, wherein the retrieval data includes at least one hash partition, ...
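To make the role of the hash partition concrete, here is a minimal Python sketch of partitioned lookup: the first hash value selects one partition, and only the MD5 values stored in that partition are compared. The excerpt is truncated before it describes the partition layout, so everything structural here is an assumption: the modulo mapping from hash value to partition, the fixed partition count, CRC32 as the hash, and the hypothetical class name `PartitionedIndex`.

```python
import hashlib
import zlib
from typing import Dict, Set

NUM_PARTITIONS = 16  # assumed; the source does not give a partition count


class PartitionedIndex:
    """Retrieval data held as hash partitions, each a set of MD5 values."""

    def __init__(self, num_partitions: int = NUM_PARTITIONS) -> None:
        self.num_partitions = num_partitions
        self.partitions: Dict[int, Set[str]] = {}

    def target_partition(self, first_hash: int) -> int:
        """Map a first hash value to a partition id (assumed modulo scheme)."""
        return first_hash % self.num_partitions

    def check_and_add(self, record: str) -> bool:
        """Return True if the record is a duplicate; otherwise store it."""
        raw = record.encode("utf-8")
        pid = self.target_partition(zlib.crc32(raw))
        first_md5 = hashlib.md5(raw).hexdigest()
        bucket = self.partitions.setdefault(pid, set())
        if first_md5 in bucket:
            return True   # same MD5 already stored in the target partition
        bucket.add(first_md5)
        return False


index = PartitionedIndex(num_partitions=4)
results = [index.check_and_add(r) for r in ["a,1", "b,2", "a,1"]]
```

Because identical records always hash to the same partition, a duplicate can only be in the target partition, so the other partitions never need to be searched.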