Data identification method and data identification system

A data identification technology, applied in the field of data processing, that addresses problems such as low deduplication efficiency and delays in the subsequent analysis process.

Inactive Publication Date: 2012-07-18
GUANGZHOU SUNRISE ELECTRONICS DEV
Cites: 4; Cited by: 14

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by this application is to provide a data identification method that addresses a shortcoming of the prior art: when identifying any data to be deduplicated, all deduplication files and all deduplicated data in those files must be obtained. In the ded

Examples

Example Embodiment

[0050] The technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.

[0051] This application can be used in many general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multi-processor systems, distributed computing environments including any of the above systems or devices, and so on.

[0052] This application may be described in the general context of computer-executable instructions executed by a computer, such a...

Abstract

The invention provides a data identification method and a data identification system. The data identification method comprises: extracting a match field from the data to be deduplicated, and computing a hash value of a key field contained in the data to be deduplicated; obtaining the deduplication file corresponding to the match field; locating deduplicated data in the deduplication file according to the hash value; and judging whether the data to be deduplicated and the located deduplicated data are identical, and, if they are, identifying the data to be deduplicated as duplicate data. Therefore, when each item of data to be deduplicated is identified, only the deduplicated data related to that item needs to be obtained from the deduplication file. This reduces the amount of deduplicated data obtained, which in turn reduces the number of comparisons and improves deduplication efficiency. Further, if a downstream system needs to analyze the data in the deduplication file, the analysis can start sooner because deduplication completes faster.
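
The steps in the abstract can be sketched roughly as follows. This is a minimal in-memory illustration, not the patented implementation: the class name `DedupStore`, the field names, the choice of SHA-256, and the use of dictionaries in place of on-disk deduplication files are all assumptions made for the example. The abstract also does not say what happens to non-duplicate data; storing it for later comparisons is an assumption here.

```python
import hashlib
from collections import defaultdict


class DedupStore:
    """Hypothetical in-memory stand-in for per-match-field deduplication files."""

    def __init__(self, match_field, key_fields):
        self.match_field = match_field  # field that selects the deduplication file
        self.key_fields = key_fields    # fields that are hashed to locate candidates
        # match-field value -> {hash value -> list of deduplicated records}
        self.files = defaultdict(dict)

    def _hash_key(self, record):
        # Join key fields with a separator unlikely to occur in the data,
        # then hash the result (SHA-256 is an assumed choice).
        key = "\x1f".join(str(record[f]) for f in self.key_fields)
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def is_duplicate(self, record):
        # Step 1: extract the match field and pick the corresponding file.
        dedup_file = self.files[record[self.match_field]]
        # Step 2: locate candidate deduplicated data by hash value.
        bucket = dedup_file.setdefault(self._hash_key(record), [])
        # Step 3: compare only against the located candidates.
        if any(existing == record for existing in bucket):
            return True
        # Assumption: a non-duplicate record is stored for future checks.
        bucket.append(record)
        return False


store = DedupStore(match_field="date", key_fields=["caller", "callee"])
first = {"date": "2012-07-18", "caller": "a", "callee": "b"}
print(store.is_duplicate(first))        # first occurrence
print(store.is_duplicate(dict(first)))  # identical record seen again
```

Because a record is compared only with entries that share both its match field and its key-field hash, the number of comparisons per record stays small even when the total volume of deduplicated data is large.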

Description

Technical field

[0001] This application relates to the field of data processing, in particular to a data identification method and system.

Background art

[0002] Massive data processing is required in many industries such as banking, telecommunications, and the Internet. In massive data, data duplication is inevitable. How to delete duplicate data from massive data and keep only one copy of multiple identical items is an urgent problem to be solved.

[0003] At present, the specific process of a data identification method may include: first, for any data to be deduplicated, obtain all deduplication files and the deduplicated data stored in them, where any deduplicated data in a deduplication file differs from all other deduplicated data in the same file; second, traverse the obtained deduplicated data and determine whether any of it is the same as the data to be deduplicated; if yes, mark the data to be deduplicated...

Application Information

IPC(8): G06F17/27, G06F17/30
Inventor: 黄子维
Owner: GUANGZHOU SUNRISE ELECTRONICS DEV