Data identification method and data identification system

A data identification method and system, applied in the field of data processing, that addresses the problems of low deduplication efficiency and the resulting delay to downstream analysis.

Inactive Publication Date: 2012-07-18
GUANGZHOU SUNRISE ELECTRONICS DEV

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by this application is to provide a data identification method that addresses a shortcoming of the prior art: when identifying any piece of data to be deduplicated, the prior art obtains all deduplication files and all deduplicated data stored in them, and then compares the data to be deduplicated against every deduplicated record in those files. This reduces deduplication efficiency, and the low efficiency in turn delays any subsequent analysis.

Embodiment Construction

[0050] The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in those embodiments. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments in this application, without creative effort, fall within the scope of protection of this application.

[0051] The application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, and distributed computing environments that include any of the above systems or devices.

[0052] This application may be described in the general context of computer-executable instructions, such as program modules, being executed by a co...

Abstract

The invention provides a data identification method and a data identification system. The method comprises: extracting a matching field from the data to be deduplicated; computing the hash value of a key field contained in the data to be deduplicated; obtaining the deduplication file corresponding to the matching field; locating deduplicated data in that deduplication file according to the hash value; and judging whether the data to be deduplicated and the located deduplicated data are identical, identifying the data to be deduplicated as duplicate data if they are. Thus, when each piece of data to be deduplicated is identified, only the deduplicated data in the deduplication file that is relevant to it is obtained, which reduces the amount of deduplicated data retrieved, and therefore the number of comparisons, and improves deduplication efficiency. Furthermore, if a downstream system needs to analyze the data in the deduplication file, the improved deduplication efficiency speeds up that analysis.
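The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation: the in-memory store standing in for the deduplication files, the SHA-1 hash, the record layout, and the names `new_store`, `key_hash`, and `is_duplicate` are all assumptions made for illustration. The sketch also assumes that a record found to be new is appended to its deduplication file, which the abstract does not state.

```python
import hashlib
from collections import defaultdict


def new_store():
    # Hypothetical stand-in for the deduplication files:
    # matching-field value -> "file", where each file maps a
    # hash value to the list of records stored under that hash.
    return defaultdict(lambda: defaultdict(list))


def key_hash(record, key_fields):
    # Hash the key field(s) of the record. SHA-1 is an assumption;
    # the source does not name a hash function.
    joined = "\x1f".join(str(record[name]) for name in key_fields)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def is_duplicate(store, record, match_field, key_fields):
    # 1. The matching field selects a single deduplication file.
    dedup_file = store[record[match_field]]
    # 2. The hash of the key field locates candidate records in that file.
    bucket = dedup_file[key_hash(record, key_fields)]
    # 3. Only the located candidates are compared with the record.
    if any(existing == record for existing in bucket):
        return True
    # Assumption: a new record is stored so later copies are detected.
    bucket.append(record)
    return False


store = new_store()
rec = {"date": "2012-07-18", "account": "A1", "amount": 10}
print(is_duplicate(store, rec, "date", ["account", "amount"]))        # False
print(is_duplicate(store, dict(rec), "date", ["account", "amount"]))  # True
```

The full comparison in step 3 is kept because two different records can share a hash value; the hash only narrows the set of candidates.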

Description

Technical field

[0001] This application relates to the field of data processing, and in particular to a data identification method and system.

Background technique

[0002] Massive data processing is required in many industries, such as banking, telecommunications, and the Internet. In massive data sets, data duplication is inevitable. How to delete duplicate data and keep only one copy of multiple identical records is an urgent problem to be solved.

[0003] At present, a data identification method may proceed as follows. First, for any data to be deduplicated, obtain all deduplication files and the deduplicated data stored in them; any deduplicated data in a deduplication file differs from the other deduplicated data in the same file. Second, traverse the obtained deduplicated data and determine whether any of it is the same as the data to be deduplicated; if yes, mark the data to be deduplicated...
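For contrast, the prior-art procedure in paragraph [0003] can be sketched as a full scan. This is a hypothetical illustration only: the function name `is_duplicate_by_full_scan`, the list-of-lists layout for the deduplication files, and the comparison counter are assumptions added to make the cost visible.

```python
def is_duplicate_by_full_scan(record, all_dedup_files):
    # Prior-art style check: every record in every deduplication file
    # is a candidate, so the work grows with the total stored data.
    comparisons = 0
    for dedup_file in all_dedup_files:
        for existing in dedup_file:
            comparisons += 1
            if existing == record:
                return True, comparisons
    return False, comparisons


files = [[{"id": 1}, {"id": 2}], [{"id": 3}]]
print(is_duplicate_by_full_scan({"id": 3}, files))  # (True, 3)
print(is_duplicate_by_full_scan({"id": 9}, files))  # (False, 3)
```

A record that is not a duplicate always costs one comparison per stored record, which is the inefficiency the application sets out to remove.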

Application Information

IPC(8): G06F17/27, G06F17/30
Inventor: 黄子维
Owner: GUANGZHOU SUNRISE ELECTRONICS DEV