A data deduplication method and device based on content awareness

A content-aware data deduplication technology, applicable to audio data retrieval, structured data retrieval, still image data retrieval, and similar fields. It addresses the reduced deduplication success rate of existing methods and improves both deduplication efficiency and deduplication success rate.

Pending Publication Date: 2019-05-21
上海威固信息技术股份有限公司

AI Technical Summary

Problems solved by technology

[0005] Consequently, a hash-based data deduplication method reduces the deduplication success rate.
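
The limitation can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function, since the source does not name one, and the function names and sample payloads are hypothetical. Two payloads that differ in a single character produce unrelated digests, so an exact-match digest lookup treats the near copy as new data.

```python
import hashlib


def digest(payload: bytes) -> str:
    """Return the SHA-256 digest of a payload (SHA-256 is an assumed choice)."""
    return hashlib.sha256(payload).hexdigest()


def is_exact_duplicate(payload: bytes, seen: set) -> bool:
    """Hash-based check: a payload counts as a duplicate only if its digest was seen."""
    return digest(payload) in seen


# Hypothetical sample data: two readings that differ in one character.
original = b"sensor reading: 21.5 C"
near_copy = b"sensor reading: 21.6 C"

seen_digests = {digest(original)}

print(is_exact_duplicate(original, seen_digests))   # True
print(is_exact_duplicate(near_copy, seen_digests))  # False
```

The near copy is not detected because a cryptographic digest gives no measure of how similar two inputs are; it only reports whether they are identical.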



Embodiment Construction

[0034] To make the purpose, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the examples and accompanying drawings. The embodiments described here serve only to explain the present invention and are not a limitation of it.

[0035] As shown in Figure 1, various types of equipment are connected to the communication network and exchange data with the computer network data center over wired or wireless links. The data types that clients upload to the data center include text, audio, image, time-independent numerical data (called non-sequential data), and time-related numerical data (called time-series data). In the data center, the host converts the data to be backed up and stored into an IO write request and sends it to the storage array. The storage array controller responds to the IO write request sent by the host and returns the operation result to the host. A storage array consists of a storage array controller ...

Abstract

The invention provides a data deduplication method and device based on content awareness. The device comprises an IO processing device running on the host and a data deduplication device running in the storage controller; the data deduplication device comprises an analysis unit, a feature value comparison unit, and a data deduplication execution unit. The method includes: dividing the data to be stored by content into text, audio, image, non-time-series numerical data, and time-series numerical data, and calculating a feature value with a different algorithm for each type; combining the resulting feature value with the initial IO write request to form a new IO write request; and, according to the data type of the parsed IO write request, having the feature value comparison unit read feature values from the corresponding feature value sub-table, calculate the Hamming distance between the feature value of the data to be stored and each feature value read, and send the judgment result to the data deduplication execution unit, which performs the deduplication operation. By adopting a content-aware feature value calculation algorithm and a data duplication criterion, the invention improves both the deduplication success rate and the deduplication efficiency.
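
The comparison step described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the source does not give the per-type feature algorithms, the feature width, or the distance threshold, so `IOWriteRequest`, `judge_duplicate`, `handle_write`, the 8-bit sample features, and the threshold of 3 bits are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IOWriteRequest:
    """Hypothetical 'new IO write request': the original payload plus its feature value."""
    data_type: str  # one of: "text", "audio", "image", "non_sequential", "time_series"
    feature: int    # fixed-width feature value, precomputed by a type-specific algorithm
    payload: bytes


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two feature values differ."""
    return bin(a ^ b).count("1")


def judge_duplicate(request: IOWriteRequest,
                    sub_tables: Dict[str, List[int]],
                    threshold: int = 3) -> bool:
    """Compare the request's feature only against the sub-table for its data type.

    The threshold is an assumed tuning parameter; the source does not state one.
    """
    stored = sub_tables.get(request.data_type, [])
    return any(hamming_distance(request.feature, s) <= threshold for s in stored)


def handle_write(request: IOWriteRequest,
                 sub_tables: Dict[str, List[int]],
                 threshold: int = 3) -> str:
    """Drop the write if a close-enough feature exists, else record the feature and store."""
    if judge_duplicate(request, sub_tables, threshold):
        return "deduplicated"
    sub_tables.setdefault(request.data_type, []).append(request.feature)
    return "stored"


tables: Dict[str, List[int]] = {}
print(handle_write(IOWriteRequest("text", 0b10110010, b"first"), tables))   # stored
print(handle_write(IOWriteRequest("text", 0b10110011, b"second"), tables))  # deduplicated
print(handle_write(IOWriteRequest("image", 0b10110011, b"third"), tables))  # stored
```

Keeping one sub-table per data type means a feature value is compared only against values produced by the same algorithm, which is why the third request is stored even though its feature matches a text entry.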

Description

Technical field

[0001] The invention belongs to the technical field of computer network data storage, and in particular relates to a deduplication processing method and device for network data.

Background

[0002] With the advancement of computer network technology and the development of mobile Internet technology, data volume has grown rapidly over the past ten years, and the load on data centers increases day by day. The data management problems brought by the large number of end users make it difficult to build data centers efficiently. Much of this data may be duplicated: text duplicated through reprinting and citation; audio, image, and video files duplicated through content sharing; and data generated by devices at similar locations on the same network. Duplicate data occupies the interface bandwidth of the storage device and the space of the storage medium, wasting resources. Using technical means to identify duplicate data and eli...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/20, G06F16/30, G06F16/50, G06F16/60
Inventor: 邱赐云, 周正, 吴佳
Owner: 上海威固信息技术股份有限公司