
Data scanning method and apparatus

A data scanning technology, applied in the field of data processing, that addresses the problem of wasted I/O resources.

Active Publication Date: 2016-03-30
HUAWEI TECH CO LTD +1
3 Cites, 10 Cited by

AI Technical Summary

Problems solved by technology

Because multiple versions of the same Key-Value data exist, a scan reads a large number of old versions, which wastes I/O resources.



Embodiment Construction

[0055] The technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some of the embodiments of the application, not all of them.

[0056] The technical solution of the present application is mainly applied in a Key-Value distributed storage system based on an LSM (Log-Structured Merge) tree. The LSM tree is an ordered data structure that is not updated in place. In a Key-Value distributed storage system based on the LSM tree, Key-Value data is stored hierarchically.

[0057] An LSM tree includes multiple levels. In the prior art, when the data size of a level exceeds the preset threshold, the data in a certain key range (KeyRange) of that level is merged (compacted) with the data in the same key range of the next level. Therefore, the data written into the LSM first is generall...
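The threshold-triggered merge described in [0057] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function name `compact`, the use of one in-memory dict per level, measuring level size by entry count, and the rule that the upper (newer) level's value overwrites the lower one on a key collision are all assumptions made for the example.

```python
from typing import Dict, List


def compact(levels: List[Dict[str, str]], level: int,
            threshold: int, lo: str, hi: str) -> None:
    """Merge keys in [lo, hi] from levels[level] into levels[level + 1].

    Hypothetical model: each level is a dict, and "data size" is the
    number of entries. Nothing happens unless the level exceeds `threshold`.
    """
    if len(levels[level]) <= threshold:
        return
    if level + 1 == len(levels):
        levels.append({})  # create the next level on demand
    moved = {k: v for k, v in levels[level].items() if lo <= k <= hi}
    # Assumption: the upper level holds the newer version, so it wins.
    levels[level + 1].update(moved)
    for k in moved:
        del levels[level][k]


levels = [{"a": "a2", "b": "b1", "c": "c1"}, {"a": "a1", "d": "d1"}]
compact(levels, 0, threshold=2, lo="a", hi="b")
```

Because only one key range is merged at a time, keys outside that range can still have an older version in a lower level and a newer version above it.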


Abstract

Embodiments of the present application provide a data scanning method and apparatus. The method comprises: scanning the Key-Value data stored at a first level of an LSM tree; and, for each storage block at each level other than the first level: acquiring a Key set, wherein the Key set comprises all scanned Keys; acquiring a Key value range of the storage block; acquiring a Key intersection according to the Key set and the Key value range; acquiring a scanning accuracy of the storage block according to the number of Keys in the Key set that are stored in a Bloom filter established for the storage block; when the scanning accuracy of the storage block is less than a preset scanning accuracy, scanning the Key-Value data stored in the storage block; and otherwise, skipping the scan of the Key-Value data stored in the storage block. The data scanning method and apparatus provided by the embodiments of the present application save I/O resources and improve scanning performance.
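The per-block decision in the abstract can be sketched as below. The abstract does not give the formula for the scanning accuracy, so the ratio used here (Keys from the intersection that the block's Bloom filter reports as present, divided by the number of Keys in the block) is an assumption, as are the names `BloomFilter`, `Block`, `scan`, and `preset_accuracy`.

```python
import hashlib
from typing import Dict, Iterator, List


class BloomFilter:
    """Tiny illustrative Bloom filter (one byte per bit for simplicity)."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p] for p in self._positions(key))


class Block:
    """A storage block with its Key value range and its own Bloom filter."""

    def __init__(self, data: Dict[str, str]) -> None:
        self.data = dict(data)
        self.min_key, self.max_key = min(data), max(data)
        self.bloom = BloomFilter()
        for k in data:
            self.bloom.add(k)


def scan(first_level: Dict[str, str], lower_blocks: List[Block],
         preset_accuracy: float) -> Dict[str, str]:
    result = dict(first_level)  # the first level is always scanned
    for block in lower_blocks:
        key_set = set(result)  # all Keys scanned so far
        intersection = {k for k in key_set
                        if block.min_key <= k <= block.max_key}
        hits = sum(1 for k in intersection if block.bloom.might_contain(k))
        # Assumed formula: share of the block already covered by scanned Keys.
        accuracy = min(1.0, hits / len(block.data))
        if accuracy < preset_accuracy:
            for k, v in block.data.items():
                result.setdefault(k, v)  # keep the newer, already-scanned value
        # Otherwise the block is skipped and no I/O is spent on it.
    return result


first = {"a": "a2", "b": "b2"}
blocks = [Block({"a": "a1", "b": "b1"}), Block({"x": "x1", "y": "y1"})]
scanned = scan(first, blocks, preset_accuracy=0.9)
```

In this example the first block holds only old versions of Keys that were already scanned, so it is skipped, while the second block shares no Keys with the scanned set and is read. Under the assumed formula, a preset accuracy below 1.0 trades completeness for fewer block reads.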

Description

Technical field

[0001] The present application relates to the technical field of data processing, and more specifically to a data scanning method and apparatus.

Background

[0002] In a Key-Value distributed storage system, a commonly used data storage structure is the LSM (Log-Structured Merge) tree.

[0003] The LSM tree usually has a multi-level structure. Each Key-Value pair is first stored in the first level of the LSM tree. During data storage, if the data size of any level exceeds the preset threshold, the data of that level is written to the next level and merged (compacted) with the data in the same Key range of the next level.

[0004] It can be seen from the above description that Key-Value data is stored level by level in a merged manner, so there will be a large amount of Key-Value data with two or more versions.

[0005] When performing a data reading operation, especially a da...
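To make the cost of multiple versions concrete, the following sketch counts how many old versions a straightforward scan of every level reads. It is an illustration only: the function `naive_scan`, the dict-per-level model, and the convention that `levels[0]` holds the newest data are assumptions, not part of the application.

```python
from typing import Dict, List, Tuple


def naive_scan(levels: List[Dict[str, str]]) -> Tuple[Dict[str, str], int]:
    """Read every entry of every level; return latest values and stale reads."""
    result: Dict[str, str] = {}
    stale_reads = 0
    for level in levels:  # assumption: levels[0] is the newest level
        for key, value in level.items():
            if key in result:
                stale_reads += 1  # an old version was read and discarded
            else:
                result[key] = value
    return result, stale_reads


levels = [
    {"a": "a3"},
    {"a": "a2", "b": "b2"},
    {"a": "a1", "b": "b1", "c": "c1"},
]
latest, stale = naive_scan(levels)
```

Here six entries are read to produce three results, so half of the reads fetch versions that are immediately thrown away.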


Application Information

IPC(8): G06F17/30, G06F3/06
Inventor: 岳银亮, 张子刚, 潘锋烽, 刘扬宽
Owner: HUAWEI TECH CO LTD