A data storage method for massive unstructured data

The invention concerns the storage of unstructured data, within the fields of electrical digital data processing and special data processing applications. It addresses the problems that data sources and total data size are unpredictable, that metadata grows large, and that the size of merged large files cannot be guaranteed to stay within bounds. Its aims are to improve retrieval efficiency, reduce complexity, and reduce data merging and migration.

Active Publication Date: 2018-05-29
NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT +1

AI Technical Summary

Problems solved by technology

[0005] Many small files are in practice stored on disk by aggregating them into a larger file. This avoids the large metadata volume caused by many small files, effectively reduces the time spent on disk addressing, and allows caching to be used to pre-store data. The main implementation difficulty in this scenario is that the data source (where one datum is a single small file), the data size, and the total amount of data in a given period are all unpredictable. A fixed pattern therefore cannot be used to set the data storage rules; that is, the aggregated large file cannot be guaranteed to stay within a controlled size range. If the aggregated file is too large, it is difficult to load; if it is too small, the aggregation serves no purpose.



Embodiment Construction

[0056] To make the purpose, technical solution, and advantages of the present invention clearer, the hierarchical, segmented backup data organization and management method according to an embodiment of the invention is described in further detail below with reference to the accompanying drawings.

[0057] Figure 1 gives a schematic diagram of the data model that supports the two-level division method of time interval and hash. Under this data model, the metadata of an original file mainly includes the file name, the number of records in the file, the division rule to which the file belongs, the node where the file is located, the disk where the file is located, the file creation time, and other information. The metadata of an index file includes the maximum value, the minimum value, the total number of records, the number of non-duplicate records, and so on. Through the setting of the above elements, it effectively provides various information needed in the...
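
The two kinds of metadata listed in this paragraph can be pictured as two small record types. The following Python sketch is only an illustration: the class names, field names, and field types are assumptions, and the `may_contain` helper is a hypothetical example of how minimum and maximum values could be used to skip files during retrieval, not something the patent text spells out.

```python
from dataclasses import dataclass


@dataclass
class OriginalFileMeta:
    """Metadata of an original (data) file; field names are illustrative."""
    file_name: str
    record_count: int      # number of records in the file
    partition_rule: str    # division rule the file belongs to
    node: str              # node where the file is located
    disk: str              # disk where the file is located
    created_at: float      # file creation time (epoch seconds, an assumption)


@dataclass
class IndexFileMeta:
    """Metadata of an index file; field names are illustrative."""
    min_value: str
    max_value: str
    total_records: int
    distinct_records: int  # number of non-duplicate records

    def may_contain(self, value: str) -> bool:
        # Hypothetical use: a value outside [min_value, max_value]
        # cannot be in the indexed file, so the file can be skipped.
        return self.min_value <= value <= self.max_value
```

With such a record, a lookup could first check `may_contain` on each index file and only open the files whose value range covers the query.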


Abstract

The invention discloses a data storage method for massive unstructured data. The method comprises the following steps. First, a time interval T is set and the data storage cluster is divided into a plurality of zones, so that records obtained within the same time interval T are stored in the same zone; each zone is further divided into n hash zones. Second, for each unstructured record, its generation time t and one or more pieces of key information (key) that uniquely identify the record are extracted. Third, the zone in which a record is stored is determined from its time t, and the record's hash zone value within that zone is calculated from its key information. Fourth, based on the result of the third step, records that fall in the same time interval and have the same hash zone value are written to the same file F, and the number of records in file F is counted; if the number of records exceeds the set threshold K, another file is created in the current hash zone for storage. According to the storage method, data retrieval efficiency can be greatly improved.
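
The four steps of the abstract can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not the patented implementation: the class name `PartitionedStore`, the use of SHA-256 as the hash function, the choice of epoch seconds for t, and the reading of the threshold as "at most K records per file" are all choices made here for the example.

```python
import hashlib
from collections import defaultdict


class PartitionedStore:
    """In-memory sketch of two-level (time interval, hash) partitioning."""

    def __init__(self, interval_seconds, n_hash_zones, max_records_per_file):
        self.T = interval_seconds        # time interval T
        self.n = n_hash_zones            # n hash zones per time zone
        self.K = max_records_per_file    # threshold K (assumed: file holds at most K)
        # (time_zone, hash_zone) -> list of files; each file is a list of records
        self.files = defaultdict(lambda: [[]])

    def locate(self, t, key):
        # Step 3: time zone from t, hash zone from the record's key.
        time_zone = int(t // self.T)
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        hash_zone = int.from_bytes(digest[:8], "big") % self.n
        return time_zone, hash_zone

    def write(self, t, key, payload):
        # Step 4: same interval and same hash zone go to the same file;
        # once the current file is full, open another file in that hash zone.
        zone = self.locate(t, key)
        files = self.files[zone]
        if len(files[-1]) >= self.K:
            files.append([])
        files[-1].append((t, key, payload))
        return zone, len(files) - 1
```

For example, with `PartitionedStore(3600, 4, 2)`, three records written with the same time and key land in the same zone, and the third one goes to a second file because the first already holds K = 2 records.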

Description

Technical field

[0001] The invention relates to a data storage method for massive unstructured data, and in particular to a method for organizing unstructured data in a distributed storage scenario that supports two-level division rules of time interval (Interval) and hash (Hash). It is a massive data management model and method that is dynamically adjusted according to the amount of data and that supports creating data indexes on top of the data organization to achieve more efficient retrieval. It belongs to the research field of massive data storage management.

Background

[0002] The continuous development of computer applications has led to a sharp increase in the amount of data. Because the process of structuring data is limited by the speed of manual processing, unstructured data grows much faster than structured data. For the current large-scale data that is constantly increasing to TB and PB levels, better tools or technologies are ne...


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F17/30
CPC: G06F16/172, G06F16/325
Inventors: 王琦, 刘阳, 杨鹏, 陈训逊, 王树鹏, 王勇, 王振宇
Owner: NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT