Large-scale industrial data compression storage method, system and medium

A technology for the compressed storage of industrial data, applied in file systems, file system management, electrical digital data processing, etc. It addresses problems such as wasted disk space, with the effects of reducing work time, ensuring consistency, and improving work efficiency.

Active Publication Date: 2021-10-01
上海微亿智造科技有限公司 +1

AI Technical Summary

Problems solved by technology

Big data processing handles ever more data, and many companies keep two backup copies of it, which wastes disk space.

Embodiment

[0041] As shown in Figure 1, the large-scale industrial data compression and storage method provided by the present invention comprises:

[0042] Industrial data extraction step: configure a different Flume source for each data source, and manage the Flume source configuration through an interface, so that the setup is configurable and general-purpose;

[0043] Temporary preloading of data into Avro step: define the conversion chain, configure the schema of the Avro data format, and configure Morphline to temporarily convert data of different formats into Avro format;

[0044] Create a Dataset step: create a dataset in HDFS with Parquet as the storage format through Dataset, compress the data with the GPL protocol, and declare that the final landed data is in Parquet format with Snappy compression;

[0045] Combined operation step: connect and run the above steps through Flume configuration, and finally the data is compressed from a large amo...
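The extraction step above, one Flume source per data source driven by a shared configuration, can be sketched as a small generator for the source section of a Flume agent properties file. This is only an illustration under stated assumptions: the agent name `a1`, the channel name `c1`, the source names, and the file paths are hypothetical, and the patent does not state which source types it uses; `spooldir` and `exec` are standard Flume source types chosen here as examples.

```python
def render_flume_sources(agent, sources, channel="c1"):
    """Render the source section of a Flume agent properties file.

    `sources` maps a source name to its settings; every settings dict
    must contain a "type" key (for example "spooldir" or "exec").
    """
    lines = [
        f"{agent}.sources = {' '.join(sources)}",
        f"{agent}.channels = {channel}",
        f"{agent}.channels.{channel}.type = memory",
    ]
    for name, settings in sources.items():
        if "type" not in settings:
            raise ValueError(f"source {name!r} has no 'type'")
        for key, value in settings.items():
            lines.append(f"{agent}.sources.{name}.{key} = {value}")
        # Every source writes into the same channel in this sketch.
        lines.append(f"{agent}.sources.{name}.channels = {channel}")
    return "\n".join(lines)


# Hypothetical data sources: a directory of sensor files and a tailed log.
SOURCES = {
    "sensors": {"type": "spooldir", "spoolDir": "/data/incoming/sensors"},
    "plc_log": {"type": "exec", "command": "tail -F /var/log/plc.log"},
}

config_text = render_flume_sources("a1", SOURCES)
print(config_text)
```

Adding a new data source then means adding one dictionary entry rather than editing the properties file by hand, which is one plausible reading of the "configurable general settings" the step describes.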

Abstract

The invention provides a large-scale industrial data compression storage method, system and medium, including: Step 1: configure different data acquisition systems according to the type of data source, and extract the data collected by the data acquisition systems through interface operation; Step 2: define the conversion chain, and temporarily convert the different types of extracted data into Avro format through the data cleaning plug-in; Step 3: compress the Avro-format data with the GPL protocol, using snappy as the compression format, and create in the distributed file system a dataset with Parquet as the storage format, which stores the compressed data. The invention can define conversion chains and compression and storage formats for any type of data, greatly improving the data processing speed and data compression ratio of the computing platform.
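The idea of a conversion chain in Step 2 can be illustrated with a minimal sketch: inputs in different formats pass through a short sequence of functions and come out as records that match one Avro-style schema. This is a plain-Python stand-in, not the Morphline plug-in itself; the schema, its field names, and the sample values are hypothetical, and the sketch stops at schema-conforming dictionaries rather than writing Avro or Parquet files.

```python
import csv
import io
import json

# Hypothetical record schema, written the way Avro declares schemas (JSON).
SCHEMA = {
    "type": "record",
    "name": "SensorReading",
    "fields": [
        {"name": "device_id", "type": "string"},
        {"name": "timestamp", "type": "long"},
        {"name": "value", "type": "double"},
    ],
}

# Python types used to coerce the Avro primitive types in this sketch.
PY_TYPES = {"string": str, "long": int, "double": float}


def parse_csv(line):
    """Parse 'device_id,timestamp,value' into a dict of strings."""
    row = next(csv.reader(io.StringIO(line)))
    return dict(zip(["device_id", "timestamp", "value"], row))


def parse_json(text):
    """Parse a JSON object into a dict."""
    return json.loads(text)


def coerce_to_schema(record, schema=SCHEMA):
    """Keep only the schema's fields and cast each to its declared type."""
    out = {}
    for field in schema["fields"]:
        name = field["name"]
        if name not in record:
            raise KeyError(f"missing field {name!r}")
        out[name] = PY_TYPES[field["type"]](record[name])
    return out


def run_chain(raw, steps):
    """Apply each conversion step in order to one raw input."""
    for step in steps:
        raw = step(raw)
    return raw


# One chain per input format; both end in the same schema-shaped record.
CHAINS = {
    "csv": [parse_csv, coerce_to_schema],
    "json": [parse_json, coerce_to_schema],
}

from_csv = run_chain("press-01,1633046400,3.5", CHAINS["csv"])
from_json = run_chain(
    '{"device_id": "press-01", "timestamp": 1633046400, "value": 3.5}',
    CHAINS["json"],
)
print(from_csv == from_json)
```

Because every chain ends in the same coercion step, downstream storage only ever sees one record shape, regardless of how the data arrived.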

Description

Technical field

[0001] The present invention relates to the technical field of data compression storage, in particular to a large-scale industrial data compression storage method, system and medium.

Background

[0002] With the vigorous development of new infrastructure, more and more traditional industrial enterprises have begun to use Internet technology to improve productivity, and data is the most critical part of this. In the traditional Internet sector, big data processing handles ever more data, and many companies back up two copies of their data. This wastes disk space.

[0003] Patent document CN108304472A (application number: 201711455790.2) discloses a data compression storage method and a data compression storage device. The data compression method includes the following steps: a segmentation step, which divides the original data into multiple fields; and a compression step, in which, depending on the data content, different compression strategies...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/11, G06F16/16, G06F16/174, G06F16/182
CPC: G06F16/116, G06F16/16, G06F16/1744, G06F16/182
Inventor: 高响
Owner: 上海微亿智造科技有限公司