Data persistent storage method and system based on federated Hadoop Distributed File System

A distributed data storage technology, applied to data error detection, transmission systems, and electrical digital data processing, in the area of redundancy in computing. It addresses the inability to adjust the number of data backups, cluster paralysis, and inflexible data backup strategies, and achieves prevention of single points of failure, high fault tolerance, and security.

Active Publication Date: 2018-01-09
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0004] First, the traditional HDFS backup strategy is not safe and is prone to a single point of failure: if the NameNode in the cluster fails, the entire cluster is paralyzed and data can be neither read nor written.
Second, the data backup strategy is inflexible. Every data file stored in the HDFS cluster has the same number of backups, and that number cannot be adjusted per file according to the importance of the data or other factors. As a result, the cluster wastes a large amount of storage space on unimportant data.
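
The storage cost of a uniform backup count can be illustrated with a small calculation. The sketch below is only an illustration: the file names, sizes, importance classes, and per-class replica counts are hypothetical and do not come from the patent. It compares the raw disk space consumed when every file gets the same number of replicas with the space consumed when the replica count follows importance.

```python
# Hypothetical workload: (file name, size in GB, importance class).
FILES = [
    ("billing_records", 200, "critical"),
    ("sensor_readings", 500, "normal"),
    ("debug_logs", 1300, "low"),
]

# Same replica count for every file (3 is the usual HDFS default).
UNIFORM_REPLICATION = 3

# Assumed per-class replica counts; not taken from the patent.
REPLICATION_BY_CLASS = {"critical": 3, "normal": 2, "low": 1}


def raw_storage_gb(files, replication_for):
    """Total raw disk space consumed once every replica is counted."""
    return sum(size * replication_for(cls) for _, size, cls in files)


uniform_gb = raw_storage_gb(FILES, lambda cls: UNIFORM_REPLICATION)
tiered_gb = raw_storage_gb(FILES, lambda cls: REPLICATION_BY_CLASS[cls])
saved_gb = uniform_gb - tiered_gb

print(f"uniform: {uniform_gb} GB, tiered: {tiered_gb} GB, saved: {saved_gb} GB")
```

With these assumed numbers, most of the saving comes from the large, low-importance files, which is the case the paragraph above describes.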

Examples

Embodiment Construction

[0045] Specific implementation

[0046] The technical solutions of the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments.

[0047] The method of the present invention uses federated HDFS to set different backup strategies for different block storage pools and classifies data received from the Internet of Things into these pools for storage. Spark Streaming processes data newly added to HDFS in real time, and the processed results are stored in the federated HDFS and a MySQL database according to their importance. This achieves flexible data storage and safe isolation and allows the processed data to be analyzed. The cluster system architecture is shown in Figure 1.
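
The classification step described in [0047] might be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the pool names, mount points, replica counts, and the importance rule in `classify` are all hypothetical, and the code only computes where a record would go without contacting a real HDFS cluster.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BlockPool:
    """One federated namespace with its own backup (replication) policy."""
    name: str          # hypothetical namespace label
    mount_point: str   # hypothetical path under which the namespace is mounted
    replication: int   # number of replicas kept for files in this pool


# Assumed policy table; the source does not list concrete pools or factors.
POOLS = {
    "high": BlockPool("ns-critical", "/critical", 3),
    "medium": BlockPool("ns-standard", "/standard", 2),
    "low": BlockPool("ns-scratch", "/scratch", 1),
}


def classify(record: dict) -> str:
    """Toy importance rule: alarms are high, summaries medium, the rest low."""
    kind = record.get("kind")
    if kind == "alarm":
        return "high"
    if kind == "summary":
        return "medium"
    return "low"


def route(record: dict) -> Tuple[str, int]:
    """Return the target path and replica count for one record."""
    pool = POOLS[classify(record)]
    path = f"{pool.mount_point}/{record['device']}/{record['ts']}.json"
    return path, pool.replication


records = [
    {"device": "sensor-7", "ts": 1700000000, "kind": "alarm"},
    {"device": "sensor-7", "ts": 1700000060, "kind": "raw"},
]
plan = [route(r) for r in records]
for path, replicas in plan:
    print(path, replicas)
```

Keeping the policy in one table means a pool's backup count can change without touching the classification rule, which matches the flexibility the paragraph aims for.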

[0048] 1. Architecture

[0049] The traditional distributed storage system HDFS is prone to a single point of failure: the failure of one node can paralyze the entire cluster; the data ba...


Abstract

The invention introduces a data persistent storage method and system based on a federated Hadoop Distributed File System (HDFS). Data from the Internet of Things is collected by sensors and uploaded to a server, where it is cleaned, divided, and saved to a federated HDFS whose block pools have different backup strategies. The federated HDFS stores the data persistently. Spark Streaming reads and processes the data in the federated HDFS, and the result data is written into both the federated HDFS and a MySQL database. Result data written into the federated HDFS is cleaned and divided before persistent storage. The data written into MySQL is used to analyze the results.
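
The flow in the abstract (clean, divide, process, then write results to two sinks) can be sketched end to end in plain Python. Everything concrete here is an assumption made for illustration: the record fields, the cleaning rule, the per-device mean used as the "processing" step, and the result paths are hypothetical. A dictionary stands in for the federated HDFS and an in-memory SQLite database stands in for MySQL, so the sketch runs without a cluster; in the described system Spark Streaming would do the processing.

```python
import sqlite3

# Hypothetical raw readings as they might arrive from sensors.
RAW = [
    {"device": "s1", "temp": "21.5"},
    {"device": "s1", "temp": "22.5"},
    {"device": "s2", "temp": "n/a"},   # malformed value, dropped by cleaning
    {"device": "", "temp": "19.0"},    # missing device id, dropped by cleaning
    {"device": "s2", "temp": "30.0"},
]


def clean(rows):
    """Keep rows that have a device id and a numeric temperature."""
    out = []
    for row in rows:
        try:
            temp = float(row["temp"])
        except (KeyError, ValueError):
            continue
        if row.get("device"):
            out.append({"device": row["device"], "temp": temp})
    return out


def divide(rows):
    """Group cleaned rows by device, one partition per device."""
    parts = {}
    for row in rows:
        parts.setdefault(row["device"], []).append(row["temp"])
    return parts


def process(parts):
    """Stand-in for the streaming job: mean temperature per device."""
    return {dev: sum(vals) / len(vals) for dev, vals in parts.items()}


partitions = divide(clean(RAW))
results = process(partitions)

# Sink 1: a dict standing in for result files in the federated HDFS.
hdfs_sink = {f"/results/{dev}.txt": f"{mean:.2f}" for dev, mean in results.items()}

# Sink 2: in-memory SQLite standing in for the MySQL results table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE result (device TEXT PRIMARY KEY, mean_temp REAL)")
db.executemany("INSERT INTO result VALUES (?, ?)", results.items())

# Example analysis query over the relational copy of the results.
hottest = db.execute(
    "SELECT device FROM result ORDER BY mean_temp DESC LIMIT 1"
).fetchone()[0]
print(hdfs_sink, hottest)
```

Writing the same results to a file store and a relational table reflects the split in the abstract: the file copy is for persistence, the table is for querying.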

Description

Technical field

[0001] The invention belongs to the field of big data storage based on a cloud computing platform.

Background

[0002] With the rapid development of the Internet, all walks of life are gradually moving onto it, and the amount of data on the Internet is growing exponentially. How to store and back up data effectively has therefore drawn increasing attention. In e-commerce, for example, storing and backing up massive volumes of user, product, and transaction information is particularly important; the same holds for stock market information. For massive data, the better the storage method and backup strategy, the higher the reusability of the data and the fewer times the data must be reprocessed. Conversely, if the data storage and backup methods or strategies are unsatisfactory, it may lead to important d...


Application Information

IPC(8): H04L29/06, H04L29/08, G06F11/14, G06F17/30
Inventor: 李鹏, 陈芳州, 徐鹤, 王汝传, 宋金全, 李亮德
Owner NANJING UNIV OF POSTS & TELECOMM