Cold and hot data analysis method and system

A cold and hot data analysis technology, applied in the field of big data. It addresses omissions and accidental deletions, wasted manpower and material resources, and tedious manual work, and in doing so avoids misoperation, improves work efficiency, and saves system resources.

Pending Publication Date: 2020-11-20
BEIJING XUEZHITU NETWORK TECH

AI Technical Summary

Problems solved by technology

Manually searching HDFS for data that has gone unused for a long time is inefficient. It also consumes considerable manpower and material resources, the work is tedious, and it is prone to omissions and accidental deletions.


Examples


Specific Embodiment 1

[0061] Figure 1 is a schematic diagram of the cold and hot data analysis principle of the embodiment of the present invention; Figure 2 is a flow diagram of a preferred cold and hot data analysis method of the embodiment. With reference to Figures 1 and 2, this embodiment discloses a cold and hot data analysis method, including:

[0062] File data information acquisition step S1: obtains the data information of the files in HDFS by analyzing the metadata node NameNode and the audit log AuditLog of a Hadoop Distributed File System (HDFS). The data information includes, but is not limited to, data storage information and data usage information;
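Step S1 does not say how the audit log is parsed. The Python sketch below shows one way to derive per-file usage information (access count and last access time) from audit log lines. It assumes the common `FSNamesystem.audit` layout, a `yyyy-MM-dd HH:mm:ss,SSS` timestamp followed by tab-separated `key=value` fields such as `allowed`, `cmd` and `src`, and it counts `open`, `create` and `append` as uses; these choices, and the names `UsageInfo`, `parse_audit_line` and `collect_usage`, are hypothetical rather than taken from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional

# Assumption: which audit commands count as "using" a file.
ACCESS_CMDS = {"open", "create", "append"}
AUDIT_MARKER = "FSNamesystem.audit: "  # assumed logger prefix


@dataclass
class UsageInfo:
    access_count: int = 0
    last_access: Optional[datetime] = None


def parse_audit_line(line: str) -> Optional[dict]:
    """Return the key=value fields of one audit line plus its timestamp, or None."""
    head, sep, body = line.partition(AUDIT_MARKER)
    if not sep:
        return None  # not an audit record
    try:
        # Assumed timestamp layout: "2020-11-20 10:15:00,123" (23 characters).
        ts = datetime.strptime(head[:23], "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None
    fields = {"time": ts}
    # Fields are assumed to be tab-separated; splitting on tabs keeps values
    # with spaces, such as "ugi=alice (auth:SIMPLE)", intact.
    for item in body.rstrip("\n").split("\t"):
        key, eq, value = item.partition("=")
        if eq:
            fields[key] = value
    return fields


def collect_usage(lines: Iterable[str]) -> Dict[str, UsageInfo]:
    """Aggregate access count and most recent access time per source path."""
    usage: Dict[str, UsageInfo] = defaultdict(UsageInfo)
    for line in lines:
        rec = parse_audit_line(line)
        if rec is None or rec.get("allowed") != "true":
            continue
        if rec.get("cmd") not in ACCESS_CMDS or "src" not in rec:
            continue
        info = usage[rec["src"]]
        info.access_count += 1
        if info.last_access is None or rec["time"] > info.last_access:
            info.last_access = rec["time"]
    return dict(usage)
```

Directory listings and other metadata-only commands are skipped, so a file's usage reflects reads and writes of its contents under the assumed command set.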

[0063] Cold and hot data analysis step S2: analyzes the data information of the files in HDFS using the neural network algorithm model LSTM and, taking the data information as a reference, divides the HDFS files into cold data and hot data.
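The patent names LSTM as the classifier but gives no architecture, features or training details. As a minimal sketch, assuming each file is represented by its daily access counts (log1p-scaled) over a recent window, the Python below shows the input an LSTM would consume and the forward pass of a single-unit LSTM cell with a sigmoid read-out that yields label 1 (hot) or 0 (cold). The feature choice, the 0.5 threshold, the 1 = hot convention and all weights are hypothetical; a real model would be trained on labelled files with a deep learning framework.

```python
import math
from datetime import date, timedelta
from typing import Iterable, List, Sequence


def daily_access_sequence(access_days: Iterable[date], end: date, days: int) -> List[float]:
    """Per-day access counts for the `days` days ending at `end`, log1p-scaled."""
    start = end - timedelta(days=days - 1)
    counts = [0] * days
    for d in access_days:
        if start <= d <= end:
            counts[(d - start).days] += 1
    return [math.log1p(c) for c in counts]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-unit LSTM (input size 1, hidden size 1) with a sigmoid read-out.

    w, u and b hold the input, forget, candidate and output gate parameters,
    in that order. All values are placeholders, not trained weights.
    """

    def __init__(self, w: Sequence[float], u: Sequence[float], b: Sequence[float],
                 w_out: float, b_out: float) -> None:
        self.w, self.u, self.b = w, u, b
        self.w_out, self.b_out = w_out, b_out

    def score(self, xs: Sequence[float]) -> float:
        h, c = 0.0, 0.0  # hidden state and cell state
        for x in xs:
            i = sigmoid(self.w[0] * x + self.u[0] * h + self.b[0])    # input gate
            f = sigmoid(self.w[1] * x + self.u[1] * h + self.b[1])    # forget gate
            g = math.tanh(self.w[2] * x + self.u[2] * h + self.b[2])  # candidate
            o = sigmoid(self.w[3] * x + self.u[3] * h + self.b[3])    # output gate
            c = f * c + i * g
            h = o * math.tanh(c)
        return sigmoid(self.w_out * h + self.b_out)  # probability-like hot score


def label(model: TinyLSTM, xs: Sequence[float], threshold: float = 0.5) -> int:
    """Return 1 (hot) if the score reaches the threshold, else 0 (cold)."""
    return 1 if model.score(xs) >= threshold else 0
```

The cell state `c` lets activity on earlier days influence the final score, which a single last-access threshold cannot express.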

[...

Specific Embodiment 2

[0087] Only the differences between this embodiment and Specific Embodiment 1 are described below; the similarities are not repeated. Figures 4 and 5 show another preferred principle schematic diagram and flow diagram of the cold and hot data analysis method of the embodiment of the present invention. With reference to Figures 4 and 5, the cold and hot data analysis method of this embodiment differs from Specific Embodiment 1 as follows:

[0088] If only the neural network algorithm model LSTM is used to divide hot and cold data, some files that have not been accessed for a long time as of the current time may be mistakenly marked as 1 (hot data). Therefore, the cold and hot data analysis step S2 further includes:

[0089] LRU cold and hot data division step S204: analyzes the data storage information and data usage information to obtain the last modification time and/or access time of the file in ...
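The paragraph is truncated before the LRU rule is stated. Assuming, as paragraph [0088] suggests, that S204 overrides an LSTM label of 1 when a file has been idle too long, the Python sketch below takes each file's last use as the later of its modification and access times, walks files in least-recently-used order (the order in which an LRU cache would evict them), and relabels files idle longer than a threshold as cold. The function names and the 90-day threshold used in the example are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set


def last_used(mtime: Optional[datetime], atime: Optional[datetime]) -> Optional[datetime]:
    """Later of modification and access time; either may be missing."""
    times = [t for t in (mtime, atime) if t is not None]
    return max(times) if times else None


def lru_order(last_use: Dict[str, datetime]) -> List[str]:
    """Paths from least to most recently used, i.e. LRU eviction order."""
    return sorted(last_use, key=last_use.__getitem__)


def lru_cold_set(last_use: Dict[str, datetime], now: datetime,
                 idle_limit: timedelta) -> Set[str]:
    """Collect files idle longer than idle_limit, starting from the LRU end."""
    cold: Set[str] = set()
    for path in lru_order(last_use):
        if now - last_use[path] <= idle_limit:
            break  # every remaining path was used even more recently
        cold.add(path)
    return cold


def apply_lru_correction(lstm_labels: Dict[str, int], last_use: Dict[str, datetime],
                         now: datetime, idle_limit: timedelta) -> Dict[str, int]:
    """Force label 0 (cold) for long-idle files; keep the LSTM label otherwise."""
    cold = lru_cold_set(last_use, now, idle_limit)
    return {path: 0 if path in cold else lbl for path, lbl in lstm_labels.items()}
```

Sorting once and stopping at the first recently used file mirrors how an LRU list is consumed from its cold end; a plain filter over all files would produce the same set.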



Abstract

The invention provides a cold and hot data analysis method and system. The cold and hot data analysis method comprises: a file data information acquisition step, which analyzes the metadata node NameNode and the audit log AuditLog of a Hadoop Distributed File System (HDFS) to obtain data information of the files in HDFS; and a cold and hot data analysis step, which analyzes the data information of the files in HDFS using the neural network algorithm model LSTM and, taking the data information as a reference, divides the HDFS files into cold data and hot data. In this scheme, the file data information is analyzed by the LSTM model to obtain the cold and hot data, and the cache eviction algorithm LRU is additionally used to improve the accuracy of the cold and hot data analysis. This provides data support for data cleaning to service groups and operation and maintenance personnel, improves the efficiency and accuracy of data cleaning, and saves personnel costs.

Description

Technical field
[0001] The invention belongs to the technical field of big data, and in particular relates to a cold and hot data analysis method and system.
Background art
[0002] The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. HDFS provides high-throughput data access and is well suited to applications with large data sets.
[0003] In HDFS, the NameNode is the metadata node and the DataNode is the data node. The NameNode maintains the mapping table from files to data blocks and the mapping table from data blocks to DataNodes; the DataNodes store the actual data. The NameNode manages the namespace of the file system: it maintains the file system tree and all files and directories in it. This information is stored in two files, the metadata image file FsImage and the metadata operation log EditLog. Among them, the metadata image file FsImage saves the lates...
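Per-file storage information such as size, replication, and modification and access times is held in the NameNode's FsImage. One way to read it offline, assumed here rather than stated in the patent, is to export the image with the Offline Image Viewer (`hdfs oiv -p Delimited`) and parse the tab-separated output. The Python sketch below reads that export by header name; the column names (`Path`, `Replication`, `ModificationTime`, `AccessTime`, `FileSize`, `Permission`), the `yyyy-MM-dd HH:mm` time format and the leading `d` that marks directories reflect typical Hadoop output and should be checked against the deployed version. HDFS also records access times only at the granularity set by `dfs.namenode.accesstime.precision`, and not at all when it is 0. `StorageInfo` and `load_storage_info` are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, TextIO

TIME_FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format of the Delimited export


@dataclass
class StorageInfo:
    size_bytes: int
    replication: int
    modified: datetime
    accessed: datetime


def load_storage_info(tsv: TextIO) -> Dict[str, StorageInfo]:
    """Map each file path to its storage info, skipping directories."""
    # QUOTE_NONE: paths may contain quote characters that must not be interpreted.
    reader = csv.DictReader(tsv, delimiter="\t", quoting=csv.QUOTE_NONE)
    out: Dict[str, StorageInfo] = {}
    for row in reader:
        if row["Permission"].startswith("d"):
            continue  # directories are not classified as hot or cold
        out[row["Path"]] = StorageInfo(
            size_bytes=int(row["FileSize"]),
            replication=int(row["Replication"]),
            modified=datetime.strptime(row["ModificationTime"], TIME_FMT),
            accessed=datetime.strptime(row["AccessTime"], TIME_FMT),
        )
    return out


def load_storage_info_text(text: str) -> Dict[str, StorageInfo]:
    """Convenience wrapper for an export already read into memory."""
    return load_storage_info(io.StringIO(text))
```

Reading columns by header name rather than position keeps the parser working if a Hadoop version reorders or adds columns.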


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F3/06, G06F12/123, G06N3/04, G06N3/08
CPC: G06F3/0607, G06F3/0638, G06F3/067, G06F12/123, G06N3/049, G06N3/08, G06N3/044, G06N3/045
Inventors: 李婉洁, 刘远, 郭颂
Owner: BEIJING XUEZHITU NETWORK TECH