Data cleaning method and device based on data popularity and storage medium

A data cleaning and data processing technology applied in the field of data cleaning. It addresses the high complexity, low efficiency, and low accuracy of manual popularity evaluation, and improves cleaning efficiency and accuracy.

Pending Publication Date: 2021-03-26
北京思特奇信息技术股份有限公司

AI Technical Summary

Problems solved by technology

[0002] With the development of big data platforms, large data centers such as large-scale data warehouses and data lakes are becoming increasingly common. As data centers continue to accumulate data, they come under storage and performance pressure. Ensuring efficient operation of the data center and improving the quality and value of its data pose new challenges for data operation and maintenance. Cleaning up low-value, low-popularity data in a timely manner is an effective way to address these problems. However, current methods for evaluating data popularity and value rely largely on manual work, which is inefficient, highly complex, and often inaccurate.



Examples


Detailed Description of the Embodiments

[0019] The principles and features of the present invention are described below in conjunction with the accompanying drawings, and the examples given are only used to explain the present invention, and are not intended to limit the scope of the present invention.

[0020] Figure 1 is a schematic flowchart of a data cleaning method based on data popularity provided by an embodiment of the present invention.

[0021] As shown in Figure 1, a data cleaning method based on data popularity includes the following steps:

[0022] Collecting data popularity information from the target data platform;

[0023] Analyzing each of the multiple data types in the data popularity information to obtain the popularity of each data type;

[0024] Evaluating the popularity of each data type according to a preset popularity weight to obtain evaluation information for the data type popularity;

[0025] Determining the data type to be deleted according to the preset cleaning strategy ...
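
The collection, analysis, and weighted evaluation steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the record fields (`data_type`, `reads`, `writes`), the idea that popularity is a per-type sum of access metrics, and the weight values are all assumptions chosen for illustration, since the source does not specify them.

```python
from collections import defaultdict

# Hypothetical popularity records collected from a data platform.
# The field names and metrics (reads, writes) are assumptions.
RECORDS = [
    {"data_type": "billing_log", "reads": 120, "writes": 30},
    {"data_type": "billing_log", "reads": 80, "writes": 10},
    {"data_type": "temp_export", "reads": 2, "writes": 1},
]

# Hypothetical preset popularity weights, one per metric.
WEIGHTS = {"reads": 0.5, "writes": 0.25}


def popularity_by_type(records):
    """Sum each popularity metric per data type."""
    totals = defaultdict(lambda: defaultdict(int))
    for record in records:
        for metric, value in record.items():
            if metric != "data_type":
                totals[record["data_type"]][metric] += value
    return {dt: dict(metrics) for dt, metrics in totals.items()}


def evaluate(popularity, weights):
    """Compute a weighted sum of the metrics for each data type."""
    return {
        dt: sum(weights[m] * v for m, v in metrics.items())
        for dt, metrics in popularity.items()
    }


popularity = popularity_by_type(RECORDS)
scores = evaluate(popularity, WEIGHTS)
print(scores)  # {'billing_log': 110.0, 'temp_export': 1.25}
```

A weighted sum is only one plausible reading of "preset popularity weight"; any monotone scoring function over the collected metrics would fit the same step.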


Abstract

The invention provides a data cleaning method and device based on data popularity, and a storage medium. The method comprises: collecting data popularity information from a target data platform; analyzing a plurality of data types in the data popularity information to obtain the popularity of each data type; evaluating the popularity of each data type according to a preset popularity weight to obtain evaluation information for the data type popularity; and determining a to-be-deleted data type according to a preset cleaning strategy and the evaluation information, and cleaning the data corresponding to the to-be-deleted data type. With this method and device, the collected data popularity information is analyzed to obtain the data type popularity, evaluation information for that popularity is produced by a data popularity evaluation model, and data types are cleaned automatically according to the evaluation information. No manual processing is needed, and cleaning efficiency and accuracy are improved.
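
The final step of the abstract, choosing to-be-deleted data types from a cleaning strategy plus the evaluation information, might look like the following Python sketch. The strategy format (a score threshold with a set of protected types), the score values, and the in-memory `store` are hypothetical; the source does not say what a cleaning strategy contains or how data is physically removed.

```python
# Hypothetical evaluation scores per data type (higher means more popular).
SCORES = {"billing_log": 110.0, "temp_export": 1.25, "audit_trail": 0.5}

# Hypothetical cleaning strategy: delete types scoring below a threshold,
# except types explicitly protected from deletion.
STRATEGY = {"threshold": 5.0, "protected": {"audit_trail"}}


def select_types_to_delete(scores, strategy):
    """Return the data types whose score falls below the strategy threshold."""
    return sorted(
        dt
        for dt, score in scores.items()
        if score < strategy["threshold"] and dt not in strategy["protected"]
    )


def clean(store, types_to_delete):
    """Remove the data of each selected type from an in-memory store.

    Returns the number of items removed.
    """
    removed = 0
    for dt in types_to_delete:
        removed += len(store.pop(dt, []))
    return removed


# Stand-in for the platform's storage, keyed by data type (an assumption).
store = {
    "billing_log": ["b1", "b2"],
    "temp_export": ["t1", "t2", "t3"],
    "audit_trail": ["a1"],
}
to_delete = select_types_to_delete(SCORES, STRATEGY)
removed = clean(store, to_delete)
print(to_delete, removed)  # ['temp_export'] 3
```

Separating selection from deletion keeps the decision auditable: the list of selected types can be logged or reviewed before any data is removed.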

Description

Technical field

[0001] The present invention relates to the technical field of data cleaning, and in particular to a data cleaning method, device and storage medium based on data popularity.

Background

[0002] With the development of big data platforms, large data centers such as large-scale data warehouses and data lakes are becoming increasingly common. As data centers continue to accumulate data, they come under storage and performance pressure. Ensuring efficient operation of the data center and improving the quality and value of its data pose new challenges for data operation and maintenance. Cleaning up low-value, low-popularity data in a timely manner is an effective way to address these problems. However, current methods for evaluating data popularity and value rely largely on manual work, which is inefficient, highly complex, and often inaccurate.

Contents of the invention

[0003] The technical problem to be solved by t...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215
CPC: G06F16/215
Inventor: 严敏
Owner: 北京思特奇信息技术股份有限公司