Data processing method and device, storage medium and computer equipment

A data processing and database technology applied in the field of data processing. It addresses the problem that deduplication may discard useful data, which undermines the reliability of data analysis. The effect is to reduce the possibility that useful data is discarded and to improve data reliability.

Pending Publication Date: 2022-08-09
GUANGZHOU WERIDE TECH LTD CO

AI Technical Summary

Problems solved by technology

[0003] Current HBase database deduplication is based purely on timestamps, so it may discard useful data, which undermines the reliability of data analysis.
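The problem can be illustrated with a minimal sketch of timestamp-only deduplication. The `Cell` type, its field names, and the sample values are hypothetical and are not taken from the application. The sketch only shows how keeping the newest version can drop an older version that may be more useful.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    # Hypothetical stand-in for a storage unit; not an HBase API type.
    row_key: str
    column: str
    timestamp: int
    value: str


def dedup_by_timestamp(cells):
    """Keep only the newest cell for each (row_key, column) pair."""
    newest = {}
    for cell in cells:
        key = (cell.row_key, cell.column)
        if key not in newest or cell.timestamp > newest[key].timestamp:
            newest[key] = cell
    return list(newest.values())


# Two versions of the same cell; the older one is assumed to be more useful.
cells = [
    Cell("vehicle-1", "info:label", 100, "manually_verified"),
    Cell("vehicle-1", "info:label", 200, "auto_generated"),
]
kept = dedup_by_timestamp(cells)
```

Here `kept` holds only the `auto_generated` version, because the timestamp is the sole criterion and the content of the older version is never considered.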

Detailed Description of the Embodiments

[0050] The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of this application.

[0051] The embodiments of the present application are applied to data storage in an HBase database. In an HBase database, data is stored in tables, and a table includes a row key, column families, and timestamps. The row key identifies each row of data in the HBase table. A column family is a collection of columns. Timestamps identify the version of the data. A ...
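The data model described above can be pictured as a nested mapping. The following is a minimal in-memory sketch, not HBase client code. The `put` and `get_versions` helpers, the `family:qualifier` column naming, and the sample values are assumptions made for illustration.

```python
# Nested mapping: row key -> "family:qualifier" -> timestamp -> value.
table = {}


def put(row_key, column, timestamp, value):
    """Store one version of a cell; a new timestamp adds a new version."""
    table.setdefault(row_key, {}).setdefault(column, {})[timestamp] = value


def get_versions(row_key, column):
    """Return all (timestamp, value) versions of a cell, newest first."""
    versions = table.get(row_key, {}).get(column, {})
    return sorted(versions.items(), reverse=True)


# Writing the same row and column twice yields two versions of one cell.
put("row-1", "cf:status", 100, "draft")
put("row-1", "cf:status", 200, "final")
versions = get_versions("row-1", "cf:status")
```

Because the timestamp is part of the key, writing to the same row and column again does not overwrite the earlier value; it adds another version alongside it.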

Abstract

The invention provides a data processing method and device, a storage medium and computer equipment. The method comprises the following steps: reading the duplicate storage units in each HFile of an HBase database and dividing them into a plurality of duplicate groups; reading, for each duplicate group, the data attributes contained in its storage units and the attribute values corresponding to those data attributes; obtaining the deduplication rule corresponding to each data attribute, where a deduplication rule is a priority rule for data retention based on the attribute value of the data attribute; and deduplicating the storage units in each duplicate group based on the deduplication rule corresponding to that group. Because deduplication follows the retention priority defined for each data attribute, the possibility that useful data is discarded is reduced, which improves data reliability for subsequent data analysis tasks.
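The steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: storage units are modeled as plain dictionaries, duplicates are assumed to share a row key and column, the `source` attribute and its priority order are hypothetical, and both the newest-timestamp tie-break and the timestamp fallback are assumptions of this sketch.

```python
from collections import defaultdict

# Hypothetical rules: attribute -> values ordered from highest to lowest
# retention priority.
DEDUP_RULES = {
    "source": ["manual", "sensor", "simulated"],
}


def group_duplicates(units):
    """Group units that share (row_key, column); keep only groups of 2+."""
    groups = defaultdict(list)
    for unit in units:
        groups[(unit["row_key"], unit["column"])].append(unit)
    return [group for group in groups.values() if len(group) > 1]


def dedup_group(group, rules):
    """Return the single unit to retain from one duplicate group."""
    for attribute, priority in rules.items():
        if all(attribute in unit["attributes"] for unit in group):
            rank = {value: i for i, value in enumerate(priority)}
            # Lower rank wins; unknown values rank last; ties go to the
            # newest timestamp (an assumption of this sketch).
            return min(
                group,
                key=lambda u: (
                    rank.get(u["attributes"][attribute], len(priority)),
                    -u["timestamp"],
                ),
            )
    # Assumed fallback when no rule applies: keep the newest version.
    return max(group, key=lambda u: u["timestamp"])


units = [
    {"row_key": "r1", "column": "cf:q", "timestamp": 100,
     "attributes": {"source": "manual"}},
    {"row_key": "r1", "column": "cf:q", "timestamp": 200,
     "attributes": {"source": "simulated"}},
    {"row_key": "r2", "column": "cf:q", "timestamp": 50,
     "attributes": {"source": "sensor"}},
]
groups = group_duplicates(units)
kept = [dedup_group(group, DEDUP_RULES) for group in groups]
```

In this example the older `manual` unit is retained over the newer `simulated` one, which a purely timestamp-based rule would have discarded.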

Description

Technical field

[0001] The present application relates to the technical field of data processing, and in particular to a data processing method, apparatus, storage medium and computer equipment.

Background

[0002] HBase (Hadoop Database) is a distributed, scalable NoSQL database that supports massive data storage. Its underlying physical storage uses a key-value data format. All data files in HBase are stored on the Hadoop HDFS file system, which enables parallel and distributed processing of complex tasks and provides high processing performance and reliability. However, a large amount of duplicate data may be stored in the HBase database. To save storage resources, the data needs to be deduplicated.

[0003] Current HBase database deduplication is based simply on timestamps, so it may discard useful data, which undermines the reliability of data analysis.

Summary of the invention

[0004] Based on this, ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215, G06F16/2458, G06F16/28, G06F16/21
CPC: G06F16/215, G06F16/2465, G06F16/285, G06F16/219
Inventors: 洪海滨, 韩旭
Owner: GUANGZHOU WERIDE TECH LTD CO