Real-time data deduplication counting method and device

A real-time data deduplication counting technology, applied in the field of data analysis, that addresses the wasted resources and large memory footprint of conventional deduplication and achieves accurate counting with a small number of stored keys and low memory usage.

Active Publication Date: 2019-12-13
SUNING CLOUD COMPUTING CO LTD

AI Technical Summary

Problems solved by technology

The conventional deduplication method occupies a large amount of memory, which wastes resources, and its memory usage is difficult to estimate.


Embodiment Construction

[0035] In order to enable those skilled in the art to better understand the technical solutions of the present invention, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments. Hereinafter, embodiments of the present invention will be described in detail, examples of which are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary only for explaining the present invention and should not be construed as limiting the present invention. Those skilled in the art can understand that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It should also be unde...

Abstract

The embodiment of the invention discloses a real-time data deduplication counting method and device that accurately calculate the deduplicated count of a measure over real-time data while occupying little memory during deduplication. The method comprises the following steps: acquiring the real-time data, dimension information, dimension combination information and measurement information; decomposing all dimension combinations of the real-time data into one-dimensional combinations according to the dimension information, the dimension combination information and the measurement information, and generating duplication judgment keys; and performing Redis batch duplication judgment on the duplication judgment keys under a distributed lock mechanism to obtain the deduplication counting result. This ensures that the measurement field of the real-time data is unique under each dimension combination, so the duplication judgment yields an accurate deduplicated count. Only the duplication judgment keys of the one-dimensional combinations need to be stored, which frees a large amount of memory; because the length of a duplication judgment key is fixed, the Redis memory usage can be estimated.
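The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the key layout, the use of MD5 to obtain a fixed-length key, and the names `dedup_key` and `count_distinct` are assumptions made for illustration. An in-memory `set` stands in for Redis, and the distributed lock and batch round trip are omitted.

```python
import hashlib
from collections import defaultdict


def dedup_key(combo, row, measure):
    """Build a fixed-length duplication judgment key (hypothetical layout).

    The key covers the dimension combination, its dimension values and the
    measure value; MD5 is used only to get a constant 32-character key.
    """
    dims = "|".join(f"{d}={row[d]}" for d in combo)
    raw = f"{','.join(combo)}#{dims}#{measure}={row[measure]}"
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def count_distinct(rows, combos, measure, seen=None):
    """Count distinct measure values per dimension combination and values.

    `seen` is an in-memory stand-in for the Redis key space; a key that is
    already present marks the measure value as a duplicate for that group.
    """
    seen = set() if seen is None else seen
    counts = defaultdict(int)
    for row in rows:
        for combo in combos:
            key = dedup_key(combo, row, measure)
            if key not in seen:  # first sighting: record the key and count it
                seen.add(key)
                group = (combo, tuple(row[d] for d in combo))
                counts[group] += 1
    return dict(counts)


rows = [
    {"city": "NJ", "item": "tv", "user": "u1"},
    {"city": "NJ", "item": "tv", "user": "u1"},  # exact repeat
    {"city": "NJ", "item": "pc", "user": "u1"},
    {"city": "SH", "item": "tv", "user": "u2"},
]
combos = [("city",), ("city", "item")]
seen_keys = set()
result = count_distinct(rows, combos, "user", seen_keys)
```

Because every key has the same length, the memory needed for the key space grows linearly with the number of stored keys, which is what makes the usage estimable in this sketch.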

Description

Technical field

[0001] The invention belongs to the technical field of data analysis, and in particular relates to a real-time data deduplication counting method and device.

Background

[0002] In big data analysis, it is often necessary to compute deduplicated statistics on a measurement field, but the commonly used online analytical processing (OLAP) tools cannot accurately calculate the deduplicated count of real-time data. The conventional deduplication scheme uses the Cartesian product of the dimension value sets under every dimension combination as the duplication judgment keys, so for a given measurement field the maximum number of duplication judgment keys for each dimension combination is the product of the cardinalities of the dimensions in that combination. For example, a Cube [1] contains the four dimensions time, item, location and supplier; as shown in Figure 1, there are 1...
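To make the growth of the conventional scheme concrete, the following Python sketch computes the upper bound on duplication judgment keys for every dimension combination of a four-dimension cube. The cardinalities are hypothetical values chosen for illustration and do not come from the patent; the empty combination (full aggregation) is included with a bound of 1.

```python
from itertools import combinations
from math import prod

# Hypothetical cardinalities, for illustration only (not from the source).
cardinalities = {"time": 365, "item": 1000, "location": 50, "supplier": 20}


def max_keys_per_combination(cards):
    """Upper bound on keys per dimension combination: product of cardinalities."""
    names = sorted(cards)
    bounds = {}
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            # prod of an empty sequence is 1, covering the empty combination
            bounds[combo] = prod(cards[d] for d in combo)
    return bounds


bounds = max_keys_per_combination(cardinalities)
total = sum(bounds.values())
print(f"{len(bounds)} combinations, up to {total} keys in total")
```

The total equals the product of (cardinality + 1) over all dimensions, so each added dimension multiplies the worst-case key count, which is the memory problem the background describes.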

Application Information

IPC(8): G06F16/2455
CPC: G06F16/24553
Inventor: 汪凯, 张盼盼, 韩振旭, 李成, 孙迁
Owner: SUNING CLOUD COMPUTING CO LTD