
Duplication eliminating method and system for recorded data under cloud architecture

A technology for processing recorded data, classified under electrical digital data processing and special-purpose data processing applications. It addresses the high cost of storing deduplication indexes, saving storage costs and improving system efficiency.

Active Publication Date: 2017-04-26
北京思特奇信息技术股份有限公司 (Beijing Si-Tech Information Technology Co., Ltd.)
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

For example, because of international roaming and similar requirements, a telecommunications system generally needs to keep its deduplication index for 2-3 months. For a province with 50 million users, one month of index data is about 3 TB, so three months is about 9 TB. Storing this much information in a distributed in-memory database is expensive.
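The sizing follows from proportional arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes) and that index volume grows linearly with the retention period; the per-user figure it derives is only the average implied by the numbers above, not a figure stated in the patent.

```python
# Sizing sketch. Assumptions: 1 TB = 10**12 bytes, and index volume grows
# linearly with the number of months retained.
TB = 10**12


def retained_index_bytes(monthly_tb: float, months: int) -> float:
    """Total index volume held in memory for a given retention window."""
    return monthly_tb * TB * months


def per_user_monthly_bytes(monthly_tb: float, users: int) -> float:
    """Average index bytes contributed by one user per month."""
    return monthly_tb * TB / users


if __name__ == "__main__":
    total = retained_index_bytes(3, 3)                  # 3 TB/month kept for 3 months
    per_user = per_user_monthly_bytes(3, 50_000_000)    # 50 million users
    print(f"retained index: {total / TB:.0f} TB")       # retained index: 9 TB
    print(f"per user per month: {per_user / 1000:.0f} KB")  # per user per month: 60 KB
```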



Examples


Embodiment 1

[0043] As shown in Figure 1, a method for deduplicating recorded data under a cloud architecture includes the following steps:

[0044] S1: Divide the record data received from upstream into timely record data and late record data according to a preset range for the interval between occurrence time and receipt time;

[0045] S2: Insert the timely record data received from upstream into the distributed in-memory database; extract the key information of the timely record data as index data and save it to the index table in the distributed in-memory database; after duplicate records are eliminated, output the retained timely record data as a timely-record export file for downstream use, and output the corresponding index data to a timely-record index file;

[0046] S3: Import the index data in the timely-record index file output in S2 into the HBASE database according to the interval range between the time of occurrence and t...
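Steps S1 and S2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the record fields (`key`, `occurrence_time`, `receipt_time`), the one-day window, the keep-first-occurrence policy, and the use of a plain Python `set` in place of the distributed in-memory database's index table are all hypothetical. S3 is not sketched because its text is truncated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Record:
    # Hypothetical layout: the patent does not list record fields.
    key: str                   # key information later used as index data
    occurrence_time: datetime  # when the event (e.g. a call) happened
    receipt_time: datetime     # when the record arrived from upstream


def split_timely_late(records, window: timedelta):
    """S1: a record is timely if it arrives within `window` of its occurrence."""
    timely, late = [], []
    for r in records:
        bucket = timely if r.receipt_time - r.occurrence_time <= window else late
        bucket.append(r)
    return timely, late


def dedup_timely(timely, index_table: set):
    """S2: keep the first record seen for each key.

    `index_table` stands in for the index table in the distributed in-memory
    database; it persists across calls, so duplicates spanning batches are caught.
    Returns (records for the export file, index entries for the index file).
    """
    export, index_out = [], []
    for r in timely:
        if r.key in index_table:
            continue                 # duplicate record: eliminated
        index_table.add(r.key)
        export.append(r)             # goes to the timely-record export file
        index_out.append(r.key)      # goes to the timely-record index file
    return export, index_out


if __name__ == "__main__":
    t0 = datetime(2017, 4, 1, 12, 0)
    records = [
        Record("A", t0, t0 + timedelta(minutes=5)),
        Record("A", t0, t0 + timedelta(minutes=7)),  # duplicate of the first
        Record("B", t0, t0 + timedelta(days=3)),     # arrives after the window
    ]
    timely, late = split_timely_late(records, timedelta(days=1))
    export, index_entries = dedup_timely(timely, set())
    print([r.key for r in export], index_entries, [r.key for r in late])
    # ['A'] ['A'] ['B']
```

Keeping the index table outside the function mirrors the patent's point that the index lives in shared storage rather than in a single pass over one file.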

Embodiment 2

[0067] As shown in Figure 2, a system for deduplicating recorded data under a cloud architecture includes:

[0068] a preprocessing module, configured to divide the record data received from upstream into timely record data and late record data according to a preset range for the interval between occurrence time and receipt time;

[0069] a timely-record deduplication module, configured to insert the timely record data received from upstream into the distributed in-memory database, extract the key information of the timely record data as index data and save it to the index table in the distributed in-memory database, output the timely record data retained after duplicate records are eliminated as a timely-record export file for downstream use, and output the corresponding index data to a timely-record index file;

[0070] a timely-record storage module, configured to import the index data in the timely-record index file output from the timely record an...
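The key-information extraction performed by the timely-record deduplication module can be sketched as follows. The key fields, the SHA-1 hashing, and the tab-separated index-file layout are all assumptions for illustration; the patent text here does not specify them.

```python
import hashlib

# Hypothetical key fields for a call detail record; the patent only says that
# "key information" is extracted and does not name the fields.
KEY_FIELDS = ("calling_number", "called_number", "start_time")


def extract_index(record: dict) -> str:
    """Derive a fixed-length index entry (SHA-1 hex digest) from the key fields."""
    raw = "|".join(str(record[field]) for field in KEY_FIELDS)
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def index_file_line(record: dict) -> str:
    """Format one line of a timely-record index file (assumed layout):
    the index entry, a tab, then the occurrence time, so a later stage
    can group index data by time before loading it into storage."""
    return f"{extract_index(record)}\t{record['start_time']}"


if __name__ == "__main__":
    # Placeholder values, not real data.
    a = {"calling_number": "1001", "called_number": "2002",
         "start_time": "2017-04-01T12:00:00", "duration": 60}
    b = dict(a, duration=61)              # differs only in a non-key field
    c = dict(a, called_number="3003")     # differs in a key field
    print(extract_index(a) == extract_index(b))  # True: treated as duplicates
    print(extract_index(a) == extract_index(c))  # False: different key
    print(index_file_line(a).split("\t")[1])     # 2017-04-01T12:00:00
```

Hashing gives every index entry the same length regardless of field sizes, which keeps per-entry memory predictable; SHA-1 collisions are negligible for non-adversarial data but not impossible.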


Abstract

The invention relates to a method and system for deduplicating recorded data under a cloud architecture. It belongs to the field of recorded-data deduplication and aims to deduplicate recorded data at high speed while retaining it for long periods. The method comprises the following steps: dividing received recorded data into timely record data and late record data; inserting the timely record data into a distributed in-memory database, extracting key information to build index data, eliminating duplicate records, outputting the retained timely record data as a timely-record export file for downstream use, and importing the index data of the timely record data into an HBASE database; and inserting the late record data and the corresponding index data into the HBASE database, eliminating duplicate records according to the index data in the HBASE database, and outputting the non-duplicate late record data as a late-record export file for downstream use. The method and system apply to high-speed deduplication of recorded data when the time span over which the data are received is large.
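The late-record path described in the abstract can be sketched with an in-process stand-in for the HBASE index table. This is a hypothetical illustration: `IndexStore`, its day-prefixed row keys, and the `(key, occurrence_time)` tuple format are assumptions, and no HBase client API is used. In a real deployment, an atomic conditional write (HBase provides check-and-mutate operations) would be the natural counterpart of `put_if_absent`.

```python
from datetime import datetime


class IndexStore:
    """In-process stand-in for the HBASE index table; no HBase client is used.
    Row keys are prefixed with the occurrence day (an assumed row-key design)
    so that one day's index rows sort together."""

    def __init__(self) -> None:
        self._rows = set()

    @staticmethod
    def row_key(key: str, occurrence_time: datetime) -> str:
        return f"{occurrence_time:%Y%m%d}|{key}"

    def put_if_absent(self, key: str, occurrence_time: datetime) -> bool:
        """Insert an index row; return False if the row already existed."""
        rk = self.row_key(key, occurrence_time)
        if rk in self._rows:
            return False
        self._rows.add(rk)
        return True


def import_timely_index(store: IndexStore, entries) -> None:
    """Load (key, occurrence_time) pairs taken from timely-record index files."""
    for key, occurrence_time in entries:
        store.put_if_absent(key, occurrence_time)


def dedup_late(store: IndexStore, late_records):
    """Insert each late record's index row; keep only records whose row was new.
    Duplicates of timely records and of earlier late records are both dropped."""
    return [r for r in late_records if store.put_if_absent(r[0], r[1])]


if __name__ == "__main__":
    t0 = datetime(2017, 4, 1, 12, 0)
    store = IndexStore()
    import_timely_index(store, [("A", t0)])              # imported timely index
    kept = dedup_late(store, [("A", t0), ("C", t0), ("C", t0)])
    print([key for key, _ in kept])                      # ['C']
```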

Description

Technical field

[0001] The invention relates to the field of deduplication processing of recorded data.

Background

[0002] Traditional deduplication is based either on disk files or on in-memory databases. Deduplication based on an in-memory database runs on minicomputers: the data resides on a single host, and only part of the index is held in memory. If the index table for the business time of the current record is not in memory, a memory swap-in/swap-out algorithm spills the contents of some tables to disk and loads the required table into memory, which makes it possible to deduplicate large volumes of data. In bill files, however, the time span between records is relatively large, which causes frequent swapping in and out and degrades performance. When the in-memory database evolves to a cloud-based scenario, it becomes a distributed memory databas...
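The swapping problem described in the background can be illustrated with a small simulation. It assumes least-recently-used (LRU) eviction, one index table per business time slot, and a fixed number of tables that fit in memory; the patent only says "a memory swap-in/swap-out algorithm", so the policy, capacity, and table IDs are hypothetical.

```python
from collections import OrderedDict


def count_table_loads(table_ids, capacity: int) -> int:
    """Count how often an index table must be loaded from disk.

    Each entry of `table_ids` is the business-time table (e.g. one day) that the
    next record needs. At most `capacity` tables fit in memory; on a miss the
    least recently used table is spilled to disk (assumed LRU policy).
    """
    in_memory = OrderedDict()
    loads = 0
    for t in table_ids:
        if t in in_memory:
            in_memory.move_to_end(t)       # hit: mark as most recently used
            continue
        loads += 1                         # miss: swap the table in from disk
        if len(in_memory) >= capacity:
            in_memory.popitem(last=False)  # swap out the least recently used table
        in_memory[t] = True
    return loads


if __name__ == "__main__":
    clustered = [1, 1, 2, 2, 3, 3]      # records arrive roughly in time order
    scattered = [1, 30, 60, 1, 30, 60]  # records span a wide time range
    print(count_table_loads(clustered, capacity=2))  # 3
    print(count_table_loads(scattered, capacity=2))  # 6
```

With the same number of records, the scattered sequence misses on every access, which is the frequent swapping the background attributes to bill files with large time spans.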


Application Information

IPC (8): G06F17/30
CPC: G06F16/1748; G06F16/215
Inventor: 严丽君 (Yan Lijun)
Owner: 北京思特奇信息技术股份有限公司 (Beijing Si-Tech Information Technology Co., Ltd.)