Duplication eliminating method and system for recorded data under cloud architecture

A record data processing method and system, classified under electrical digital data processing and special data processing applications. It addresses the high cost of existing approaches, reducing storage costs and improving system efficiency.

Active Publication Date: 2017-04-26
北京思特奇信息技术股份有限公司

AI Technical Summary

Problems solved by technology

For example, because of international roaming and similar factors, the index of a telecommunications system generally needs to be kept for 2-3 months. For a province with 50 mi



Examples

Example Embodiment

[0042] Example 1

[0043] As shown in Figure 1, a method for processing record data under a cloud architecture includes the following steps:

[0044] S1: Based on a preset range for the interval between occurrence time and receipt time, divide the record data received from upstream into timely record data and late record data;

[0045] S2: Insert the timely record data received from upstream into the distributed memory database, extract the key information of the timely record data as index data, and save it to the index table in the distributed memory database. After removing duplicate records, output the retained timely record data as a timely record export file for downstream use, and output the corresponding index data to the timely record index file;

[0046] S3: According to the interval range between occurrence time and receipt time preset in S1, import the index data in the timely record index file output in S2 into the HBASE database according ...
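Steps S1 and S2 can be illustrated with a minimal sketch. It is not the patented implementation: the three-day window, the record fields, and the choice of key fields are all assumptions made for illustration, and a plain Python set stands in for the index table of the distributed memory database.

```python
from datetime import datetime, timedelta

# Assumed threshold; the source only says the interval range is preset.
TIMELY_WINDOW = timedelta(days=3)


def split_records(records, window=TIMELY_WINDOW):
    """S1: split records into (timely, late) by receipt time minus occurrence time."""
    timely, late = [], []
    for rec in records:
        gap = rec["received_at"] - rec["occurred_at"]
        (timely if gap <= window else late).append(rec)
    return timely, late


def record_key(rec):
    """Hypothetical key information: fields assumed to identify a record."""
    return (rec["caller"], rec["callee"], rec["occurred_at"])


def dedup_timely(records, index):
    """S2: keep the first record per key; `index` is a set standing in for the index table."""
    retained, index_entries = [], []
    for rec in records:
        key = record_key(rec)
        if key in index:
            continue  # duplicate of a record already seen
        index.add(key)
        retained.append(rec)       # goes to the timely record export file
        index_entries.append(key)  # goes to the timely record index file
    return retained, index_entries


t0 = datetime(2017, 4, 1, 12, 0)
records = [
    {"caller": "A", "callee": "B", "occurred_at": t0,
     "received_at": t0 + timedelta(hours=1)},
    {"caller": "A", "callee": "B", "occurred_at": t0,
     "received_at": t0 + timedelta(hours=2)},  # same key as the first record
    {"caller": "C", "callee": "D", "occurred_at": t0 - timedelta(days=40),
     "received_at": t0},                       # arrives 40 days after it occurred
]

timely, late = split_records(records)
index = set()
export, index_entries = dedup_timely(timely, index)
```

In this sample, two records fall inside the window, one of them is dropped as a duplicate, and the 40-day-old record is routed to the late path.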

Example Embodiment

[0066] Example 2

[0067] As shown in Figure 2, a record data deduplication system under a cloud architecture includes:

[0068] A preprocessing module, used to divide the record data received from upstream into timely record data and late record data according to the preset range for the interval between occurrence time and receipt time;

[0069] A timely record duplicate-removal module, used to insert the timely record data received from upstream into the distributed memory database, extract the key information of the timely record data as index data, and save it to the index table in the distributed memory database. After removing duplicate records, it outputs the retained timely record data as a timely record export file for downstream use, and outputs the corresponding index data to the timely record index file;

[0070] A timely record import module, used to import the index data in the timely record index file output by the timely record remov...


Abstract

The invention relates to a duplication eliminating method and system for recorded data under cloud architecture, belongs to the field of duplication elimination of recorded data, and aims to perform high-speed duplication elimination and long-term retention of the recorded data. The method comprises the following steps: dividing received recorded data into timely-recorded data and late-recorded data; inserting the timely-recorded data into a distributed memory database, extracting key information to establish index data, eliminating duplicated records, outputting the retained timely-recorded data as a timely-recorded export file to be used downstream, and importing the index data of the timely-recorded data into an HBASE database; and inserting the late-recorded data and the corresponding index data into the HBASE database, eliminating duplicated records according to the index data in the HBASE database, and outputting non-duplicated late-recorded data as a late-recorded export file to be used downstream. The method and the system are applied to high-speed duplication elimination of recorded data when the data receiving time span is large.
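The late-record path described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `LongTermIndex` is a hypothetical in-process stand-in for the HBASE index store (no HBase client API is used), and the record fields and key function are invented for the example.

```python
class LongTermIndex:
    """Hypothetical in-process stand-in for the long-term index store (HBASE in the source)."""

    def __init__(self):
        self._rows = {}  # key -> stored late record, or None for imported timely keys

    def import_keys(self, keys):
        """Load index data that the timely path exported to its index file."""
        for key in keys:
            self._rows.setdefault(key, None)

    def insert_if_absent(self, key, record):
        """Store the record and return True only if the key was not present yet."""
        if key in self._rows:
            return False
        self._rows[key] = record
        return True


def dedup_late(late_records, store, key_fn):
    """Return the late records whose keys were not already in the store."""
    export = []
    for rec in late_records:
        if store.insert_if_absent(key_fn(rec), rec):
            export.append(rec)  # goes to the late record export file
    return export


def key_fn(rec):
    """Assumed key information for a record."""
    return (rec["caller"], rec["callee"], rec["start"])


store = LongTermIndex()
# Index data previously imported from the timely path.
store.import_keys([("A", "B", "2017-02-01T10:00")])

late_records = [
    {"caller": "A", "callee": "B", "start": "2017-02-01T10:00"},  # already seen as timely
    {"caller": "E", "callee": "F", "start": "2017-02-03T09:30"},  # new
    {"caller": "E", "callee": "F", "start": "2017-02-03T09:30"},  # repeated late record
]
late_export = dedup_late(late_records, store, key_fn)
```

Because the timely index data is imported into the same store, a late record is checked against both earlier timely records and earlier late records in one lookup.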

Description

Technical field

[0001] The invention relates to the field of deduplication processing of recorded data.

Background

[0002] Traditional deduplication processing is based on disk files or in-memory databases. Deduplication based on an in-memory database runs on a minicomputer: the data resides on a single host, and only part of the index is held in memory. If the index table for the business time of the current record is not in memory, a swap-in/swap-out algorithm writes the contents of some tables to disk and loads the required table into memory, which makes it possible to deduplicate a large amount of data. When the time span between records is large, as in the records of bill files, this leads to frequent swapping in and out and degrades performance. When the in-memory database evolves to a cloud-based scenario, it becomes a distributed memory databas...
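The swapping cost described in the background can be illustrated with a small simulation. It assumes one index table per day of business time and least-recently-used eviction; the source does not specify the granularity of the tables or the swapping policy, so both are assumptions for illustration only.

```python
from collections import OrderedDict


def count_table_loads(record_days, capacity):
    """Count how often an index table must be loaded from disk under LRU eviction."""
    resident = OrderedDict()  # day -> index table (contents omitted in this sketch)
    loads = 0
    for day in record_days:
        if day in resident:
            resident.move_to_end(day)  # table already in memory; mark as recently used
        else:
            loads += 1                 # table must be swapped in from disk
            resident[day] = None
            if len(resident) > capacity:
                resident.popitem(last=False)  # swap out the least recently used table
    return loads


narrow_span = [1, 1, 2, 1, 2, 2, 1, 2]    # records cluster on two adjacent days
wide_span = [1, 30, 60, 1, 30, 60, 1, 30]  # records spread over two months

narrow_loads = count_table_loads(narrow_span, capacity=2)  # 2 loads
wide_loads = count_table_loads(wide_span, capacity=2)      # 8 loads
```

With room for two tables in memory, the narrow stream loads each table once, while the wide stream triggers a load for every record.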


Application Information

IPC(8): G06F17/30
CPC: G06F16/1748, G06F16/215
Inventor: 严丽君
Owner: 北京思特奇信息技术股份有限公司