Method for duplicate checking of mass data and system thereof

A duplicate-checking technology for massive data, applied in the field of data processing. It addresses the problem that an ordinary PC cannot complete duplicate checking of massive data on its own, and achieves low cost, fast duplicate checking, and low configuration requirements.

Active Publication Date: 2009-12-23
刘飞

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to solve the problem that duplicate checking of massive data cannot be completed independently on an ordinary PC. It provides a method and system for checking duplicates in massive data, so that massive data can be checked for duplicates independently in a low-configuration environment.




Embodiment Construction

[0020] Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings. In these embodiments, massive data refers to a huge, unprecedentedly vast volume of data. Many business departments now need to operate on massive amounts of data. For example, the planning department has planning data, the water conservancy department has water conservancy data, and the meteorological department has meteorological data, and there is also data between business systems. The amount of data processed by these departments is very large. It includes spatial data, report statistics, text, sound, images, hypertext, and other environmental and cultural data.

[0021] Figure 1 shows an embodiment of the method for checking duplicates of massive data according to the present invention. Referring to Figure 1, the method comprises:

[0...



Abstract

The invention discloses a method and system for duplicate checking of massive data. The method comprises the following steps: extracting data keywords from the massive data, where a data keyword is used to distinguish one piece of data from the other data; dividing the data keywords according to their first N+M letters and putting the data keywords that share the same first N+M letters into one file to obtain keyword data files, where the first N letters of the data keywords are the same and the first N+M letters are not all the same (N and M are non-negative integers); and performing duplicate checking on the data in each keyword data file to obtain the duplicate checking result. The method makes it possible to check massive data for duplicates independently in a low-configuration environment.
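The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names (`partition_keys`, `duplicates_in_file`, `find_duplicates`), the bucket file naming, and the use of an in-memory counter for each bucket are all assumptions made for illustration. It also assumes that keywords are strings without newline characters and that each bucket file is small enough to fit in memory, with `prefix_len` standing in for N+M.

```python
import os
import tempfile
from collections import Counter


def partition_keys(keys, out_dir, prefix_len):
    """Write each key into the bucket file for its first prefix_len characters.

    prefix_len plays the role of N+M. The prefix is hex-encoded so that the
    file name is safe whatever characters the key contains (an assumption of
    this sketch, not something the patent specifies).
    """
    paths = set()
    for key in keys:
        prefix = key[:prefix_len]
        name = "bucket_" + prefix.encode("utf-8").hex() + ".txt"
        path = os.path.join(out_dir, name)
        # Reopening the file per key keeps the sketch short; a real tool
        # would keep handles open or buffer writes.
        with open(path, "a", encoding="utf-8") as f:
            f.write(key + "\n")
        paths.add(path)
    return sorted(paths)


def duplicates_in_file(path):
    """Return {key: count} for keys that occur more than once in one bucket."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            counts[line.rstrip("\n")] += 1
    return {key: n for key, n in counts.items() if n > 1}


def find_duplicates(keys, prefix_len):
    """Partition keys by prefix, then check each bucket file on its own."""
    result = {}
    with tempfile.TemporaryDirectory() as tmp:
        for path in partition_keys(keys, tmp, prefix_len):
            # Equal keys share a prefix, so they always land in the same
            # bucket and buckets can be checked independently.
            result.update(duplicates_in_file(path))
    return result


if __name__ == "__main__":
    sample = ["13800001111", "13800002222", "13900001111",
              "13800001111", "13900001111", "13900001111"]
    print(find_duplicates(sample, prefix_len=3))
```

Because two identical keywords necessarily share the same prefix, only one bucket has to be held in memory at a time, which is what keeps the memory requirement low in this sketch. A longer prefix gives more, smaller buckets.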

Description

Technical field

[0001] The invention relates to data processing technology, in particular to a method and system for checking duplicates of massive data.

Background technique

[0002] With the expansion of operation scale and the adjustment of business operations at operators such as China Telecom, China Mobile, China Unicom, and China Netcom, data import and export between an operator's internal systems, and between the systems of different operators, has become more and more frequent. During data import and export, it is becoming more and more important to check the correctness of massive data, which includes checking whether the massive data contains duplicates. Duplicate data leads to abnormal system operation, failed business processing, duplicate user billing, and so on, seriously affecting the normal operation of the system.

[0003] Existing tools and methods need to occupy a huge amount of memory or need the support of a dedicated database when checking massive data, and...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 牛国扬
Owner: 刘飞