Massive data comparison method and system based on Hadoop cloud platform

A massive-data and cloud-platform technology, applied in digital data processing, structured data retrieval, special data processing applications, and similar fields. It addresses the consistency comparison of massive data and improves the efficiency and real-time performance of the comparison system.

Inactive Publication Date: 2015-01-28
北京思特奇信息技术股份有限公司

AI Technical Summary

Problems solved by technology

[0009] The technical problem to be solved by the present invention is to provide a massive data comparison method and system, based on the Hadoop cloud platform, that solves the problem of consistency comparison of massive data.



Examples


Detailed Description of the Embodiments

[0035] The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the present invention and are not intended to limit its scope.

[0036] As shown in Figure 1, this embodiment provides a method for comparing massive data based on the Hadoop cloud platform, comprising the following steps:

[0037] Step 1: divide the massive data into several parts according to an interval scale, use the cloud comparison engine to sort each part, and output a corresponding number of internally ordered files. Then place these internally ordered files in the distributed file system of the Hadoop-based cloud platform, where they serve as the source data comparison files;

[0038] Step 2: when a comparison task arrives, the distributed file system, through its main task node, schedules and controls the other task nodes to execute the task files;

[0039] Step 3, each task node find...
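Step 1 above can be illustrated with a minimal sketch. It assumes records are identified by integer keys and that the interval scale is given as a list of ascending boundary values; the function name `split_and_sort`, the `boundaries` parameter, and the use of in-memory lists in place of files on the distributed file system are all hypothetical choices made for illustration, not details taken from the patent.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def split_and_sort(records: Sequence[int],
                   boundaries: Sequence[int]) -> Dict[int, List[int]]:
    """Assign each record key to an interval, then sort every interval.

    `boundaries` must be ascending. With boundaries [100, 200] the
    intervals are (-inf, 100), [100, 200) and [200, +inf), numbered 0..2.
    Each returned list stands in for one internally ordered file.
    """
    parts: Dict[int, List[int]] = {i: [] for i in range(len(boundaries) + 1)}
    for key in records:
        # bisect_right gives the index of the interval containing `key`.
        parts[bisect_right(boundaries, key)].append(key)
    # Sorting inside each interval yields the "internally ordered" parts.
    return {i: sorted(keys) for i, keys in parts.items()}


if __name__ == "__main__":
    print(split_and_sort([250, 17, 130, 99, 100, 201], [100, 200]))
```

Because every key falls into exactly one interval, two data sets split with the same boundaries can later be compared interval by interval, independently of each other.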


Abstract

The invention relates to a massive data comparison method and system based on a Hadoop cloud platform. The method includes the following steps: 1, the massive data is divided into several parts according to an interval scale, and a cloud comparison engine sorts each part so that a corresponding number of internally ordered files is output to the distributed file system of the Hadoop cloud platform to serve as source data comparison files; 2, when a comparison task arrives, the distributed file system, through a main task node, schedules and controls the other task nodes to execute the comparison task files; 3, each task node finds the task file it is to execute and compares it with the source data comparison file, placing records common to both in one file and differing records in a difference-set file; 4, after every task node has completed its file comparison, the main task node merges and outputs the comparison results of all task nodes. The method and system improve the efficiency of consistency comparison of massive data.
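Steps 3 and 4 of the abstract can be sketched as follows. This is a rough illustration under stated assumptions, not the patented implementation: records are unique integer keys, each part is already sorted in ascending order, and a local thread pool stands in for the task nodes that a main task node would schedule. The names `compare_sorted` and `compare_all` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Result = Tuple[List[int], List[int]]  # (common records, difference records)


def compare_sorted(source: List[int], task: List[int]) -> Result:
    """Walk two ascending lists once, separating shared keys from the rest."""
    common: List[int] = []
    diff: List[int] = []
    i = j = 0
    while i < len(source) and j < len(task):
        if source[i] == task[j]:
            common.append(source[i])
            i += 1
            j += 1
        elif source[i] < task[j]:
            diff.append(source[i])  # present only in the source part
            i += 1
        else:
            diff.append(task[j])    # present only in the task part
            j += 1
    # At most one of the two tails is non-empty; all of it is difference.
    diff.extend(source[i:])
    diff.extend(task[j:])
    return common, diff


def compare_all(source_parts: Dict[int, List[int]],
                task_parts: Dict[int, List[int]]) -> Result:
    """Compare matching parts in parallel, then merge the per-part results."""
    part_ids = sorted(set(source_parts) | set(task_parts))
    with ThreadPoolExecutor() as pool:  # threads stand in for task nodes
        results = list(pool.map(
            lambda p: compare_sorted(source_parts.get(p, []),
                                     task_parts.get(p, [])),
            part_ids))
    common = [key for part_common, _ in results for key in part_common]
    diff = [key for _, part_diff in results for key in part_diff]
    return common, diff
```

Since both inputs are sorted, each part is compared in a single linear pass, and the parts can be processed independently before the results are concatenated.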

Description

Technical field

[0001] The invention relates to the technical field of massive data processing, in particular to a method and system for comparing massive data based on a Hadoop cloud platform.

Background art

[0002] At present, with the development of the telecommunications industry, data services are growing rapidly and business rules have become relatively complex. Major operators place ever higher requirements on data quality. However, because of unclear business rules, inconsistent business acceptance entry and exit points, non-standard business processes, unstable interfaces, and a lack of data audits, discrepancies exist between user data and the business bureau data of each network element. Because the volume of data is huge, the audit cannot be completed within a fixed period of time. If the audit is repeated or the audit cycle is prolonged, the compared data sets are captured at different times, which leads to incorrect comparison results.

[0003] By analy...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/273, G06F16/27
Inventor: 何攀
Owner: 北京思特奇信息技术股份有限公司