Efficient method for extracting massive data

A technology for extracting massive and incremental data, applied in the field of big data. It addresses the high-value but low-density nature of big data and the large amount of data-system resources consumed by data extraction.

Status: Inactive | Publication date: 2018-04-03
ANHUI KECHUANG INTELLIGENT INTPROP SERVICE CO LTD
Cites: 5 | Cited by: 3

AI Technical Summary

Problems solved by technology

[0007] (4) High value, low density
[0009] Data extraction in the prior art consumes a large amount of data-system resources.

Method used

Golden Gate parses Oracle log files to capture the initial data and the incremental data; the MapReduce framework of the Hadoop platform loads the extracted data in parallel; and multi-node parallel writing generates HBase storage files directly for bulk import.

Examples


Embodiment

[0024] A method for efficiently extracting massive data comprises the following steps:

[0025] Step 1, extract data using Golden Gate: Golden Gate parses the Oracle log files to extract the initial data and the changed (incremental) data;
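As a rough illustration of how log-based extraction separates the initial data from the incremental data, the sketch below models the database log as an in-memory list of entries tagged with a monotonically increasing position. The `LogEntry` type, its `scn` field, and both helper functions are hypothetical; Golden Gate's own interfaces are not described in this document. The initial load records the log position it reflects, and each incremental pass returns only the entries after the last checkpoint.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class LogEntry:
    """One change parsed from the database log (hypothetical layout)."""
    scn: int      # monotonically increasing log position (assumed)
    op: str       # "I" insert, "U" update, "D" delete
    table: str
    key: str
    values: Row


def initial_load(snapshot: Dict[str, Row], as_of_scn: int) -> Tuple[Dict[str, Row], int]:
    """Copy a consistent snapshot and remember the log position it reflects."""
    rows = {k: dict(v) for k, v in snapshot.items()}
    return rows, as_of_scn


def extract_incremental(log: List[LogEntry], checkpoint: int) -> Tuple[List[LogEntry], int]:
    """Return only entries written after `checkpoint`, plus the new checkpoint."""
    new_entries = [e for e in log if e.scn > checkpoint]
    new_checkpoint = max((e.scn for e in new_entries), default=checkpoint)
    return new_entries, new_checkpoint


# Usage: the snapshot already reflects the log up to position 101.
log = [
    LogEntry(101, "I", "orders", "o1", {"amt": 10}),
    LogEntry(102, "U", "orders", "o1", {"amt": 12}),
    LogEntry(103, "I", "orders", "o2", {"amt": 5}),
]
rows, checkpoint = initial_load({"o1": {"amt": 10}}, as_of_scn=101)
changes, checkpoint = extract_incremental(log, checkpoint)
print([e.scn for e in changes], checkpoint)  # [102, 103] 103
```

Running the incremental pass again with the returned checkpoint yields no entries until new changes are logged, which is what keeps repeated extraction cheap.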

[0026] Step 2, accelerate processing with the MapReduce parallel computing engine: the MapReduce parallel computing framework of the Hadoop big data platform is used to speed up loading of the extracted data;
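The shape of a MapReduce job over the extracted change records can be sketched in plain Python. This is a single-process stand-in that assumes a hypothetical record layout `(row_key, position, op, values)`; on Hadoop the map and reduce tasks would run on separate nodes and the framework would perform the shuffle. Here the reduce step keeps only the latest change per row key, which is one plausible job for this stage, not necessarily the one the invention uses.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

# Hypothetical change record: (row_key, log_position, op, values)
Change = Tuple[str, int, str, Dict[str, object]]


def map_split(split: List[Change]) -> List[Tuple[str, Change]]:
    """Map phase: emit (row_key, record) pairs for one input split."""
    return [(rec[0], rec) for rec in split]


def shuffle(mapped: Iterable[List[Tuple[str, Change]]]) -> Dict[str, List[Change]]:
    """Shuffle phase: group every intermediate pair by its key."""
    groups: Dict[str, List[Change]] = defaultdict(list)
    for pairs in mapped:
        for key, rec in pairs:
            groups[key].append(rec)
    return groups


def reduce_key(key: str, records: List[Change]) -> Change:
    """Reduce phase: keep the most recent change (highest log position)."""
    return max(records, key=lambda r: r[1])


def run_job(splits: List[List[Change]], workers: int = 4) -> Dict[str, Change]:
    """Run map tasks and reduce tasks concurrently on a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_split, splits))
        groups = shuffle(mapped)
        reduced = pool.map(lambda kv: reduce_key(*kv), groups.items())
        return {rec[0]: rec for rec in reduced}
```

Because records for the same key may arrive in different splits, the grouping step is what lets each reducer see the complete history of one row before deciding its final state.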

[0027] Step 3, load the data into HBase: multi-node parallel writing is used to generate HBase storage-format files directly.

[0028] In Step 3, the data are loaded by bulk import.
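Writing HBase storage files directly only works if each file holds cells for a single region, sorted by row key. The sketch below shows that partition-and-sort step in plain Python, assuming the region split keys are known in advance; the function name is hypothetical and the output is an in-memory dict rather than real HFiles. In stock HBase, `HFileOutputFormat2.configureIncrementalLoad` sets up equivalent total-order partitioning for a MapReduce job, and the resulting files are then handed to the `completebulkload` tool.

```python
import bisect
from typing import Dict, List, Tuple

Cell = Tuple[bytes, bytes]  # (row_key, value)


def partition_for_bulk_load(cells: List[Cell], split_keys: List[bytes]) -> Dict[int, List[Cell]]:
    """Group cells by target region and sort each group by row key.

    Region i covers [split_keys[i-1], split_keys[i]); region 0 has no lower
    bound and the last region has no upper bound. `split_keys` must be sorted.
    """
    regions: Dict[int, List[Cell]] = {i: [] for i in range(len(split_keys) + 1)}
    for row_key, value in cells:
        # bisect_right sends a key equal to a split point to the region it starts.
        regions[bisect.bisect_right(split_keys, row_key)].append((row_key, value))
    for group in regions.values():
        group.sort(key=lambda cell: cell[0])  # storage files must be key-ordered
    return regions
```

Each region's list could then be written out by a separate worker, which is where the multi-node parallelism in Step 3 comes from.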

[0029] The incremental data in Step 1 include inserted, updated, and deleted records.
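To make the three kinds of incremental data concrete, the sketch below applies insert, update, and delete records to an in-memory table in log order. The `(op, key, values)` layout and the single-letter operation codes are assumptions for illustration, and treating an update as a merge of the changed columns into the existing row is a design choice of this sketch.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]


def apply_changes(table: Dict[str, Row], changes: Iterable[Tuple[str, str, Row]]) -> Dict[str, Row]:
    """Apply (op, key, values) records in order; the input table is not modified."""
    result = {k: dict(v) for k, v in table.items()}
    for op, key, values in changes:
        if op == "I":
            result[key] = dict(values)              # insert: new full row
        elif op == "U":
            result.setdefault(key, {}).update(values)  # update: merge changed columns
        elif op == "D":
            result.pop(key, None)                   # delete: drop the row if present
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return result
```

Order matters: an update followed by a delete on the same key must leave no row, so the records are applied strictly in the order they were captured.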

[0030] 1. Use Golden Gate to extract changed data. Golden Gate captures, transforms, and delivers data with latency on the order of seconds, and provides a log-based structured data replication method that captures changed data from the online logs in real time and saves the...
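Golden Gate saves the captured changes in its own trail-file format, which is not described here, so the sketch below substitutes a hypothetical append-only file with one JSON record per line. It shows one common pattern for such files: a reader remembers the byte offset it has already consumed, resumes from there on the next pass, and skips a trailing partial line that may still be being written. The file layout and function names are illustrative assumptions.

```python
import json
from typing import List, Tuple


def append_changes(path: str, records: List[dict]) -> None:
    """Append change records, one JSON object per line (hypothetical format)."""
    with open(path, "a", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, sort_keys=True) + "\n")


def read_from(path: str, offset: int) -> Tuple[List[dict], int]:
    """Read complete records after byte `offset`; return them and the new offset."""
    records = []
    with open(path, "rb") as f:
        f.seek(offset)
        for line in f:
            if not line.endswith(b"\n"):
                break  # partial record still being written; pick it up next time
            records.append(json.loads(line))
            offset += len(line)
    return records, offset
```

Persisting the returned offset between runs lets a consumer restart without re-reading or losing changes.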


Abstract

The present invention discloses an efficient method for extracting massive data. The method comprises: Step 1, extracting data with Golden Gate, which parses Oracle log files to extract the initial data and the changed (incremental) data; Step 2, using the MapReduce parallel computing framework of the Hadoop big data platform to speed up loading of the extracted data; and Step 3, loading the data into HBase, using multi-node parallel writing to generate HBase storage-format files directly. The invention provides a log-based structured data replication method that captures changed data from the online logs in real time and saves it in Trail-format files. Because changes are captured by analyzing log files, capture consumes only a small amount of system resources; even when Oracle stores a very large amount of data and is heavily loaded, its operating efficiency is essentially unaffected.

Description

Technical field

[0001] The invention relates to the field of big data, and in particular to a method for efficiently extracting massive data.

Background art

[0002] Similar terms have appeared throughout the history of data development, including "ultra-large-scale data" and "massive data". "Ultra-large-scale" generally refers to data on the order of GB (1 GB = 1024 MB), "massive" generally refers to data on the order of TB (1 TB = 1024 GB), and today's "big data" refers to data at the PB (1 PB = 1024 TB) or EB (1 EB = 1024 PB) level, or even above the ZB (1 ZB = 1024 EB) level. In 2013, Gartner predicted that the data stored worldwide would reach 1.2 ZB; if these data were burned onto CD-R discs and stacked, the pile would reach five times the distance from the Earth to the Moon. Each of these scales brings its own technical problems and research challenges.

[0003] Big data refers to a collection of data that cannot be captured, managed and processed by conventional software tools with...
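The unit definitions in the background chain together with a factor of 1024 per step, so converting between any two of them is a single power of 1024. A small helper makes the arithmetic explicit; the names `UNITS` and `convert` are illustrative.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def convert(amount: float, src: str, dst: str) -> float:
    """Convert between units using a factor of 1024 per step (e.g. 1 ZB = 1024**4 GB)."""
    steps = UNITS.index(src) - UNITS.index(dst)
    return amount * 1024 ** steps
```

For example, `convert(1, "ZB", "B")` is 2**70 bytes, and the 1.2 ZB figure corresponds to `convert(1.2, "ZB", "TB")`, roughly 1.29 billion TB.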


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F17/30
CPC: G06F16/182, G06F16/254
Inventor: 石文威 (Shi Wenwei)
Owner: ANHUI KECHUANG INTELLIGENT INTPROP SERVICE CO LTD