Efficient method for extracting massive data

A technology for extracting massive data and incremental data, applied in the field of big data. It addresses the problems of high value but low value density and of heavy data system resource consumption during extraction.

Status: Inactive. Publication date: 2018-04-03

ANHUI KECHUANG INTELLIGENT INTPROP SERVICE CO LTD

Cites: 5. Cited by: 3.


Problems solved by technology

[0007] (4) Value: high overall value but low value density.

[0009] Data extraction in the prior art occupies a large amount of data system resources.


Embodiment

[0024] A method for efficiently extracting massive data comprises the following steps:

[0025] Step 1: extract the data with Golden Gate. Golden Gate parses the Oracle log files to extract the initial data and the changed incremental data.

[0026] Step 2: speed up processing with the MapReduce parallel computing engine. The MapReduce parallel computing framework of the Hadoop big data platform accelerates loading of the extracted data.

[0027] Step 3: load the data into HBase. Multi-node parallel writing directly generates the storage format files of the HBase data.

[0028] In Step 3, the data is loaded by batch import.
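Steps 2 and 3 can be illustrated with a minimal, in-memory sketch of the map, shuffle, and reduce pattern used to prepare sorted per-region files for a batch import. This is not the Hadoop or HBase API: the split keys, function names, and the use of a thread pool in place of cluster nodes are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical region boundaries: row keys below "g" fall in region 0,
# keys from "g" up to (not including) "n" in region 1, and so on.
SPLIT_KEYS = ["g", "n", "t"]


def map_record(record):
    """Map phase: tag one extracted (row_key, value) pair with its target region."""
    row_key, value = record
    region = bisect_right(SPLIT_KEYS, row_key)
    return region, (row_key, value)


def reduce_region(item):
    """Reduce phase: sort one region's cells by row key, as a store file requires."""
    region, cells = item
    return region, sorted(cells)


def build_store_files(records, workers=4):
    """Return {region: sorted cells}, standing in for one store file per region."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_record, records))
        # Shuffle: group the mapped cells by region.
        groups = defaultdict(list)
        for region, cell in mapped:
            groups[region].append(cell)
        # Each region is sorted independently, so regions can be processed in parallel.
        return dict(pool.map(reduce_region, groups.items()))


records = [("zeta", 1), ("alpha", 2), ("hbase", 3), ("golden", 4), ("oracle", 5)]
store_files = build_store_files(records)
```

Because each region's output is already sorted by row key, it could in principle be handed to the storage layer as a finished file instead of being written row by row, which is the idea behind generating the storage format files directly.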

[0029] The incremental data in Step 1 includes inserted data, updated data, and deleted data.
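The three kinds of incremental data can be sketched as change records applied in order to a keyed table. The record layout (`op`, `key`, `row`) and the function name are hypothetical; the source does not specify how changes are represented.

```python
def apply_changes(table, changes):
    """Apply insert, update, and delete records, in order, to a dict keyed by row key."""
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            # Assumption: inserts and updates are both treated as upserts
            # that merge the supplied columns into the row.
            table.setdefault(key, {}).update(change["row"])
        elif op == "delete":
            # Deleting a missing row is a no-op.
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return table


table = {"1": {"name": "a"}}
changes = [
    {"op": "insert", "key": "2", "row": {"name": "c"}},
    {"op": "update", "key": "1", "row": {"name": "b"}},
    {"op": "delete", "key": "2"},
]
apply_changes(table, changes)
```

Order matters here: replaying the same records in a different sequence can leave the table in a different state, which is why a log-based capture preserves the order in which changes were made.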

[0030] 1. Use Golden Gate to extract the change data. Golden Gate captures, transforms, and delivers data with second-level latency, and provides a log-based structured data replication method that can capture changed data from the online logs in real time and save the...
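The abstract states that captured changes are saved in a Trail format file. The sketch below illustrates only the general idea of log-based capture with a checkpoint. It does not use Golden Gate, and it substitutes a JSON-lines file for the real trail format; the `seq` field, the function name, and the file layout are assumptions.

```python
import json


def capture_changes(log_entries, trail_path, checkpoint=0):
    """Append log entries newer than `checkpoint` to a trail file.

    Returns the new checkpoint (the highest sequence number written so far),
    so a later call can resume without re-reading the source tables.
    """
    new_checkpoint = checkpoint
    with open(trail_path, "a", encoding="utf-8") as trail:
        for entry in log_entries:
            if entry["seq"] <= checkpoint:
                continue  # already captured in an earlier pass
            trail.write(json.dumps(entry) + "\n")
            new_checkpoint = max(new_checkpoint, entry["seq"])
    return new_checkpoint
```

A caller would keep the returned checkpoint and pass it to the next call. The capture reads only the log, never the tables themselves, which is the property the source credits for the low resource usage on the Oracle side.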


Abstract

The present invention discloses an efficient method for extracting massive data. The method comprises: Step 1, extracting data with Golden Gate, which parses the Oracle log files to extract the initial data and the changed incremental data; Step 2, speeding up processing with the MapReduce parallel computing engine, using the MapReduce parallel computing framework of the Hadoop big data platform to accelerate loading of the extracted data; and Step 3, loading the data into HBase, using multi-node parallel writing to directly generate the storage format files of the HBase data. The invention provides a log-based structured data replication method that captures changed data from the online log in real time and saves it in a Trail format file. Because changed data is captured by analyzing log files, capture occupies only a small amount of system resources; even when Oracle stores a very large amount of data and the Oracle system is heavily loaded, Oracle's operating efficiency is essentially unaffected.

Description

Technical field

[0001] The invention relates to the field of big data, and in particular to a method for efficiently extracting massive data.

Background

[0002] Similar terms have appeared over the history of data development, including "ultra-large-scale data" and "massive data". "Ultra-large-scale" generally refers to data at the GB level (1 GB = 1024 MB), "massive" generally refers to data at the TB level (1 TB = 1024 GB), and today's "big data" refers to data at the PB (1 PB = 1024 TB), EB (1 EB = 1024 PB), or even ZB (1 ZB = 1024 EB) level and above. In 2013, Gartner predicted that the data stored worldwide would reach 1.2 ZB; burned onto CD-R discs and stacked, the pile would be five times as high as the distance from the Earth to the Moon. Different scales bring different technical problems and research challenges.

[0003] Big data refers to a collection of data that cannot be captured, managed and processed by conventional software tools with...


Application Information


Patent type & authority: Applications (China)

IPC (8): G06F17/30

CPC: G06F16/182, G06F16/254

Inventor: 石文威

Owner: ANHUI KECHUANG INTELLIGENT INTPROP SERVICE CO LTD
