Data extraction method and device

A data extraction technology, applied in database models, relational databases, digital data processing, and related fields. It addresses low data extraction efficiency and the loss of multi-thread efficiency caused by uneven data allocation, and improves data extraction efficiency.

Active Publication Date: 2014-12-03
INSPUR BEIJING ELECTRONICS INFORMATION IND

AI Technical Summary

Problems solved by technology

Extracting data through interfaces such as ODBC or JDBC is relatively inefficient without multi-threaded parallelism, especially in the era of big data, when database tables with hundreds of millions of rows often need to be extracted.
Multi-threaded parallel extraction requires the data in the data source to be segmented in advance. If the rows are unevenly distributed among the threads, the efficiency of multi-threading is greatly reduced. Making the allocation very uniform, however, requires computing the detailed distribution of data in the table, which takes many database operations before extraction begins and therefore hurts extraction efficiency.

Examples

Embodiment 1

[0055] As shown in Figure 1, the data extraction method applied to relational databases in the present invention includes:

[0056] S101: Divide the selected data table into M data partitions according to the value range of a chosen field. The field must be of a numeric type, or its values must be convertible to numeric values.

[0057] Users can preset the number M of data partitions and the total number N of threads to be allocated.

[0058] Specifically, after selecting a field id, query the minimum and maximum values of that field, Min(id) and Max(id), by executing the following SQL statement in the relational database through the ODBC or JDBC interface:

[0059] select max(id), min(id) from [table name]

[0060] Divide the value range [Min(id), Max(id)] of the field id evenly into M data partitions. As shown in Figure 2, the intervals of the M data partitions are evenly allocated according to the minimum v...
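
The range query and even split described in [0058] to [0060] might be sketched as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for the ODBC or JDBC interface, the table name t and the helper split_range are hypothetical, the id values are assumed to be integers, and the closed-interval boundary convention is a choice made here because the source text is truncated before it specifies one.

```python
import sqlite3


def split_range(min_id, max_id, m):
    """Split the closed integer range [min_id, max_id] into m contiguous
    closed intervals of near-equal width (hypothetical helper).

    If the range holds fewer than m values, some intervals come out
    empty (hi < lo); callers may want to skip those.
    """
    if m < 1 or max_id < min_id:
        raise ValueError("need m >= 1 and min_id <= max_id")
    span = max_id - min_id + 1  # number of integer values in the range
    bounds = []
    for k in range(m):
        lo = min_id + span * k // m
        hi = min_id + span * (k + 1) // m - 1
        bounds.append((lo, hi))
    return bounds


# Demo table; sqlite3 is an assumption standing in for ODBC/JDBC.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 101)])

# Same statement as [0059], against the demo table.
max_id, min_id = conn.execute("SELECT max(id), min(id) FROM t").fetchone()
partitions = split_range(min_id, max_id, 4)
conn.close()
```

Splitting by value range needs only this one aggregate query, which is why the partitions can hold very different numbers of rows when the id values are not uniformly distributed.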

Abstract

The invention provides a data extraction method applied to a relational database. The method includes: dividing a selected data table into M data partitions according to the value range of a chosen field, where the field is of a numeric type or its values can be converted into numeric values; computing a weight for each data partition according to the number of data rows it contains; allocating a thread count to each data partition according to its weight, such that the thread counts allocated to all data partitions sum to a preset total thread count N, where M <= N; and opening N threads and extracting data from each data partition with the number of threads allocated to it. By dividing the data table into data partitions and dynamically allocating a thread count to each partition, the method solves the problem of data being unevenly allocated among threads and improves the efficiency of data extraction from relational databases.
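
The weighting and thread allocation steps summarized above could look roughly like the sketch below. The abstract only states that thread counts follow the partition weights and sum to N with M <= N; the function name allocate_threads, the guarantee of at least one thread per partition, and the largest-remainder rounding rule are assumptions made for illustration, not the patent's stated procedure.

```python
def allocate_threads(row_counts, total_threads):
    """Distribute total_threads (N) over the partitions in proportion to
    their row counts, so that the allocation sums exactly to N.

    Assumptions (not from the source): every partition gets at least one
    thread, and the spare N - M threads are shared out by weight using
    largest-remainder rounding.
    """
    m = len(row_counts)
    if m == 0 or m > total_threads:
        raise ValueError("need 1 <= M <= N")
    total_rows = sum(row_counts)
    if total_rows == 0:
        # No rows anywhere: fall back to equal weights.
        row_counts = [1] * m
        total_rows = m

    spare = total_threads - m  # threads left after one per partition
    # Integer arithmetic: weight_i * spare == row_counts[i] * spare / total_rows
    floors = [c * spare // total_rows for c in row_counts]
    remainders = [c * spare % total_rows for c in row_counts]
    alloc = [1 + f for f in floors]

    # Hand out the threads lost to rounding, largest remainder first.
    leftover = total_threads - sum(alloc)
    order = sorted(range(m), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


# Example: three partitions with 10, 10 and 80 rows, and N = 5 threads.
example = allocate_threads([10, 10, 80], 5)
```

The row counts themselves would come from one count query per partition, which is far cheaper than computing the full distribution of the field.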

Description

Technical field

[0001] The invention relates to the field of data extraction, in particular to a data extraction method and device for a relational database.

Background technique

[0002] Data integration is the logical or physical concentration of data from different sources, formats, and characteristics, so as to provide comprehensive data sharing. It is an important part of enterprise business intelligence and data warehouse systems. ETL is the main solution for enterprise data integration. The three letters in ETL represent Extract, Transform, and Load, that is, extraction, conversion, and loading. Data extraction is the process of extracting data from a data source. In practical applications, relational databases are mostly used as data sources.

[0003] The methods of extracting data from relational databases can be divided into methods such as directly exporting backup data and reading data through interfaces such as JDBC. Among them, the method of reading through...

Application Information

IPC(8): G06F17/30
CPC: G06F16/254, G06F16/284
Inventor 曹连超, 辛国茂, 亓开元, 刘伟, 李占强, 卢军佐
Owner INSPUR BEIJING ELECTRONICS INFORMATION IND