A data distribution method and device suitable for distributed databases

The invention concerns data distribution and database technology, applied in the field of distributed databases. It addresses problems such as increased computing workload, increased I/O dispersion, and increased system burden, with the effect of reducing system burden, disk I/O, and computing workload.

Active Publication Date: 2020-12-04
TIANJIN NANKAI UNIV GENERAL DATA TECH
7 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

However, the existing approach still has problems. The data to be distributed must first be materialized once to form a result set, and splitting that result set into a new one amounts to a second materialization.
This increases the computing workload and also requires keeping two copies of the data in memory, which adds to the burden on the system.
In addition, if tasks are split directly by the degree of parallelism over the full result set, I/O becomes more dispersed and the performance of the whole system drops.


Examples

Embodiment 1

[0040] Figure 1 is a flow chart of the data distribution method applicable to distributed databases provided by Embodiment 1 of the present invention. This embodiment applies to distributing data to distributed database nodes. The method can be executed by a data distribution device applicable to distributed databases; the device can be implemented in software and/or hardware and can be integrated into the DBMS.

[0041] Referring to Figure 1, the data distribution method applicable to distributed databases includes:

[0042] S110. When there is a data distribution task, split the data according to the minimum storage unit.

[0043] The DBMS detects that a data distribution task exists and, based on that task, splits the data to be distributed according to the smallest storage unit. For example, if the smallest storage unit is a row, the DBMS reads the data one row at a time and splits it row by row.
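The row-by-row split described above can be sketched as a lazy generator. This is only an illustration under stated assumptions: the CSV input, the function name `split_into_rows`, and the sample data are hypothetical and do not come from the patent text.

```python
import csv
import io
from typing import Iterable, Iterator, List


def split_into_rows(source: Iterable[str]) -> Iterator[List[str]]:
    """Yield one parsed row at a time instead of building a full result set."""
    for row in csv.reader(source):
        if row:  # skip blank lines
            yield row


# Hypothetical input: three rows of (id, name) in CSV form.
data = io.StringIO("1,alice\n2,bob\n3,carol\n")
rows = split_into_rows(data)

# Rows are produced on demand; nothing is stored beyond the current row.
first = next(rows)
rest = list(rows)
```

Because the generator hands out rows on demand, a later step can process each row without an intermediate copy of the whole result set being held in memory.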

[0044] S120. Calculate...

Embodiment 2

[0049] Figure 2 is a flow chart of the data distribution method applicable to distributed databases provided by Embodiment 2 of the present invention. This embodiment builds on the embodiment above. Further, when data is distributed to the nodes according to the distribution target, the method also includes the following step: materializing the data result set corresponding to the distribution task.

[0050] Referring to Figure 2, the data distribution method applicable to distributed databases includes:

[0051] S210. When there is a data distribution task, split the data according to the smallest storage unit.

[0052] S220. Calculate the distribution target of the split minimum storage unit data, and distribute the data to the nodes according to the distribution target.

[0053] S230. Materialize the data result set corresponding to the distribution task.
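Steps S210 to S230 can be sketched roughly as follows. The patent text shown here does not say how the distribution target is calculated or how materialization is performed, so the CRC32-modulo rule, the per-node CSV files, and all function names below are assumptions made purely for illustration.

```python
import csv
import os
import tempfile
import zlib
from typing import Dict, Iterable, Tuple

Row = Tuple[int, str]


def distribution_target(row: Row, node_count: int) -> int:
    """Assumption: target = CRC32 of the first column, modulo the node count."""
    return zlib.crc32(str(row[0]).encode("utf-8")) % node_count


def distribute_and_materialize(
    rows: Iterable[Row], node_count: int, out_dir: str
) -> Dict[int, str]:
    """Route rows one at a time and write each row to its node's file once."""
    files = {}
    writers = {}
    paths: Dict[int, str] = {}
    try:
        for row in rows:  # S210: handle one minimum storage unit (a row)
            node = distribution_target(row, node_count)  # S220: pick target
            if node not in writers:
                paths[node] = os.path.join(out_dir, f"node_{node}.csv")
                files[node] = open(paths[node], "w", newline="")
                writers[node] = csv.writer(files[node])
            # S230: the only materialization; a file stands in for the node.
            writers[node].writerow(row)
    finally:
        for f in files.values():
            f.close()
    return paths


# Hypothetical sample data and node count.
rows = iter([(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")])
out_dir = tempfile.mkdtemp()
paths = distribute_and_materialize(rows, node_count=3, out_dir=out_dir)
```

In this sketch each row is written exactly once, after its target is known, so no intermediate full result set is built and then split a second time.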

[0054] Materialization can be used to pre-calculate and save the results of time-consuming o...

Embodiment 3

[0057] Figure 3 is a flow chart of the data distribution method applicable to distributed databases provided by Embodiment 3 of the present invention. This embodiment builds on the embodiments above. Further, materializing the data result set corresponding to the distribution task is refined as follows: when the result set is queried, threads take turns obtaining distribution tasks of a fixed size.

[0058] Referring to Figure 3, the data distribution method applicable to distributed databases includes:

[0059] S310. When there is a data distribution task, split the data according to the minimum storage unit.

[0060] S320. Calculate the distribution target of the split minimum storage unit data, and distribute the data to the nodes according to the distribution target.

[0061] S330. When the result set is queried, threads take turns obtaining distribution tasks of a fixed size.
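One possible reading of S330 is that worker threads pull fixed-size batches from a shared source, one thread at a time, rather than the full result set being pre-split by the degree of parallelism. The sketch below assumes that reading; `BATCH_SIZE`, `TaskSource`, and `worker` are hypothetical names, and integers stand in for rows.

```python
import threading
from itertools import islice
from typing import Iterator, List

BATCH_SIZE = 2  # assumption: the "fixed size" is a row count per task


class TaskSource:
    """Hands out fixed-size batches of rows; a lock makes threads take turns."""

    def __init__(self, rows: Iterator[int]) -> None:
        self._rows = rows
        self._lock = threading.Lock()

    def next_batch(self) -> List[int]:
        with self._lock:
            return list(islice(self._rows, BATCH_SIZE))


def worker(source: TaskSource, done: List[List[int]], done_lock: threading.Lock) -> None:
    while True:
        batch = source.next_batch()
        if not batch:  # the source is exhausted
            return
        with done_lock:
            done.append(batch)  # stands in for routing the rows to nodes


source = TaskSource(iter(range(7)))  # hypothetical result set of 7 rows
done: List[List[int]] = []
done_lock = threading.Lock()
threads = [
    threading.Thread(target=worker, args=(source, done, done_lock))
    for _ in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Since each thread takes the next contiguous batch only when it is ready, reads stay sequential over the result set, which is one way the I/O dispersion mentioned earlier could be limited.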

[0062] When the query result set i...

Abstract

The invention provides a data distribution method and device suitable for a distributed database. When a data distribution task exists, the method splits the data according to the minimum storage unit, calculates the distribution target of each split unit of data, and distributes the data to nodes according to the distribution targets. Materialization is postponed for as long as possible, so that little or no data is materialized during the data calculation process and disk input/output is reduced. The data is materialized only at the end, after it has been split according to the distribution rules. This reduces the computing workload and the system burden.

Description

Technical field

[0001] The invention belongs to the technical field of distributed databases, and in particular relates to a data distribution method and device suitable for distributed databases.

Background

[0002] As information systems grow in scale and spread across regions, distributed databases play an increasingly important role as data bridges within them, and distributed database designs are widely used. A distributed database uses a high-speed computer network to connect multiple physically dispersed computers into a logically unified database. Each computer may hold a complete or partial copy of the DBMS and has its own local database; distribution provides greater storage capacity and higher concurrent access.

[0003] During the use of distributed databases, with the increase of distributed nodes and data volume, how to quickly and accura...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/27, G06F16/22
CPC: G06F16/2282, G06F16/27, G06F16/278
Inventor: 武新, 崔维力, 刘威, 郑黎辉
Owner: TIANJIN NANKAI UNIV GENERAL DATA TECH