Data processing method and device, terminal and storage medium

A data processing technique, with an associated storage medium, applied in the field of data processing. It addresses data skew and low efficiency during database sharding and improves sharding efficiency.

Pending Publication Date: 2022-01-14
CHINA MOBILE SUZHOU SOFTWARE TECH CO LTD +1

AI Technical Summary

Problems solved by technology

[0003] To solve the above technical problems, embodiments of the present invention provide a data processing method, device, terminal and storage medium that at least address the data skew and low efficiency that arise when sharding a database.



Embodiment Construction

[0024] In order to facilitate understanding of the technical solutions of the embodiments of the present application, the related technologies of the embodiments of the present application are described below.

[0025] In related technologies, database sharding has the following problems:

[0026] (1) The DataX tool runs only on a single node and has an input/output (I/O) bottleneck. It is not suitable for importing and exporting big data, and it cannot shard at the database table level.

[0027] (2) Although tools such as Sqoop use column-based sharding, this approach is prone to data skew, and records whose shard column value is empty cannot be read.

[0028] (3) Some tools use row-based sharding. Its drawbacks are that computing and querying the total number of records takes a long time, and that sharding requires nested Structured Query Language (SQL) statements, which makes data extraction slow; there is also...
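Problem (2) can be illustrated with a small sketch. It assumes a column-sharding scheme that cuts the range between the minimum and maximum shard-column value into equal-width intervals; the function names and the sample data are hypothetical, and this is not the actual code of Sqoop or any other tool. With values clustered at the low end, one shard receives almost every record, and records with an empty (None) shard column fall into no interval at all.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # half-open interval [start, end)


def equal_width_ranges(lo: int, hi: int, n: int) -> List[Range]:
    """Cut [lo, hi] into n intervals of near-equal width (hypothetical helper)."""
    width = (hi - lo + 1) / n
    bounds = [lo + round(i * width) for i in range(n)] + [hi + 1]
    return list(zip(bounds[:-1], bounds[1:]))


def count_per_range(values: List[Optional[int]],
                    ranges: List[Range]) -> Tuple[List[int], int]:
    """Return (records per shard, records matched by no shard)."""
    counts = [0] * len(ranges)
    unread = 0
    for v in values:
        if v is None:
            # An empty shard-column value satisfies no range predicate.
            unread += 1
            continue
        for i, (start, end) in enumerate(ranges):
            if start <= v < end:
                counts[i] += 1
                break
    return counts, unread


# Assumed sample data: 90 clustered keys, two outliers, three empty values.
values: List[Optional[int]] = list(range(1, 91)) + [500, 1000] + [None] * 3
non_null = [v for v in values if v is not None]
ranges = equal_width_ranges(min(non_null), max(non_null), 4)
counts, unread = count_per_range(values, ranges)
print(ranges)   # four intervals of width 250
print(counts)   # the first shard holds nearly all records
print(unread)   # records with an empty shard column
```

In this sample the first of four shards carries 90 of the 92 non-empty records, so the task handling it dominates the total run time.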


Abstract

The invention provides a data processing method and device, a terminal and a storage medium. The method comprises: determining a fragment column of a to-be-processed database; fragmenting the database, based on at least one number in the fragment column that meets a preset threshold value, to obtain at least one database fragment; determining how the data records in the database that meet a preset extraction condition are distributed across the at least one database fragment; and adjusting the at least one database fragment, based on that distribution, to obtain a target database fragment. This improves the efficiency of database fragmentation and yields database fragments with uniformly distributed data.
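The four steps of the abstract can be sketched roughly as follows. This is one possible reading, not the claimed algorithm: the abstract does not say how the threshold selects boundary values or how fragments are adjusted, so the fixed step size, the quantile-based re-cut, the `status` field and every function name here are assumptions made for illustration.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Fragment = Tuple[int, int]  # half-open key interval [start, end)


def initial_fragments(lo: int, hi: int, step: int) -> List[Fragment]:
    """Step 2 (assumed): cut the fragment column every `step` key values."""
    return [(s, min(s + step, hi + 1)) for s in range(lo, hi + 1, step)]


def distribution(keys: Sequence[int], fragments: List[Fragment]) -> List[int]:
    """Step 3: count matching records per fragment (keys must be >= first start)."""
    starts = [start for start, _ in fragments]
    counts = [0] * len(fragments)
    for key in keys:
        counts[bisect_right(starts, key) - 1] += 1
    return counts


def rebalance(keys: Sequence[int], n: int) -> List[Fragment]:
    """Step 4 (assumed): re-cut boundaries at quantiles of the matching keys."""
    ordered = sorted(keys)
    cuts = sorted({ordered[(i * len(ordered)) // n] for i in range(n)})
    ends = cuts[1:] + [ordered[-1] + 1]
    return list(zip(cuts, ends))


# Step 1 (assumed): the fragment column is the integer primary key `id`.
rows = [(i, "active") for i in range(1, 78)]
rows += [(200, "active"), (400, "inactive"), (600, "active"), (800, "active")]

# Preset extraction condition (hypothetical): status == "active".
keys = [row_id for row_id, status in rows if status == "active"]

first_cut = initial_fragments(1, 800, 200)
before = distribution(keys, first_cut)
target = rebalance(keys, len(first_cut))
after = distribution(keys, target)
print(before)  # skewed counts from the fixed-step cut
print(target)  # adjusted fragment boundaries
print(after)   # counts after adjustment
```

With this sample data the fixed-step cut places 78 of the 80 matching records in one fragment, while the adjusted boundaries hold 20 records each. A real implementation would obtain the counts from the database rather than from an in-memory list.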

Description

Technical field

[0001] The present application relates to data processing technologies, including but not limited to a data processing method, device, terminal and storage medium.

Background

[0002] In related technologies, many tools support importing data from relational databases into other systems, such as the offline data synchronization tool/platform DataX, the data conversion tool Sqoop, the open source data exchange tool Kettle and the data integration platform TurboDX. Of these, the open source tools DataX and Sqoop are widely used. DataX is an open source data import and export tool that runs on a single node. It does not shard a database table but treats each table as one unit of concurrency, so a table with a large amount of data is processed by a single concurrent task, which creates an efficiency bottleneck. Sqoop is also an open source tool, mainly used for importing and exporting between distributed system infrastructure Hadoop components and rel...
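The difference between treating a whole table as one unit of concurrency and extracting it shard by shard can be sketched as below. The in-memory `TABLE`, the `extract` function and the shard boundaries are hypothetical stand-ins; no real DataX or Sqoop interface is used, and the sketch shows only the task structure, not a performance measurement.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Shard = Tuple[int, int]  # half-open key interval [start, end)

# Stand-in for the primary keys of a large table (assumed data).
TABLE: List[int] = list(range(1, 1001))


def extract(shard: Shard) -> List[int]:
    """Read the rows whose key lies in the shard (stand-in for a range query)."""
    start, end = shard
    return [key for key in TABLE if start <= key < end]


def extract_table(shards: List[Shard], workers: int) -> List[int]:
    """Run one extraction task per shard; map() keeps results in shard order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(extract, shards))
    return [row for part in parts for row in part]


# One table, one task: nothing can run concurrently.
single = extract_table([(1, 1001)], workers=1)

# Four shards, four tasks: range reads can overlap when they are I/O bound.
sharded = extract_table([(1, 251), (251, 501), (501, 751), (751, 1001)], workers=4)

print(len(single), len(sharded), single == sharded)
```

Both calls return the same rows; only the second gives the scheduler independent pieces of work, which is why evenly sized shards matter.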


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/21, G06F16/215
CPC: G06F16/214, G06F16/215
Inventor: 王玉雷
Owner: CHINA MOBILE SUZHOU SOFTWARE TECH CO LTD