Distributed data processing method and device

A distributed data processing technology in the field of data processing that addresses resource waste, processing failures, and low data processing efficiency, with the effect of ensuring stability and improving efficiency.

Status: Inactive. Publication date: 2017-05-31
BEIJING QIHOO TECH CO LTD
Cites 4 documents; cited by 2.

AI Technical Summary

Problems solved by technology

[0004] However, in the process of implementing the present invention, the inventor found at least the following problems in the prior art. In distributed data processing, the data to be processed is usually distributed to the servers participating in the processing according to the key of each datum, so data with the same key is sent to the same server. If the amount of data corresponding to a certain key is very large, the data becomes unevenly distributed.
In that case, a few servers receive a particularly large amount of data while the other servers receive relatively little. The servers holding a large amount of data carry a heavy computing load and take a long time to finish, which reduces the overall data processing efficiency; meanwhile, the servers holding little data carry a light load, so part of their computing capacity sits idle and utilization is low, wasting resources.
At the same time, when the amount of data processed on one server is too large, processing failures often occur.
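The imbalance described in [0004] can be reproduced with a small simulation. This is a minimal sketch, assuming records are (key, value) pairs routed by a stable hash of the key modulo the number of servers; the CRC-32 hash, the function names, and the sample data are illustrative choices, not taken from the patent.

```python
import zlib
from collections import Counter


def server_for_key(key: str, num_servers: int) -> int:
    """Pick a server index from a key using a stable hash (CRC-32).

    Python's built-in hash() for str is salted per process, so a stable
    checksum is used instead to keep the assignment reproducible.
    """
    return zlib.crc32(key.encode("utf-8")) % num_servers


def partition_load(records, num_servers: int) -> list[int]:
    """Count how many records each server receives under key-based routing."""
    load = Counter(server_for_key(key, num_servers) for key, _ in records)
    return [load.get(i, 0) for i in range(num_servers)]


# One "hot" key with 1000 records plus 100 keys with one record each.
records = [("hot", i) for i in range(1000)] + [(f"k{i}", i) for i in range(100)]
loads = partition_load(records, num_servers=4)
print(loads)  # the server that owns "hot" carries at least 1000 records
```

Every record keyed "hot" maps to the same server index, so that server's load is bounded below by the hot key's count no matter how many servers are added.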



Examples


Embodiment 1

[0029] Figure 1 shows a schematic flow chart of the distributed data processing method provided by Embodiment 1 of the present invention. The method includes:

[0030] Step S110: Determine, as skewed data, the data corresponding to any key whose number of occurrences in the same data set is greater than a preset number; determine a data set that contains skewed data as a skewed data set, and a data set that does not contain skewed data as a non-skewed data set.
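Step S110 amounts to counting key occurrences per data set and comparing each count with the preset number. A minimal sketch, assuming each data set is a list of (key, value) pairs; `preset_number` mirrors the text, while `find_skewed_keys`, `classify_data_sets`, and the sample `orders`/`users` sets are hypothetical.

```python
from collections import Counter


def find_skewed_keys(data_set, preset_number):
    """Return keys that occur more than preset_number times in one data set."""
    counts = Counter(key for key, _ in data_set)
    return {key for key, n in counts.items() if n > preset_number}


def classify_data_sets(data_sets, preset_number):
    """Split named data sets into skewed and non-skewed ones.

    Returns (skewed, non_skewed): skewed maps each skewed data set's name to
    its skewed keys; non_skewed lists the names of the remaining data sets.
    """
    skewed, non_skewed = {}, []
    for name, records in data_sets.items():
        hot_keys = find_skewed_keys(records, preset_number)
        if hot_keys:
            skewed[name] = hot_keys
        else:
            non_skewed.append(name)
    return skewed, non_skewed


orders = [("user_1", i) for i in range(500)] + [("user_2", 1), ("user_3", 2)]
users = [("user_1", "a"), ("user_2", "b"), ("user_3", "c")]
skewed, non_skewed = classify_data_sets({"orders": orders, "users": users}, preset_number=100)
print(skewed)      # {'orders': {'user_1'}}
print(non_skewed)  # ['users']
```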

[0031] In distributed data processing, the data to be processed is allocated to different servers according to the value of its key. If too much data is distributed to one server, that server's processing time will be too long, while the other servers receive less data and finish sooner. The bottleneck of the entire distributed data processing process is then concentrated on one server, thereby reducing t...

Embodiment 2

[0042] Figure 2 shows a schematic flow chart of the distributed data processing method provided by Embodiment 2 of the present invention. The method includes:

[0043] Step S210: Determine, as skewed data, the data corresponding to any key whose number of occurrences in the same data set is greater than a preset number; determine a data set that contains skewed data as a skewed data set, and a data set that does not contain skewed data as a non-skewed data set.

[0044] In distributed data processing, the data to be processed is allocated to different servers according to the value of its key. If too much data is distributed to one server, that server's processing time will be too long, while the other servers receive less data and finish sooner. The bottleneck of the entire distributed data processing process is then concentrated on one server, thereby reducing ...

Embodiment 3

[0071] Figure 3 shows a schematic structural diagram of the distributed data processing device provided by Embodiment 3 of the present invention. The device includes a determination module 310, a skewed data set marking module 320, a non-skewed data set marking module 330, and an allocation module 340.
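The four modules can be pictured as stages of one pipeline. This is a hypothetical sketch, not the patented device: the class and method names, the integer identifier set `key_ids`, the round-robin assignment of identifiers in module 320, and the CRC-32 routing rule in module 340 (standing in for the unspecified "preset data distribution rules") are all assumptions.

```python
import zlib
from collections import Counter
from dataclasses import dataclass


@dataclass
class DistributedDataProcessingDevice:
    """Hypothetical sketch of the Embodiment 3 device (modules 310-340).

    preset_number: occurrence threshold above which a key counts as skewed.
    key_ids: preset key identification set (assumed: small integers).
    num_servers: number of servers that take part in the processing.
    """

    preset_number: int
    key_ids: tuple[int, ...] = (0, 1, 2, 3)
    num_servers: int = 4

    def determine(self, data_sets):
        """Determination module 310: split data sets into skewed / non-skewed."""
        skewed, non_skewed = {}, {}
        for name, records in data_sets.items():
            counts = Counter(key for key, _ in records)
            is_skewed = any(n > self.preset_number for n in counts.values())
            (skewed if is_skewed else non_skewed)[name] = records
        return skewed, non_skewed

    def mark_skewed(self, records):
        """Module 320: give each record one identifier (round-robin, an assumption)."""
        return [((key, self.key_ids[i % len(self.key_ids)]), value)
                for i, (key, value) in enumerate(records)]

    def mark_non_skewed(self, records):
        """Module 330: expand each record into one copy per identifier."""
        return [((key, kid), value) for key, value in records for kid in self.key_ids]

    def allocate(self, marked):
        """Module 340: route each (original key, identifier) pair to a server."""
        servers = [[] for _ in range(self.num_servers)]
        for (key, kid), value in marked:
            index = zlib.crc32(f"{key}#{kid}".encode("utf-8")) % self.num_servers
            servers[index].append(((key, kid), value))
        return servers

    def process(self, data_sets):
        """Run modules 310 -> 320/330 -> 340 and return per-server record lists."""
        skewed, non_skewed = self.determine(data_sets)
        marked = []
        for records in skewed.values():
            marked.extend(self.mark_skewed(records))
        for records in non_skewed.values():
            marked.extend(self.mark_non_skewed(records))
        return self.allocate(marked)


device = DistributedDataProcessingDevice(preset_number=100)
data_sets = {
    "orders": [("user_1", i) for i in range(500)] + [("user_2", 0)],
    "users": [("user_1", "a"), ("user_2", "b")],
}
servers = device.process(data_sets)
print([len(s) for s in servers])  # per-server record counts; they sum to 509
```

The 500 records of the hot key `user_1` are split into four groups of 125, one per identifier, so no single routing key carries all of them.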

[0072] Determination module 310: determines, as skewed data, the data corresponding to keys that occur more than a preset number of times in the same data set; determines data sets containing skewed data as skewed data sets, and data sets not containing skewed data as non-skewed data sets.

[0073] In distributed data processing, the data to be processed is allocated to different servers according to the value of its key. If too much data is distributed to one server, that server's processing time will be too long, while the other servers receive less data and the processing time will...


Abstract

The invention discloses a distributed data processing method and device, relating to the technical field of data processing. The method includes: determining, as skewed data, the data corresponding to keys whose occurrence frequency in the same data set is higher than a preset frequency, determining a data set containing skewed data as a skewed data set, and determining a data set without skewed data as a non-skewed data set; adding, according to a preset key identification set, a key identification to the original key of each datum in the skewed data set; expanding the non-skewed data set and adding, according to the same key identification set, a key identification to the original key of each datum in the expanded non-skewed data set; and distributing the data in the processed skewed and non-skewed data sets to multiple servers according to preset data distribution rules for distributed processing. With this method, data is distributed more uniformly across the servers and the efficiency of distributed data processing is improved.
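The abstract does not say why the non-skewed data set has to be expanded. One plausible reading is the common "salting" technique for skewed joins: once the skewed side's records are spread over several identifiers, each record on the other side must be copied once per identifier so that every pair that matched on the original key still meets on the composite (key, identifier). The sketch below checks this on toy data; the join framing, the round-robin assignment, and all names are assumptions rather than details from the patent.

```python
from collections import defaultdict


def join(left, right):
    """Inner-join two lists of (key, value) records on key."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in index[key]]


def salted_join(skewed, non_skewed, key_ids, expand=True):
    """Join after salting the skewed side and (optionally) expanding the other.

    The skewed side gets one identifier per record (round-robin, an assumption).
    The non-skewed side gets every identifier when expand=True, or only the
    first one when expand=False, to show what is lost without expansion.
    """
    salted = [((key, key_ids[i % len(key_ids)]), value)
              for i, (key, value) in enumerate(skewed)]
    ids_per_copy = key_ids if expand else key_ids[:1]
    expanded = [((key, kid), value) for key, value in non_skewed for kid in ids_per_copy]
    # Join on the composite (original key, identifier), then drop the identifier.
    return [(composite[0], lv, rv) for composite, lv, rv in join(salted, expanded)]


orders = [("user_1", f"order_{i}") for i in range(8)]
users = [("user_1", "Alice")]
key_ids = (0, 1, 2, 3)

baseline = sorted(join(orders, users))
print(sorted(salted_join(orders, users, key_ids)) == baseline)                # True
print(len(salted_join(orders, users, key_ids, expand=False)), len(baseline))  # 2 8
```

With expansion, the salted join returns exactly the plain join's rows; without it, only the orders that happened to receive identifier 0 find their match.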

Description

Technical field

[0001] The present invention relates to the technical field of data processing, in particular to a distributed data processing method and device.

Background art

[0002] Distributed data processing is also called distributed computing, distributed processing, or distributed transaction processing. During computation or processing, the initiator divides the data to be computed, or the transaction to be processed, into multiple sub-computations, sub-processes, or sub-transactions, assigns them to multiple participants for computation or processing, and finally combines the participants' results to obtain the final result.

[0003] With the advent of the big data era, the amount of data to be processed in all walks of life keeps increasing, and the introduction of distributed data processing technology h...
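The initiator/participant pattern in [0002] is a scatter-gather computation. A minimal sketch, assuming a thread pool stands in for the participating servers and a sum of squares stands in for the real workload; `split`, `participant`, and `initiator` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Divide the input into `parts` roughly equal sub-computations."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def participant(chunk):
    """Work done by one participant: here, a partial sum of squares."""
    return sum(x * x for x in chunk)


def initiator(data, participants=4):
    """Split the job, fan it out to participants, and combine the partial results."""
    chunks = split(data, participants)
    with ThreadPoolExecutor(max_workers=participants) as pool:
        partials = list(pool.map(participant, chunks))
    return sum(partials)


print(initiator(list(range(10))))  # 285
```

Here the chunks are equal by construction; the problem addressed by the invention arises when the split is driven by data keys instead, so chunk sizes follow the key distribution.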


Application Information

Patent type & authority: Application (China)
IPC (8th edition): G06F9/50
CPC: G06F9/5083
Inventor: 邓怡豪
Owner: BEIJING QIHOO TECH CO LTD