Distributed data processing method and system

A distributed data processing technology, applied in the field of cloud computing, that addresses problems such as increased system complexity, storage costs, error probability, and processing delay.

Inactive Publication Date: 2017-03-29
ALIBABA GRP HLDG LTD
Cites: 4; Cited by: 7

AI Technical Summary

Problems solved by technology

[0013] Although this approach gives offline computing and streaming computing a unified way to land (persist) data, ignoring the differences between the two computing models introduces some obvious problems.
[0014] For offline computing, the data required for computation is usually organized in advance, in some form, in a distributed file system. If a message queue is used as the data landing method, the offline computing system therefore needs additional data middleware to pull the data from the message queue and store it in the distributed file system in the form offline computing requires. This not only increases system complexity but also adds an extra landing step for the data, which increases storage costs, error probability, and processing latency.


Examples

Embodiment Construction

[0124] To make the above objects, features, and advantages of the present application clearer and easier to understand, the application is described in further detail below with reference to the accompanying drawings and specific embodiments.

[0125] In the streaming computing model, take Apache Kafka as an example. As shown in Figure 1, a typical Kafka cluster includes several Producers (for example, page views generated by the web front end, server logs, or system CPU and memory metrics), several brokers (Kafka supports horizontal scaling; in general, the more brokers, the higher the cluster throughput), several Consumer Groups (such as a Hadoop cluster, a real-time monitoring system, other services, or a data warehouse), and a Zookeeper cluster.
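
The roles named in this paragraph can be illustrated with a toy, in-memory model. This is only a sketch and not the Kafka client API: `ToyBroker`, `produce`, and `poll` are hypothetical names, and partitioning, replication, and Zookeeper coordination are deliberately left out. It shows one property of the model, namely that each consumer group keeps its own read position and therefore sees every message a producer publishes.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory stand-in for a broker: one append-only log per topic."""

    def __init__(self):
        self.logs = defaultdict(list)    # topic -> list of messages
        self.offsets = defaultdict(int)  # (group, topic) -> next offset to read

    def produce(self, topic, message):
        # A producer appends a message to the end of the topic log.
        self.logs[topic].append(message)

    def poll(self, group, topic):
        # A consumer group reads everything after its own committed offset.
        start = self.offsets[(group, topic)]
        batch = self.logs[topic][start:]
        self.offsets[(group, topic)] = start + len(batch)
        return batch


broker = ToyBroker()
broker.produce("page_views", "/home")
broker.produce("page_views", "/cart")

# Two independent consumer groups each receive the full stream.
monitoring = broker.poll("realtime-monitoring", "page_views")
warehouse = broker.poll("data-warehouse", "page_views")
# Polling again returns nothing new for a group that is already caught up.
again = broker.poll("realtime-monitoring", "page_views")
```

The group and topic names above are illustrative only; a real deployment would spread the log across several brokers.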

[0126] Kafka manages the cluster configuration through Zookeeper, elects the leader...

Abstract

Embodiments of the invention provide a distributed data processing method and system. In the method, a shard receives data uploaded by a client for a given table and stores the data in the storage directory corresponding to that table. When the storage succeeds, the shard sends the data to each connected streaming compute node for streaming computation. The data is thus stored once and can then be shared simultaneously by offline compute nodes and real-time streaming compute nodes, without relying on message-oriented middleware. This reduces system complexity, eliminates one storage step compared with a message queue, and reduces storage costs, error probability, and processing delay.
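
The store-once, then fan-out flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Shard` class, the `data.jsonl` file name, the JSON-lines layout, and the use of plain callables as "connected streaming compute nodes" are all hypothetical, and a local temporary directory stands in for the distributed file system.

```python
import json
import tempfile
from pathlib import Path


class Shard:
    """Hypothetical shard: persist a record first, then forward it to stream nodes."""

    def __init__(self, root, stream_nodes):
        self.root = Path(root)                  # stand-in for the distributed file system
        self.stream_nodes = list(stream_nodes)  # callables standing in for connected nodes

    def upload(self, table, record):
        table_dir = self.root / table           # storage directory for this table
        try:
            table_dir.mkdir(parents=True, exist_ok=True)
            with (table_dir / "data.jsonl").open("a", encoding="utf-8") as f:
                f.write(json.dumps(record) + "\n")
        except OSError:
            return False                        # storage failed: nothing is forwarded
        for node in self.stream_nodes:          # storage succeeded: fan out
            node(table, record)
        return True


def offline_scan(root, table):
    """Offline job reads the same stored file; no second landing step is needed."""
    with (Path(root) / table / "data.jsonl").open(encoding="utf-8") as f:
        return [json.loads(line) for line in f]


received = []                                   # what the streaming side saw
root = tempfile.mkdtemp()
shard = Shard(root, [lambda table, record: received.append((table, record))])
ok = shard.upload("page_views", {"url": "/home", "ms": 12})
stored = offline_scan(root, "page_views")
```

Forwarding only after the write returns means the streaming nodes never see a record that the offline side cannot later read, which is the ordering the abstract describes.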

Description

Technical field

[0001] The present application relates to the technical field of cloud computing, and in particular to a distributed data processing method and a distributed data processing system.

Background

[0002] With the rapid development of the Internet and the explosive growth of data volume, cloud computing has come into wide use, and distributed processing of massive data is one of its applications.

[0003] Distributed massive data processing can be roughly divided into two directions: offline processing and streaming computing.

[0004] Offline computing executes queries over known data sets; the "MapReduce" model is an example of an offline computing model.

[0005] In streaming computing, the data is not known in advance and flows in in real time. As the data flows in, it is processed according to the defined computing model.

[0006] Different computing models determine that offline computing and streaming computing have different requirements for h...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L29/08
CPC: H04L67/10, H04L67/1078, G06F16/182, G06F16/23
Inventor: 杜川李闪段培乐魏蒲萌孙敬
Owner: ALIBABA GRP HLDG LTD