Parallel processing of continuous queries on data streams


Status: Inactive. Publication Date: 2011-12-22

UNIV MADRID POLITECNICA

0 Cites, 234 Cited by

Benefits of technology

[0010] Parallel processing of data streams provides scalability: throughput increases as new nodes are added. This parallel processing can be applied to both data stream processing and complex event processing.

[0016] Stream processing engines can be centralized or distributed. A centralized stream processing engine has a single system instance that runs on a single computer or node. A distributed stream processing engine has multiple instances, each of which can run on a different node. The most basic distributed engines can execute different queries on different nodes, so they scale out with respect to the number of queries as nodes are added. Some distributed engines can also place the operators of a query on different nodes, which lets them scale out with respect to the number of operators as nodes are added.

[0035] If any source subquery does not produce tuples for the destination subquery, the input merger blocks. To avoid this situation, the load balancers work as follows. Each load balancer keeps track of the timestamp of the last tuple generated for each destination subquery. When no tuple has been sent to a destination subquery for a maximum period of time m, the load balancer sends a dummy tuple whose timestamp is identical to that of the last tuple sent by that load balancer. When an input merger receives the dummy tuple, it uses it only to unblock its processing. If the dummy tuple does not have the smallest timestamp, the input merger takes the tuple with the smallest timestamp. Sooner or later the dummy tuple will be the one with the smallest timestamp, and in that case the input merger simply discards it. Thus, periodic generation of dummy tuples in the load balancers prevents the input merger from blocking.
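The dummy-tuple mechanism can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `StreamTuple`, `InputMerger` and `LoadBalancer`, the explicit `now` clock argument, the `tick` method and the per-source FIFO queues are all assumptions made for illustration. The merger releases a tuple only when every source has one pending, and it discards a dummy once that dummy carries the smallest timestamp.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StreamTuple:
    timestamp: int
    payload: Any = None
    dummy: bool = False  # True for tuples sent only to unblock a merger


class InputMerger:
    """Merges the streams of several load balancers in timestamp order."""

    def __init__(self, num_sources: int):
        # One FIFO queue per source load balancer (illustrative layout).
        self.queues = [deque() for _ in range(num_sources)]

    def receive(self, source: int, tup: StreamTuple) -> None:
        self.queues[source].append(tup)

    def poll(self) -> Optional[StreamTuple]:
        """Return the next real tuple, or None while the merger is blocked."""
        while all(self.queues):  # every source has a pending tuple
            q = min(self.queues, key=lambda q: q[0].timestamp)
            tup = q.popleft()
            if not tup.dummy:
                return tup
            # A dummy with the smallest timestamp is discarded.
        return None


class LoadBalancer:
    def __init__(self, source_id: int, mergers: list, max_idle: int):
        self.source_id = source_id
        self.mergers = mergers
        self.max_idle = max_idle  # the period m from the text
        self.last_timestamp = 0   # timestamp of the last tuple sent anywhere
        self.last_send_time = [0] * len(mergers)  # clock time per destination

    def send(self, dest: int, tup: StreamTuple, now: int) -> None:
        self.mergers[dest].receive(self.source_id, tup)
        self.last_timestamp = max(self.last_timestamp, tup.timestamp)
        self.last_send_time[dest] = now

    def tick(self, now: int) -> None:
        """Send a dummy to every destination that has been idle for m."""
        for dest, last in enumerate(self.last_send_time):
            if now - last >= self.max_idle:
                dummy = StreamTuple(self.last_timestamp, dummy=True)
                self.mergers[dest].receive(self.source_id, dummy)
                self.last_send_time[dest] = now
```

In this sketch, a merger fed by two load balancers stays blocked while only one of them has sent a tuple. Once the idle load balancer's `tick` delivers a dummy with a larger timestamp, `poll` can return the real tuple from the other source.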

[0036] Elasticity is a property of distributed systems: the capacity to grow and shrink the number of nodes so that the incoming load is processed with the minimum required resources, that is, the smallest number of nodes able to process the incoming load while satisfying the quality of service requirements.
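As a rough arithmetic illustration of this definition, the sketch below computes the smallest node count for a given load. The function name `required_nodes`, the uniform per-node capacity and the `max_utilization` bound (standing in for the quality of service requirement) are assumptions, not details taken from the patent.

```python
import math


def required_nodes(incoming_load: float, node_capacity: float,
                   max_utilization: float = 1.0) -> int:
    """Smallest node count that keeps per-node load within the bound.

    incoming_load and node_capacity use the same unit (e.g. tuples/second).
    max_utilization is a hypothetical stand-in for the QoS requirement.
    """
    if node_capacity <= 0 or not 0 < max_utilization <= 1:
        raise ValueError("capacity must be > 0 and utilization in (0, 1]")
    usable = node_capacity * max_utilization
    # At least one node is kept even when the stream is idle.
    return max(1, math.ceil(incoming_load / usable))
```

An elastic system would re-evaluate such a figure as the load changes, adding nodes when the result grows and releasing them when it shrinks.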

Problems solved by technology

None of the existing approaches can scale out with respect to the incoming stream volume.

This is because the data stream processed by a query or query operator must go through the single node that contains that query or operator, so the system capacity is limited by the capacity of a single node.

For stream volumes exceeding the processing capacity of a node, these systems cannot scale out.

However, this load balancing is studied in the context of a distributed query engine that does not parallelize queries; therefore, it does not address the problem of how to distribute the load between instances of the same subquery, only across different subqueries.

The problem with this technique is the loss of information, which is not permissible for a multitude of applications. It also has associated tradeoffs, such as loss of precision in query results or even loss of consistency in query outcomes.

Embodiment Construction

[0052] FIG. 1 shows a query with Map (M), Filter (F), Join (J) and Aggregate (A) operators. In this query, incoming tuples enter through the leftmost operator. The map operator transforms each tuple with its associated transformation function. The filter operator applies a predicate to the tuple: if the predicate is satisfied, the tuple is forwarded to the next operator; otherwise, it is discarded. The output of the filter operator is connected to both inputs of the join operator. That is, each tuple produced by the filter operator is sent to each of the two inputs of the join operator, which therefore performs a self-join. The join operator applies a predicate to all pairs kept in the two sliding windows (one associated with each input stream). Each pair that satisfies the predicate is concatenated and emitted as an output tuple. The next operator is an aggregate. It aggregates the tuples according to a given function or a group-by clause. A tuple is generated periodically with the aggregated value ...
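The operator chain described for FIG. 1 can be sketched with Python generators. This is only an illustration under stated assumptions: the function names are hypothetical, the sliding windows are count-based with a fixed `window_size`, concatenation is modeled as a Python pair, and the aggregate emits one value every `period` input tuples. The source does not specify any of these details.

```python
from collections import deque


def map_op(stream, fn):
    """Map: transform every tuple with fn."""
    for t in stream:
        yield fn(t)


def filter_op(stream, pred):
    """Filter: forward tuples that satisfy pred, discard the rest."""
    for t in stream:
        if pred(t):
            yield t


def self_join_op(stream, pred, window_size):
    """Self-join: each tuple is fed to both inputs of the join."""
    left = deque(maxlen=window_size)   # sliding window of the left input
    right = deque(maxlen=window_size)  # sliding window of the right input
    for t in stream:
        # t arrives on the left input: probe the right window, then store it.
        for r in right:
            if pred(t, r):
                yield (t, r)
        left.append(t)
        # The same t arrives on the right input: probe the left window.
        for l in left:
            if pred(l, t):
                yield (l, t)
        right.append(t)


def aggregate_op(stream, fn, period):
    """Aggregate: emit fn(batch) once every `period` input tuples."""
    batch = []
    for t in stream:
        batch.append(t)
        if len(batch) == period:
            yield fn(batch)
            batch = []


def build_query(source):
    """Chain M -> F -> J -> A with example functions and predicates."""
    mapped = map_op(source, lambda x: x * 10)
    filtered = filter_op(mapped, lambda x: x >= 20)
    joined = self_join_op(filtered, lambda a, b: a < b, window_size=2)
    return aggregate_op(joined, len, period=2)
```

Because every stage is a generator, tuples flow through the chain one at a time, which loosely mirrors how a continuous query consumes an unbounded stream.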

Abstract

A parallel engine for continuous queries on data streams provides scalability and increases throughput through the addition of new nodes. The parallel processing can be applied to data stream processing and complex event processing. The engine receives the query to be deployed and splits the original query into subqueries, obtaining at least one subquery; each subquery is executed on at least one node. Tuples produced by each operator of each subquery are labeled with timestamps. A load balancer is interposed at the output of each node that executes an instance of the source subquery, and an input merger is interposed in each node that executes an instance of a destination subquery. After checks are performed, further load balancers or input mergers may be added.
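The placement of load balancers and input mergers between two subqueries can be sketched as a small wiring function. The name `wire_subqueries`, the string labels and the assumption that every load balancer is linked to every input merger are illustrative choices and are not stated in this form in the abstract.

```python
from itertools import product


def wire_subqueries(source_nodes, destination_nodes):
    """Place one load balancer per source node and one merger per destination.

    Returns the balancers, the mergers and the links between them.
    Full connectivity between balancers and mergers is an assumption.
    """
    balancers = [f"LB@{node}" for node in source_nodes]
    mergers = [f"IM@{node}" for node in destination_nodes]
    links = list(product(balancers, mergers))
    return balancers, mergers, links
```

With two instances of the source subquery and three instances of the destination subquery, this sketch yields two load balancers, three input mergers and six links.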

Description

[0001] This application claims benefit of U.S. Ser. No. 61/356,353, filed 18 Jun. 2011 and which application is incorporated herein by reference. To the extent appropriate, a claim of priority is made to the above disclosed application.

FIELD OF THE INVENTION

[0002] The present invention belongs to the data stream processing and event management fields.

BACKGROUND OF THE INVENTION

[0003] Continuous query processing engines process data streams with queries that run continuously over those streams, producing results that are updated as new data arrives in the data stream. Known continuous query processing engines are Borealis (Daniel J. Abadi, Yanif Ahmad, Magdalena Balazinska, Ugur Çetintemel, Mitch Cherniack, Jeong-Hyon Hwang, Wolfgang Lindner, Anurag Maskey, Alex Rasin, Esther Ryvkina, Nesime Tatbul, Ying Xing, Stanley B. Zdonik: The Design of the Borealis Stream Processing Engine. CIDR 2005: 277-289), Aurora (Daniel J. Abadi, Donald Carney, Ugur Çetintemel, Mitch Cherniack,...

Application Information

Patent Type & Authority: Applications (United States)

IPC(8): G06F17/30

CPC: G06F9/5066, G06F17/30445, G06F17/30516, G06F9/5088, G06F16/24568, G06F16/24532

Inventors: JIMENEZ PERIS, RICARDO; PATINO MARTINEZ, MARTA

Owner: UNIV MADRID POLITECNICA
