Random sampling from distributed streams
A random sampling technology for distributed streams of data. It addresses the cost of communication between sites and the inability to collect and process all data centrally at a single site.
Inactive Publication Date: 2013-03-21
IOWA STATE UNIV RES FOUND +1
2 Cites, 0 Cited by
AI Technical Summary
Benefits of technology
The invention is a method, computer program product, and system for distributed sampling on a network with multiple sites and a coordinator. The method includes receiving a data element with a weight from a site, comparing it with a global value stored at the coordinator, and updating or communicating the global value based on the weight. This allows for collaborative decision-making and efficient data collection across multiple sites.
Problems solved by technology
For many data analysis tasks, it is impractical to collect all the data at a single site and process it in a centralized manner.
A challenge is to minimize the communication between the different sites and the coordinator, while providing an accurate answer to queries at the coordinator at all times.
A problem in this setting is to obtain a random sample drawn from the union of all distributed streams.
Other problems in distributed stream processing, including estimating the number of distinct elements and finding heavy hitters, use random sampling as a primitive.
Examples
Case i
[0052] v sends a message to the coordinator in epoch i in Process A. In this case, the first time v sends a message to the coordinator in this epoch, v receives the current value of u, which is smaller than or equal to $m_i$. This exchange costs two messages, one in each direction. From then on, in this epoch, the number of messages sent in Process A is no more than the number sent in Process B. Hence, in this epoch, the number of messages transmitted to/from v in Process A is at most twice the number in Process B, which has at least one transmission from the coordinator to site v.
Case ii
[0053] v did not send a message to the coordinator in this epoch in Process A. In this case, the number of messages sent in this epoch to/from site v in Process A is smaller than in Process B.
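[0054] Let $\xi$ denote the total number of epochs.
[0055] Lemma 4. If $r \ge 2$,
$$E[\xi] \le \frac{\log(n/s)}{\log r} + 2$$
[0056] Proof
Let $z = \frac{\log(n/s)}{\log r}$.
First, it is noted that in each epoch, u decreases by a factor of at least r. Thus, after $(z+\ell)$ epochs, u is no more than
$$\frac{1}{r^{z+\ell}} = \left(\frac{s}{n}\right)\frac{1}{r^{\ell}}.$$
Thus,
[0057]
$$\Pr[\xi \ge z+\ell] \le \Pr\left[u \le \left(\frac{s}{n}\right)\frac{1}{r^{\ell}}\right]$$
[0058] Let Y denote the number of elements (out of n) that have been assigned a weight of $\frac{s}{n r^{\ell}}$ or less. Y is a binomial random variable with expectation $\frac{s}{r^{\ell}}$. Note that if $u \le \frac{s}{n r^{\ell}}$, it must be true that $Y \ge s$.
$$\Pr[\xi \ge z+\ell] \le \Pr[Y \ge s] \le \Pr\left[Y \ge r^{\ell} E[Y]\right] \le \frac{1}{r^{\ell}}$$
where Markov's inequality has been used.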
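The Markov step can be checked numerically. The sketch below is an illustration only, not part of the patent text: it assumes each of the n elements independently receives a weight drawn uniformly from (0, 1), so that Y is Binomial(n, s/(n r^l)), and it compares the exact tail Pr[Y >= s] with the bound 1/r^l. The function names `binomial_tail_at_least` and `markov_check` are hypothetical.

```python
from math import comb


def binomial_tail_at_least(n: int, p: float, s: int) -> float:
    """Exact Pr[Y >= s] for Y ~ Binomial(n, p), via the complement sum over k < s."""
    below = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s))
    return max(0.0, 1.0 - below)


def markov_check(n: int, s: int, r: int, ell: int) -> tuple[float, float]:
    """Return (Pr[Y >= s], 1 / r**ell) for the threshold weight s / (n * r**ell).

    Assumption: weights are i.i.d. uniform on (0, 1), so an element falls at or
    below the threshold with probability s / (n * r**ell).
    """
    p = s / (n * r**ell)
    return binomial_tail_at_least(n, p, s), 1.0 / r**ell


if __name__ == "__main__":
    for ell in range(1, 6):
        tail, bound = markov_check(n=200, s=4, r=2, ell=ell)
        print(f"ell={ell}: Pr[Y >= s] = {tail:.6f}, bound = {bound:.6f}")
```

In runs of this kind the exact tail is typically far below 1/r^l, which is expected: Markov's inequality is loose, but it is all the proof needs.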
[0059] Since $\xi$ takes only positive integral values,
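The displayed equation that followed this sentence does not survive in this extract. One plausible reconstruction, offered as a sketch rather than the patent's own wording, uses the tail-sum identity for positive integer random variables together with the bound $\Pr[\xi \ge z+\ell] \le 1/r^{\ell}$ from the Markov step, treating z as an integer for simplicity:

$$
\begin{aligned}
E[\xi] &= \sum_{i \ge 1} \Pr[\xi \ge i]
        = \sum_{i=1}^{z} \Pr[\xi \ge i] + \sum_{\ell \ge 1} \Pr[\xi \ge z+\ell] \\
       &\le z + \sum_{\ell \ge 1} \frac{1}{r^{\ell}}
        = z + \frac{1}{r-1}
        \le z + 2 ,
\end{aligned}
$$

where the last step uses $r \ge 2$. Substituting $z = \log(n/s)/\log r$ gives the bound stated in Lemma 4.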
[0060] Let $n_j$ denote the total number of elements that arrived in epoch j, thus $n = \sum_{j=0}^{\xi-1} n_j$. Let $\mu$ denote the total number of messag...
Abstract
Described herein are methods, systems, apparatuses and products for random sampling from distributed streams. An aspect provides a method for distributed sampling on a network with a plurality of sites and a coordinator, including: receiving at the coordinator a data element from a site of the plurality of sites, said data element having a weight randomly associated therewith deemed reportable by comparison at the site to a locally stored global value; comparing the weight of the data element received with a global value stored at the coordinator; and performing one of: updating the global value stored at the coordinator to the weight of the data element received; and communicating the global value stored at the coordinator back to the site of the plurality of sites. Other embodiments are disclosed.
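The exchange described in the abstract can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the patented implementation: it handles only a sample of size one (the coordinator keeps the element with the smallest weight), weights are drawn uniformly from [0, 1), and a site that receives no reply assumes its report became the new global value. The names `Coordinator`, `Site`, and `run_demo` are hypothetical.

```python
import random


class Coordinator:
    """Holds the global value u (smallest weight seen so far) and the sampled element."""

    def __init__(self):
        self.u = 1.0
        self.sample = None

    def receive(self, element, weight):
        """Perform one of: update u to the received weight, or reply with the current u."""
        if weight < self.u:
            self.u = weight
            self.sample = element
            return None  # updated; no reply in this sketch (assumption)
        return self.u  # the site's copy was stale; send the global value back


class Site:
    """Assigns a random weight to each element and reports it only if it looks reportable."""

    def __init__(self, coordinator, rng):
        self.coordinator = coordinator
        self.rng = rng
        self.local_u = 1.0  # locally stored copy of the global value
        self.messages = 0   # messages to and from this site

    def observe(self, element):
        weight = self.rng.random()
        if weight < self.local_u:  # deemed reportable by local comparison
            self.messages += 1
            reply = self.coordinator.receive(element, weight)
            if reply is None:
                self.local_u = weight  # assumption: our weight is now the global value
            else:
                self.messages += 1
                self.local_u = reply
        return weight


def run_demo(num_sites=4, num_elements=1000, seed=0):
    """Feed elements to randomly chosen sites; return coordinator, sites, and all weights."""
    rng = random.Random(seed)
    coordinator = Coordinator()
    sites = [Site(coordinator, rng) for _ in range(num_sites)]
    weights = {}
    for element in range(num_elements):
        weights[element] = rng.choice(sites).observe(element)
    return coordinator, sites, weights


if __name__ == "__main__":
    coordinator, sites, weights = run_demo()
    print("sampled element:", coordinator.sample)
    print("total messages:", sum(site.messages for site in sites))
```

Because a site's local copy is never smaller than the coordinator's value, every element that could lower the global value is reported, so the coordinator always holds the minimum-weight element of the union of the streams while most elements generate no traffic at all.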
Description
FIELD OF THE INVENTION
[0001] The subject matter presented herein generally relates to optimal random sampling from distributed streams of data.
BACKGROUND
[0002] For many data analysis tasks, it is impractical to collect all the data at a single site and process it in a centralized manner. For example, data arrives at multiple network routers at extremely high rates, and queries are often posed on the union of data observed at all the routers. Since the data set is changing, the query results could also be changing continuously with time. This has motivated the continuous, distributed, streaming model. In this model there are k physically distributed sites receiving high-volume local streams of data. These sites talk to a central coordinator that has to continuously respond to queries over the union of all streams observed so far. A challenge is to minimize the communication between the different sites and the coordinator, while providing an accurate answer to queries at the coordinator...
Application Information
Patent Type & Authority: Applications (United States)