A Multidimensional Dynamic Sampling Method for Approximate Query in Cloud Computing Environment

A dynamic sampling technology for cloud computing environments, applied in computing, special data processing applications, instruments, etc., that addresses the problem of inaccurate estimation of small groups.

Active Publication Date: 2021-09-28
BEIJING INST OF CLOTHING TECH
5 Cites, 0 Cited by
AI Technical Summary

Problems solved by technology

The method provided by the invention effectively solves the problem of inaccurate estimation of small groups caused by data skew in approximate queries, and reduces sampling cost when sample storage space is limited.

Embodiment Construction

[0023] The present invention will be further described below with reference to the examples and description.

[0024] A multidimensional dynamic sampling method for approximate query in a cloud computing environment, comprising the following steps: 1) The dynamic sampling system includes an offline processing phase for creating stratified samples and an online processing phase for dynamically selecting a sample; 2) A load column set parsing module, a data feature analysis module, a coverage index calculation module, a stratified column set determination module, and a stratified sample data creation module are provided; 3) The load column set parsing module analyzes the load query statements, extracts the grouping column set of each query statement, counts the number of occurrences of each column set, analyzes the relationships between columns, and outputs the result to the data feature analysis module; 4) The data feature analysis module starts a MapReduce job to scan the original data set, and outputs the data distribution resu...
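Step 3) above can be illustrated with a minimal sketch. It assumes the query load is a list of plain SQL strings and that a simple regular expression is enough to find the GROUP BY clause; the function names `grouping_column_set` and `count_column_sets` are hypothetical and are not taken from the patent, which does not disclose its parser here.

```python
import re
from collections import Counter

# Simplified, assumed pattern: capture the column list after GROUP BY,
# stopping at HAVING / ORDER BY / LIMIT / ';' or the end of the statement.
_GROUP_BY = re.compile(
    r"group\s+by\s+(.+?)(?=\s+having\b|\s+order\s+by\b|\s+limit\b|;|$)",
    re.IGNORECASE | re.DOTALL,
)

def grouping_column_set(query):
    """Return the set of GROUP BY columns of one query (empty if none)."""
    match = _GROUP_BY.search(query)
    if match is None:
        return frozenset()
    columns = (c.strip().lower() for c in match.group(1).split(","))
    return frozenset(c for c in columns if c)

def count_column_sets(workload):
    """Count how often each grouping column set occurs in the query load."""
    counts = Counter()
    for query in workload:
        columns = grouping_column_set(query)
        if columns:
            counts[columns] += 1
    return counts
```

A real load parser would need a proper SQL grammar (subqueries, expressions, aliases); the regular expression here only handles flat column lists.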

Abstract

A multi-dimensional dynamic sampling method for approximate query in a cloud computing environment, comprising the following steps: the dynamic sampling system includes an offline processing stage for creating stratified samples and an online processing stage for dynamically selecting samples; in the offline processing stage, the load column set analysis module analyzes the load query statements; the data feature analysis module analyzes the data characteristics; the coverage index calculation module calculates the coverage index; the stratified column set determination module selects the stratified column sets used to create stratified samples; the stratified sample data creation module creates the stratified samples; in the online processing stage, the query analysis module parses the user query statement; the sample selection module selects the stratified sample data with the smallest sampling cost; the sample size determination module determines the sample size to be drawn. The invention effectively solves the problem of inaccurate estimation of small groups caused by data skew in approximate queries, and reduces sampling cost when sample storage space is limited.
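The offline creation of stratified samples and the online selection of a sample can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: each stratum keeps at most `cap` rows (a common form of stratified sampling), a stored sample is considered usable when its stratification columns include all grouping columns of the query, and "sampling cost" is approximated by the number of rows in the sample. The names `build_stratified_sample` and `select_sample` are hypothetical.

```python
import random
from collections import defaultdict

def build_stratified_sample(rows, columns, cap, seed=0):
    """Keep at most `cap` rows per distinct value combination of `columns`.

    Returns (row, weight) pairs, where weight = stratum size / rows kept,
    so that weighted counts still estimate the full-data counts.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[tuple(row[c] for c in columns)].append(row)
    sample = []
    for members in strata.values():
        kept = members if len(members) <= cap else rng.sample(members, cap)
        weight = len(members) / len(kept)
        sample.extend((row, weight) for row in kept)
    return sample

def select_sample(samples, query_columns):
    """Pick the cheapest stored sample that covers the query's grouping columns.

    `samples` maps a frozenset of stratification columns to a sample list.
    Cost is assumed to be the sample's row count. Returns the chosen
    column set, or None when no stored sample covers the query.
    """
    needed = frozenset(query_columns)
    covering = {cols: s for cols, s in samples.items() if needed <= cols}
    if not covering:
        return None
    return min(covering, key=lambda cols: len(covering[cols]))
```

Because small strata are kept whole, a group with only a few rows still appears in the sample, which is the property that protects small groups under data skew.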

Description

Technical field

[0001] The present invention relates to a data sampling method for approximate queries, in particular a dynamic sampling method for multi-query loads in a cloud computing environment.

Background technique

[0002] The cloud computing environment is highly scalable and cost-effective, and has become a mainstream platform for managing big data. However, for big data, the real-time processing and speed requirements of user interaction are difficult to meet even in a cloud computing environment. For mid-range query and exploratory data analysis applications, quickly obtaining an approximate result is more meaningful than spending a large amount of time and computing resources to obtain a fully precise result. Approximate query processing technology estimates query results from sample data, which greatly reduces query execution time and is of great significance for big data analysis.

[0003] Approximate query processing technology based on ...
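As a minimal sketch of how a query result is estimated from sample data, the following assumes a uniform random sample and a per-group COUNT(*) query; each sampled row is scaled by the inverse of the sampling rate. The function name `approximate_group_counts` and the row format (a list of dicts) are illustrative assumptions.

```python
import random

def approximate_group_counts(rows, key, fraction, seed=0):
    """Estimate per-group COUNT(*) from a uniform random sample of `rows`."""
    rng = random.Random(seed)
    n = max(1, int(len(rows) * fraction))
    sample = rng.sample(rows, n)
    scale = len(rows) / n  # each sampled row stands for `scale` original rows
    estimates = {}
    for row in sample:
        group = row[key]
        estimates[group] = estimates.get(group, 0.0) + scale
    return estimates
```

Only the sampled rows are scanned, which is where the time saving comes from. With skewed data, a group that holds very few rows may not appear in a uniform sample at all, so its estimate is missing entirely; this is the small-group problem the invention targets.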

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/2458
CPC: G06F16/2462
Inventor: 史英杰, 刘怡, 郭飞, 刘昊
Owner: BEIJING INST OF CLOTHING TECH