
Uncertain data-oriented probability query quality optimization method

A technique for uncertain data and probabilistic queries, applied in digital data processing, special-purpose data processing applications, instruments, etc. It addresses the lack of data cleaning algorithms for some probabilistic queries and the high cost that makes cleaning all uncertain data impractical. Its effect is to reduce the time needed to compute query quality, giving a low time cost.

Inactive Publication Date: 2017-06-27
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

[0004] Data cleaning is well known to be an effective way to improve quality, but it is also time-consuming and costly; in a big data environment in particular, cleaning all uncertain data is impractical.
Scholars at home and abroad have done some work on the modeling and query processing of uncertain data, but this work still has limitations: (1) the query algorithms do not consider that user resources are limited; (2) existing data cleaning algorithms offer no effective algorithm for probabilistic Skyline queries and probabilistic k-nearest neighbor queries.



Examples


Embodiment Construction

[0035] The technical solution of the present invention is now further described with reference to the accompanying drawings and specific embodiments:

[0036] As shown in Figure 1, the specific implementation process and working principle of the present invention are as follows:

[0037] Step (1): Given a probabilistic query φ(q,S), where q is the query object and S is the uncertain data set, build an ASI index over the result object sets R in the uncertain data set, and maintain the maximum query result quality and the corresponding cleaning object set. With the ASI index, the probability Pr(R) of a result object set R can be computed directly, which avoids traversing all possible worlds and allows the expected quality of the query to be computed quickly. The ASI index is a hash table keyed by the result object set R; each result object set stores its corresponding result tuple sets r; for each r, the index stores the probability value Pr(r) and the probability vector. Each item in the probability vect...
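The hash-table layout described in Step (1) can be illustrated with a minimal sketch. Everything named here (`ASIIndex`, `TupleSetEntry`, `add`, `prob_of`) is hypothetical and not taken from the patent. The sketch assumes that the result tuple sets r stored under one key R are disjoint outcomes, so that Pr(R) is the sum of their Pr(r). The probability vector is carried as an opaque tuple, because the excerpt is truncated before its items are defined.

```python
from dataclasses import dataclass, field


@dataclass
class TupleSetEntry:
    """One result tuple set r stored under a result object set R."""
    tuples: frozenset          # ids of the tuples that make up r
    prob: float                # Pr(r)
    prob_vector: tuple = ()    # opaque placeholder: item layout is not given in the excerpt


@dataclass
class ASIIndex:
    """Hypothetical hash table keyed by result object set R."""
    buckets: dict = field(default_factory=dict)

    def add(self, result_objects, result_tuples, prob, prob_vector=()):
        # frozenset makes the object set hashable, so it can serve as the key
        key = frozenset(result_objects)
        entry = TupleSetEntry(frozenset(result_tuples), prob, tuple(prob_vector))
        self.buckets.setdefault(key, []).append(entry)

    def prob_of(self, result_objects):
        # Assumption: entries under one key are disjoint events, so Pr(R) = sum of Pr(r).
        entries = self.buckets.get(frozenset(result_objects), [])
        return sum(entry.prob for entry in entries)


index = ASIIndex()
index.add({"a"}, {"a1"}, 0.3)
index.add({"a"}, {"a2"}, 0.2)
index.add({"a", "b"}, {"a1", "b1"}, 0.5)
print(index.prob_of({"a"}))
```

A lookup touches only the entries stored under one key, which is the sense in which such an index could answer Pr(R) without enumerating possible worlds.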



Abstract

The invention discloses a probabilistic query quality optimization method for uncertain data. Given limited resources, the method measures query quality with a quality function based on combination entropy and determines the set of uncertain objects that need to be cleaned, so that the expected query quality is optimal under the specified resource limit. The method comprises two main parts: query quality calculation and cleaning object optimization. In query quality calculation, all possible query result object sets are indexed with ASI, which avoids traversing all possible worlds during quality calculation, allows the probabilities of the query result sets to be updated quickly, and improves the efficiency of quality calculation. In cleaning object selection, two heuristic rules, candidate subsets and quality function monotonicity, are used, and one exact cleaning object optimization algorithm and two approximate ones are proposed. As a result, query quality is improved markedly, the cleaning object optimization time is shortened effectively, and the cleaning cost is kept within the given budget.
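The budgeted selection problem stated in the abstract can be made concrete with a small sketch. This is not the patent's exact or approximate algorithm: it is a plain brute-force search that applies neither of the two heuristic rules. The quality function below is assumed to be the negative Shannon entropy of the distribution over possible result sets; the abstract only says the function is entropy-based and gives no formula. The names `entropy_quality`, `best_cleaning_set`, `costs`, and the `expected_quality` callback are all hypothetical.

```python
import math
from itertools import combinations


def entropy_quality(result_probs):
    """Assumed quality: negative Shannon entropy (base 2); 0 means a certain result."""
    return sum(p * math.log2(p) for p in result_probs if p > 0)


def best_cleaning_set(costs, budget, expected_quality):
    """Brute-force search for the in-budget set with the highest expected quality.

    costs: dict mapping object id -> cleaning cost (hypothetical input)
    expected_quality: callback taking a frozenset of object ids to clean
    """
    best_set = frozenset()
    best_q = expected_quality(best_set)
    objects = sorted(costs)
    for size in range(1, len(objects) + 1):
        for subset in combinations(objects, size):
            if sum(costs[o] for o in subset) > budget:
                continue  # over budget, skip
            q = expected_quality(frozenset(subset))
            if q > best_q:
                best_set, best_q = frozenset(subset), q
    return best_set, best_q


# Toy inputs: fixed per-object quality gains, purely for illustration.
toy_costs = {"a": 2, "b": 1, "c": 2}
toy_gain = {"a": 0.5, "b": 0.2, "c": 0.4}
chosen, quality = best_cleaning_set(
    toy_costs, budget=3,
    expected_quality=lambda s: sum(toy_gain[o] for o in s) - 1.0,
)
print(sorted(chosen), quality)
```

The search is exponential in the number of objects, which is the kind of cost that pruning rules and approximate algorithms are meant to avoid.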

Description

Technical field

[0001] The invention relates to database query processing technology, and in particular to a probabilistic query quality optimization method for uncertain data.

Background

[0002] Uncertain data arises in many real-life applications, for example from sensor input noise, wireless transmission errors, and data errors and omissions in data integration. Consequently, the processing of queries over uncertain data (such as probabilistic Skyline computation, probabilistic k-nearest neighbor queries, and probabilistic Top-k queries) has received extensive attention in the database field.

[0003] In an uncertain database, a general probabilistic query returns the objects whose result probability is non-zero. The uncertainty of the data set propagates to the query results, so it is difficult for users to obtain the accurate results they expect. Low-quality query results also make it hard for users to make correct decisions...
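How data uncertainty propagates into a query result can be seen in a small sketch that enumerates possible worlds for a probabilistic nearest-neighbor query. The data model is an assumption for illustration only: objects are independent, and each object has mutually exclusive one-dimensional instances whose probabilities sum to 1. The function name `nn_result_probabilities` is hypothetical.

```python
from collections import defaultdict
from itertools import product


def nn_result_probabilities(q, objects):
    """Probability that each object is the nearest neighbor of q.

    objects: dict mapping object id -> list of (value, probability) instances.
    Enumerates every possible world, so the cost grows exponentially.
    Ties are broken in favor of the object listed first.
    """
    names = list(objects)
    result = defaultdict(float)
    for world in product(*(objects[name] for name in names)):
        world_prob = 1.0
        for _, p in world:
            world_prob *= p  # objects are assumed independent
        nearest, _ = min(zip(names, world), key=lambda item: abs(item[1][0] - q))
        result[nearest] += world_prob
    return dict(result)


uncertain = {
    "a": [(1.0, 0.5), (4.0, 0.5)],  # object a is at 1.0 or 4.0
    "b": [(2.0, 1.0)],              # object b is certain
}
probs = nn_result_probabilities(0.0, uncertain)
print(probs)
```

In this toy data set both objects have a non-zero probability of being the nearest neighbor, so both are returned and neither answer is certain.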


Application Information

IPC(8): G06F17/30
CPC: G06F16/215, G06F16/217
Inventor: 高云君, 苗晓晔, 周琳琳, 陈刚, 郭素
Owner: ZHEJIANG UNIV