Random sampling as a built-in function for database administration and replication

A database and function technology, applied in relational databases, data processing applications, instruments, etc., that addresses problems such as the inability to provide an exact analysis without taking the database offline, so as to reduce the number of system calls, reduce analysis time, and reduce the strain on the computer system.

Inactive Publication Date: 2006-04-11

IBM CORP

Benefits of technology

[0029]One benefit obtained from the present invention as a result of providing a built-in sampling facility is the reduction in the number of system calls required to perform an approximation partition analysis.

[0030]Another benefit obtained from the present invention is the reduction in time required to perform an approximation partition analysis compared to the time required for an exact partition analysis.

[0031]Still another benefit obtained from the present invention is that approximation partition analyses can be performed frequently without straining or otherwise compromising computer system resources.

[0032]Yet another benefit obtained from the present invention is an improved accuracy of the analyses, particularly for homogeneous database populations.

[0033]Yet another benefit obtained from the present invention is that a random sample of predetermined size is obtained without prior knowledge of the number of records in the sampled database.

Problems solved by technology

Analysis of these large databases for administration and replication purposes typically involves processes which are very input/output intensive, as numerous queries must be performed by an analysis program across a vast number of records.

It is typically not possible to provide an exact analysis without first taking the database offline for an extended period of time.

The method and system provided are unique in that a random sample of predetermined size, uniformly distributed across the entire database, is selected from a database of known or unknown size while reading only a fraction of the records in the database. This avoids the requirement of indexing the entire database which, as indicated above, is time-consuming and provides results having an unnecessary degree of precision.
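The idea of drawing a fixed-size, uniformly distributed sample from a population of unknown size can be illustrated with classic reservoir sampling (Algorithm R). This is a minimal sketch and not necessarily the algorithm used by the invention; the text available here does not name one. The function name `reservoir_sample` and its parameters are hypothetical. Note that this simple variant still reads every record; skip-based reservoir variants exist that read fewer records, which would be closer to the "fraction of the records" property described above.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    records: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k records chosen uniformly at random from a stream.

    The total number of records does not need to be known in advance.
    (Hypothetical helper for illustration; not taken from the patent.)
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Record i (0-based) is kept with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


# Usage: sample 100 record keys from a stream whose length is not known up front.
sample = reservoir_sample(iter(range(10_000)), 100, random.Random(42))
print(len(sample))
```

If the stream holds fewer than k records, the sketch simply returns all of them.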

Embodiment Construction

[0038]The capacity of DL/I databases is limited by the maximum size of a data set that can be addressed by a four-byte relative byte address (RBA). Many other databases presently in use suffer from similar size limitations. In current full function databases managed by database management systems such as IMS, multiple data sets are supported. This helps to increase the capacity of the database. One requirement, however, is that all segments of the same type must be in the same data set. As a result, when one data set is full, the database is deemed to be essentially full even if empty space exists in the remaining data sets. As a consequence, methods have been developed to extend the capacity of such databases.
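As a rough illustration of the limit imposed by a four-byte address, the arithmetic below computes the largest range such an address could cover. It assumes the RBA is an unsigned, byte-granular offset; the text here does not state the exact limit, and actual limits may differ by access method.

```python
RBA_BYTES = 4  # four-byte relative byte address, as stated in the text

# Assumption: the address is an unsigned byte offset, so it can take
# 2 ** 32 distinct values.
max_addressable_bytes = 2 ** (8 * RBA_BYTES)

print(max_addressable_bytes)             # 4294967296
print(max_addressable_bytes // 2 ** 30)  # 4 (GiB)
```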

[0039]As shown in FIG. 1, partitioning removes the data set limitation by relieving the restriction that all occurrences of the same segment type must be in the same data set. Partitioning database 10 groups database records into sets of partitions 12 that are treated as a sin...

Abstract

A database management system and method for administration and replication having a built-in random sampling facility for approximation partition analysis on very large databases. The method utilizes a random sampling algorithm that provides results accurate to within a few percentage points for large homogeneous databases. The accuracy is not affected by the size of the database and is determined primarily by the size of the sample. The system and method for approximate partition analysis reduces the time required for an analysis to a fraction of the time required for an exact analysis. The database management system is configured with the random sampling facility built-in thereby enabling even greater efficiency by reducing communication overhead between an analysis program and the database management system to a fraction of the overhead required when sampling is performed by a separate analysis program. The reduction in time thereby permits frequent and timely analyses for replication and administration of database partitions.
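The statement that accuracy depends primarily on the sample size rather than the database size can be illustrated with the standard normal-approximation margin of error for a proportion estimated from a simple random sample. This is a textbook formula offered as a sketch, not the analysis given in the patent; the function name `margin_of_error`, the 95% confidence multiplier, and the example sizes are assumptions chosen for illustration.

```python
import math
from typing import Optional


def margin_of_error(
    sample_size: int,
    proportion: float = 0.5,
    z: float = 1.96,
    population_size: Optional[int] = None,
) -> float:
    """Approximate margin of error for a proportion from a simple random sample.

    proportion=0.5 is the worst case; z=1.96 corresponds to roughly 95%
    confidence. If population_size is given, the finite population
    correction is applied. (Hypothetical helper for illustration.)
    """
    se = math.sqrt(proportion * (1.0 - proportion) / sample_size)
    if population_size is not None:
        se *= math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * se


# Same sample size, very different database sizes: nearly identical error.
print(round(margin_of_error(1_000, population_size=10**6), 4))
print(round(margin_of_error(1_000, population_size=10**9), 4))

# Larger sample, smaller error, regardless of database size.
print(round(margin_of_error(10_000), 4))
```

Under these assumptions, a sample of 1,000 records gives a worst-case margin of roughly three percentage points whether the database holds a million records or a billion, which is consistent with the "few percentage points" figure quoted in the abstract.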

Description

CROSS-REFERENCE TO RELATED APPLICATION

[0001]This application is related to U.S. application Ser. No. 09/897,853, filed together with this application, entitled Partition Boundary Determination Using Random Sampling on Very Large Databases.

BACKGROUND OF THE INVENTION

[0002]The invention pertains to partition size analysis for very large databases having multiple partitions and, more particularly, to accurate, fast, and scalable characterization and estimation of large populations using a random sampling function that is integrated directly into a database engine.

[0003]Databases provide a means to conveniently store and retrieve a wealth of information such as, in the business setting, individual and corporate accounts and, in the business example provide a means to analyze business trends and make other business, educational, and scientific decisions. Accordingly, over the years, typical database populations reach upward of a billion rows and records.

[0004]Analysis of these large data...

Application Information

Patent Type & Authority: Patents (United States)

IPC(8): G06F17/30, G06F12/00

CPC: G06F17/30595, Y10S707/99953, G06F16/284

Inventors: HARPER, JOHN WILLIAM; SLISHMAN, GORDON ROBERT

Owner: IBM CORP
