Random sampling as a built-in function for database administration and replication

A database and function technology, applied in relational databases, data processing applications, instruments, and similar fields. It addresses problems such as obstacles to improving data quality and the inability to provide exact analysis, and it reduces the number of system calls, the time required, and the strain on the computer system.

US7028054B2 | Inactive | Publication Date: 2006-04-11 | IBM CORP

Embodiment Construction

[0038] The capacity of DL/I databases is limited by the maximum size of a data set that can be addressed by a four-byte relative byte address (RBA). Many other databases in current use have similar size limitations. Current full-function databases managed by database management systems such as IMS support multiple data sets, which helps increase the capacity of the database. One requirement, however, is that all segments of the same type must be in the same data set. As a result, when one data set is full, the database is effectively full even if empty space remains in the other data sets. Consequently, methods have been developed to extend the capacity of such databases.
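
The two limits in this paragraph can be made concrete with a short sketch. A four-byte RBA can address 2^32 bytes (4 GiB) per data set, and because every occurrence of a segment type must live in one data set, the database stops accepting that segment type as soon as its data set fills. The function `database_is_full`, the data set names, and the usage figures below are hypothetical illustrations, and treating "full" as `used >= capacity` is a simplifying assumption.

```python
# Capacity arithmetic for a four-byte relative byte address (RBA).
RBA_BYTES = 4
MAX_DATA_SET_BYTES = 2 ** (8 * RBA_BYTES)  # 4,294,967,296 bytes = 4 GiB


def database_is_full(data_sets, segment_home):
    """Return True when some segment type can no longer be stored.

    data_sets maps a hypothetical data set name to (used_bytes, capacity_bytes).
    segment_home maps each segment type to the single data set that must hold
    every occurrence of that segment type (the restriction described above).
    Simplification: a data set counts as full when used >= capacity.
    """
    return any(
        data_sets[name][0] >= data_sets[name][1]
        for name in segment_home.values()
    )


# Hypothetical layout: DS1 is full, DS2 is almost empty.
data_sets = {
    "DS1": (MAX_DATA_SET_BYTES, MAX_DATA_SET_BYTES),
    "DS2": (1_000_000, MAX_DATA_SET_BYTES),
}
segment_home = {"CUSTOMER": "DS1", "ORDER": "DS2"}
print(database_is_full(data_sets, segment_home))  # True, despite free space in DS2
```

The check looks only at the data sets that segment types are bound to, which is exactly why free space elsewhere does not help.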

[0039]As shown in FIG. 1, partitioning removes the data set limitation by relieving the restriction that all occurrences of the same segment type must be in the same data set. Partitioning database 10 groups database records into sets of partitions 12 that are treated as a sin...
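
The excerpt is truncated before it says how records are assigned to partitions. The sketch below assumes one common scheme, key-range partitioning, in which each partition is identified by the highest root key it may hold and the last partition takes all remaining keys. The high keys, the partition names, and the helpers `select_partition` and `distribute` are hypothetical and are not taken from the patent.

```python
import bisect
from collections import defaultdict

# Hypothetical key-range partitioning: a record goes to the first partition
# whose high key is >= the record's root key; the last partition takes the rest.
HIGH_KEYS = ["F", "M", "S"]
PARTITIONS = ["PART1", "PART2", "PART3", "PART4"]


def select_partition(root_key):
    # bisect_left returns the index of the first high key >= root_key,
    # or len(HIGH_KEYS) when the key sorts after every high key.
    return PARTITIONS[bisect.bisect_left(HIGH_KEYS, root_key)]


def distribute(root_keys):
    """Group root keys by partition; each partition can then use its own
    data sets instead of one data set per segment type for the whole database."""
    groups = defaultdict(list)
    for key in root_keys:
        groups[select_partition(key)].append(key)
    return dict(groups)


print(distribute(["ADAMS", "FOX", "JONES", "SMITH", "TAYLOR", "F"]))
```

A sorted list of high keys keeps partition selection at O(log P) per record, which matters little for a handful of partitions but keeps the mapping easy to audit.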

Abstract

A database management system and method for administration and replication having a built-in random sampling facility for approximate partition analysis on very large databases. The method uses a random sampling algorithm that provides results accurate to within a few percentage points for large homogeneous databases. The accuracy is not affected by the size of the database and is determined primarily by the size of the sample. The system and method for approximate partition analysis reduce the time required for an analysis to a fraction of the time required for an exact analysis. Because the random sampling facility is built into the database management system, efficiency improves further: communication overhead between an analysis program and the database management system drops to a fraction of the overhead incurred when a separate analysis program performs the sampling. The time savings permit frequent and timely analyses for replication and administration of database partitions.
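
The claim that accuracy depends on the sample size rather than the database size can be checked with a small simulation. This is a minimal sketch, not the patent's algorithm: it draws a simple random sample of record numbers without replacement, estimates the share of records that fall into one partition, and reports the binomial standard error sqrt(p(1 - p) / n). The membership rule `in_partition` is a hypothetical stand-in for a homogeneous population in which exactly 30% of records belong to the partition.

```python
import math
import random


def estimate_share(population_size, in_partition, sample_size, rng):
    """Estimate the share of records satisfying in_partition from a simple
    random sample of record numbers drawn without replacement.

    Returns (estimated share, estimated standard error, estimated record count).
    """
    sample = rng.sample(range(population_size), sample_size)
    hits = sum(1 for r in sample if in_partition(r))
    share = hits / sample_size
    # Binomial standard error; the finite-population correction is omitted,
    # which slightly overstates the error when the sample is a large fraction
    # of the population.
    std_err = math.sqrt(share * (1.0 - share) / sample_size)
    return share, std_err, round(share * population_size)


def in_partition(record_number):
    # Hypothetical rule: exactly 30% of records belong to the partition.
    return record_number % 10 < 3


if __name__ == "__main__":
    rng = random.Random(2006)
    for n_records in (10**5, 10**7, 10**9):
        share, err, count = estimate_share(n_records, in_partition, 2000, rng)
        print(f"N={n_records:>13,}  share={share:.3f} +/- {err:.3f}  ~{count:,} records")
```

With 2,000 sampled records the standard error stays near one percentage point whether the population holds a hundred thousand or a billion records. Only the sample size appears in the error formula.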

Description

CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application is related to U.S. application Ser. No. 09/897,853, filed together with this application, entitled Partition Boundary Determination Using Random Sampling on Very Large Databases.

BACKGROUND OF THE INVENTION

[0002] The invention pertains to partition size analysis for very large databases having multiple partitions and, more particularly, to accurate, fast, and scalable characterization and estimation of large populations using a random sampling function that is integrated directly into a database engine.

[0003] Databases provide a convenient means to store and retrieve a wealth of information, such as individual and corporate accounts in a business setting, and a means to analyze business trends and make other business, educational, and scientific decisions. Accordingly, over the years, typical database populations have grown to upward of a billion rows and records.

[0004] Analysis of these large data...
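
The benefit of integrating sampling into the database engine can be illustrated by counting calls across the program/engine boundary. In this sketch, `ToyEngine`, `get_next`, and `sample` are hypothetical names, not an IMS interface. It assumes the external analysis program can only retrieve records sequentially, so it uses reservoir sampling (Algorithm R, a technique chosen here for illustration, not one named by the source) and must fetch every record. The built-in path returns the whole sample in a single call.

```python
import random


class ToyEngine:
    """Hypothetical engine that counts calls crossing the program/engine boundary."""

    def __init__(self, records):
        self._records = list(records)
        self._cursor = 0
        self.calls = 0

    def get_next(self):
        """Sequential retrieval: one call per record, None at end of data."""
        self.calls += 1
        if self._cursor >= len(self._records):
            return None
        record = self._records[self._cursor]
        self._cursor += 1
        return record

    def sample(self, k, seed=None):
        """Built-in sampling: one call returns k records drawn without replacement."""
        self.calls += 1
        return random.Random(seed).sample(self._records, k)


def external_reservoir_sample(engine, k, seed=None):
    """Sampling done outside the engine (reservoir sampling, Algorithm R):
    every record must cross the boundary before it can be kept or discarded."""
    rng = random.Random(seed)
    reservoir = []
    i = 0
    while (record := engine.get_next()) is not None:
        if i < k:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = record
        i += 1
    return reservoir


records = range(100_000)
external = ToyEngine(records)
external_reservoir_sample(external, 1_000, seed=1)
print(external.calls)  # 100001 (one call per record plus the end-of-data call)
builtin = ToyEngine(records)
builtin.sample(1_000, seed=1)
print(builtin.calls)  # 1
```

Both paths yield a uniform sample of the same size; the difference is that the external path pays one boundary crossing per record, while the built-in path pays one in total.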

Application Information

Patent Timeline
11 Apr 2006: Publication (US7028054B2)

IPC: G06F17/30; G06F12/00
CPC: G06F17/30595; Y10S707/99953; G06F16/284
Inventors: HARPER, JOHN WILLIAM; SLISHMAN, GORDON ROBERT