Relational database management system having integrated non-relational multi-dimensional data store of aggregated data elements

a database management system and database technology, applied in multi-dimensional databases, instruments, manufacturing tools, etc., can solve the problems of inability to provide organizations with data warehouse, inability to meet the needs of users, and inability to meet the requirements of users, etc., to maximize the query performance of large data volumes, increase system performance, and user flexibility and ease of us

Inactive Publication Date: 2002-12-19
MEC MANAGEMENT LLC +1
View PDF8 Cites 61 Cited by
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Benefits of technology

[0022] Carrying all the segments of the compound dimensional key in the fact table increases the size of the index, thus impacting both performance and scalability.
[0026] This is an optimization strategy, typically referred to as Cartesian Joins, that lessens the performance impact of the pairwise join problem by allowing joining of unrelated tables. The join to the fact table, which is the largest one, is deferred until the very end, thus reducing the size of intermediate result sets. In a join of two unrelated tables every combination of the two tables' rows is produced, a Cartesian product. Such a Cartesian product improves query performance. However, this strategy is viable only if the Cartesian product of dimension rows selected is much smaller than the number of rows in the fact table. The multiplicative nature of the Cartesian join makes the optimization helpful only for relatively small databases.
[0027] In addition, systems that exploit hardware and software parallelism have been developed that lessens the performance issues set forth above. Parallelism can help reduce the execution time of a single query (speed-up), or handle additional work without degrading execution time (scale-up).). For example, Red Brick.TM. has developed STARjoin.TM. technology that provides high speed, parallelizable multi-table joins in a single pass, thus allowing more than two tables can be joined in a single operation. The core technology is an innovative approach to indexing that accelerates multiple joins. Unfortunately, parallelism can only reduce, not eliminate, the performance degradation issues related to the star schema.
[0039] Note that in the event that the aggregated data does not exist in the summary tables, table join operations and aggregation operations are performed over the raw facts in order to generate such aggregated data. This is typically referred to as on-the-fly aggregation. In such instances, aggregation indexing is used to mitigate the performance of multiple data joins associated with dynamic aggregation of the raw data Thus, in large multi-dimensional databases, such dynamic aggregation may lead to unacceptable query response times.
[0043] Thus, there is a great need in the art for an improved mechanism for joining and aggregating data elements within a relational database management system, and for integrating the improved relational database management system into informational database systems (including the data warehouse and OLAP domains), while avoiding the shortcomings and drawbacks of prior art systems and methodologies.

Problems solved by technology

Building a Data Warehouse has its own special challenges (e.g. using common data model, common business dictionary, etc.) and is a complex endeavor.
However, just having a Data Warehouse does not provide organizations with the often-heralded business benefits of data warehousing.
However, the querying component of RDBMS technology suffers from performance and optimization problems stemming from the very nature of the relational data model.
For large multidimensional databases, a naive implementation of these operations involves computational intensive table scans that leads to unacceptable query response times.
For large multi-dimensional databases, a naive implementation of these operations involves computational intensive table scans that typically leads to unacceptable query response times. Moreover, since the fact tables are pre-summarized and aggregated along business dimensions, these tables tend to be very large.
The first performance issue arises from computationally intensive table scans that are performed by a naive implementation of data joining.
However, these indexing schemes suffer from various performance issues as follows:
Since the tables in the star schema design typically contain the entire hierarchy of attributes (e.g. in a PERIOD dimension, this hierarchy could be day>week>month>quarter>year), a multipart key of day, week, month, quarter, year has to be created; thus, multiple meta-data definitions are required (one of each key component) to define a single relationship; this adds to the design complexity, and sluggishness in performance.
Addition or deletion of levels in the hierarchy will require physical modification of the fact table, which is time consuming process that limits flexibility.
Carrying all the segments of the compound dimensional key in the fact table increases the size of the index, thus impacting both performance and scalability.
Another performance issue arises from dimension tables that contain multiple hierarchies.
Every retrieval from fact table that stores details and aggregates must use the indicator to obtain the correct result, which impacts performance.
Notably, the snowflake schema is even more complicated than a star schema, and often requires multiple SQL statements to get the results that are required.
Another performance issue arises from the pairwise join problem.
Traditional RDBMS engines are not design for the rich set of complex queries that are issued against a star schema.
The need to retrieve related information from several tables in a single query--"join processing"--is severely limited.
Unfortunately, because the number of combinations to be evaluated grows exponentially with the number of tables being joined, the problem of selecting the best order of pairwise joins rarely can be solved in a reasonable amount of time.
Moreover, because the number of combinations is often too large, optimizers limit the selection on the basis of a criterion of directly related tables.
Unfortunately, the fact table is the very largest table in the query, so this strategy leads to selecting a pairwise join order that generates a very large intermediate result set, severely affecting query performance.
Unfortunately, parallelism can only reduce, not eliminate, the performance degradation issues related to the star schema.
The more difficult are analytical calculations, the aggregation of Boolean and comparative operators.
summary tables require that database administrators anticipate the data aggregation operations that users will require; this is a difficult task in large multi-dimensional databases (for example, in data warehouses and data mining systems), where users always need to query in new ways looking for new information and patterns.
summary tables do not provide a mechanism that allows efficient drill down to view the raw data that makes up the summary table--typically a table scan of one or more large tables is required.
there is a heavy time overhead because the vast majority of the generated information remains unvisited.
the degree of viable parallelism is limited because the subsequent levels of summary tables must be performed in pipeline, due to their hierarchies.
for very large databases, this option is not valid because of time and storage space.
In such instances, aggregation indexing is used to mitigate the performance of multiple data joins associated with dynamic aggregation of the raw data Thus, in large multi-dimensional databases, such dynamic aggregation may lead to unacceptable query response times.
In view of the problems associated with joining and aggregation within RDBMS, prior art ROLAP systems have suffered from essentially the same shortcomings and drawbacks of their underlying RDBMS.
In summary, such shortcomings and drawbacks stem from the fact that there is unidirectional data flow from the RDBMS to the MOLAP system.
Thus, users of the RDBMS cannot directly view these results.
Such requirements can present security issues, highly undesirable for system administration.
Satisfying such requirements is a costly and logistically cumbersome process.
As a result, the widespread applicability of MOLAP systems has been limited.

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
View more

Image

Smart Image Click on the blue labels to locate them in the text.
Viewing Examples
Smart Image
  • Relational database management system having integrated non-relational multi-dimensional data store of aggregated data elements
  • Relational database management system having integrated non-relational multi-dimensional data store of aggregated data elements
  • Relational database management system having integrated non-relational multi-dimensional data store of aggregated data elements

Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0079] Referring now to FIG. 6 through FIG. 13, the preferred embodiments of the method and system of the present invention will be now described in great detail herein below.

[0080] Through this document, the term "aggregation" and "pre-aggregation" shall be understood to mean the process of summation of numbers, as well as other mathematical operations, such as multiplication, subtraction, division etc. It shall be understood that pre-aggregation operations occur asynchronously with respect to the traditional query processing operations. Moreover, the term "atomic data" shall be understood to refer to the lowest level of data granularity required for effective decision making. In the case of a retail merchandising manager, atomic data may refer to information by store, by day, and by item. For a banker, atomic data may be information by account, by transaction, and by branch.

[0081] In general, the improved RDBMS system of the present invention excels in performing two distinct func...

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to view more

PUM

PropertyMeasurementUnit
dimensionsaaaaaaaaaa
data structuresaaaaaaaaaa
dimensionaaaaaaaaaa
Login to view more

Abstract

Improved method of and apparatus for joining and aggregating data elements integrated within a relational database management system (RDBMS) using a non-relational multi-dimensional data structure (MDD). The improved RDBMS system of the present invention can be used to realize achieving a significant increase in system performance (e.g. deceased access / search time), user flexibility and ease of use. The improved RDBMS system of the present invention can be used to realize an improved Data Warehouse for supporting on-line analytical processing (OLAP) operations or to realize an improved informational database system or the like.

Description

RELATED CASES[0001] This is a Continuation-in-part of: copending application Ser. No. 09 / 514,611 entitled "Stand-Alone Cartridge-Style Data Aggregation Server and Method of and System for Managing Multi-Dimensional Databases using the Same" filed Feb. 28, 2000, and copending application Ser. No. 09 / 368,241 entitled "Method Of And System For Managing Multi-Dimensional Databases Using Modular-Arithmetic Based Address Data Mapping Processes" filed Aug. 4, 1999; said Applications being commonly owned by HyperRoll Israel, Limited, herein incorporated by reference in their entirety.[0002] 1. Field of Invention[0003] The present invention relates generally to multi-dimensional relational databases and, more specifically to mechanisms for aggregating data elements in a multi-dimensional relational database system and for processing queries on such aggregated data elements, and also to informational database systems that utilize multi-dimensional relational databases and such aggregation / que...

Claims

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to view more

Application Information

Patent Timeline
no application Login to view more
Patent Type & Authority Applications(United States)
IPC IPC(8): C03B37/027G06F17/30
CPCC03B37/02718C03B2203/36C03B2205/42Y10S707/99933G06F17/30592Y10S707/99943Y10S707/99935C03B2205/55G06F16/283
Inventor BAKALASH, REUVENSHAKED, GUYCASPI, JOSEPH
Owner MEC MANAGEMENT LLC
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Try Eureka
PatSnap group products