The volume of information that is available to corporations is rapidly increasing and frequently overwhelming.
Building a
Data Warehouse has its own special challenges (e.g. using common
data model, common business dictionary, etc.) and is a complex endeavor.
However, just having a
Data Warehouse does not provide organizations with the often-heralded business benefits of data warehousing.
In all the prior art OLAP servers, the process of storing, indexing and handling MDDB utilize
complex data structures to largely improve the retrieval speed, as part of the querying process, at the cost of slowing down the storing and aggregation.
The query-bounded structure, that must support fast retrieval of queries in a restricting environment of high sparcity and multi-hierarchies, is not the optimal one for fast aggregation.
However, requirements (2) and (3) fundamentally limit MOLAP's capability, because to be effective and to meet end-user requirements, MOLAP databases need a high degree of aggregation.
By contrast, the ROLAP system architecture allows the construction of systems requiring a low degree of aggregation, but such systems are significantly slower than systems based on MOLAP system architecure principles.
The resulting long aggregation times of ROLAP systems impose severe limitations on its volumes and dimensional capabilities.
However, prior art MOLAP systems have limited capabilities to dynamically create data aggregations or to calculate
business metrics that have not been precalculated and stored in the MDDB.
However, the ROLAP architecture, despite its high volume and dimensionality superiority, suffers from several significant drawbacks as compared to MOLAP:
Full aggregation of large data volumes are very
time consuming, otherwise, partial aggregation severely degrades the query response.
SQL is less capable of the sophisticated analytical functionality necessary for OLAP
ROLAP provides limited application functionality
Building a Data Warehouse has its own special challenges (e.g. using common
data model, common business dictionary, etc.) and is a complex endeavor.
However, just having a Data Warehouse does not provide organizations with the often-heralded business benefits of data warehousing.
However, the querying component of RDBMS technology suffers from performance and optimization problems
stemming from the very nature of the relational
data model.
For large multidimensional databases, a naive implementation of these operations involves computational intensive table scans that leads to unacceptable query response times.
For large multi-dimensional databases, a naive implementation of these operations involves computational intensive table scans that typically leads to unacceptable query response times. Moreover, since the fact tables are pre-summarized and aggregated along business dimensions, these tables tend to be very large.
The first performance issue arises from computationally intensive table scans that are performed by a naive implementation of data joining.
However, these indexing schemes suffer from various performance issues as follows:
Since the tables in the
star schema design typically contain the entire hierarchy of attributes (e.g. in a PERIOD dimension, this hierarchy could be day>week>month>quarter>year), a multipart key of day, week, month, quarter, year has to be created; thus, multiple meta-data definitions are required (one of each key component) to define a single relationship; this adds to the design complexity, and sluggishness in performance.
Addition or deletion of levels in the hierarchy will require physical modification of the
fact table, which is
time consuming process that limits flexibility.
Carrying all the segments of the compound dimensional key in the
fact table increases the size of the index, thus impacting both performance and
scalability.
Another performance issue arises from dimension tables that contain multiple hierarchies.
Every retrieval from
fact table that stores details and aggregates must use the indicator to obtain the correct result, which impacts performance.
Notably, the
snowflake schema is even more complicated than a
star schema, and often requires multiple
SQL statements to get the results that are required.
Another performance issue arises from the pairwise join problem.
Traditional RDBMS engines are not design for the rich set of complex queries that are issued against a
star schema.
The need to retrieve related information from several tables in a single query--"join
processing"--is severely limited.
Unfortunately, because the number of combinations to be evaluated grows exponentially with the number of tables being joined, the problem of selecting the best order of pairwise joins rarely can be solved in a reasonable amount of time.
Moreover, because the number of combinations is often too large, optimizers limit the selection on the basis of a criterion of directly related tables.
Unfortunately, the fact table is the very largest table in the query, so this strategy leads to selecting a pairwise join order that generates a very large intermediate
result set, severely affecting query performance.
Unfortunately, parallelism can only reduce, not eliminate, the performance degradation issues related to the star schema.
The more difficult are analytical calculations, the aggregation of Boolean and comparative operators.
summary tables require that database administrators anticipate the data aggregation operations that users will require; this is a difficult task in large multi-dimensional databases (for example, in data warehouses and
data mining systems), where users always need to query in new ways looking for new information and patterns.
summary tables do not provide a mechanism that allows efficient
drill down to view the
raw data that makes up the summary table--typically a table scan of one or more large tables is required.
there is a heavy
time overhead because the vast majority of the generated information remains unvisited.
the degree of viable parallelism is limited because the subsequent levels of summary tables must be performed in pipeline, due to their hierarchies.
for very large databases, this option is not valid because of time and storage space.
Thus, in large multi-dimensional databases, such dynamic aggregation may lead to unacceptable query response times.
In view of the problems associated with joining and aggregation within RDBMS, prior art ROLAP systems have suffered from essentially the same shortcomings and drawbacks of their underlying RDBMS.
Thus, users of the DBMS cannot directly view these results.
Such requirements can present security issues, highly undesirable for system administration.
Satisfying such requirements is a costly and logistically cumbersome process.
As a result, the widespread applicability of MOLAP systems has been limited.