Abstract:
Data warehousing is a key technology in everyday business activity. A data warehouse
usually contains historical data derived from transaction data, but it can also include data
from other sources. Its objective is to consolidate data from several sources and to provide
analysts and managers with strategic information about the business. Unfortunately, the
emergence of e-applications has created extremely high volumes of data that reach the
terabyte threshold. Conventional data warehouse management systems are costly in
terms of storage space and processing speed, and are sometimes unable to handle such huge
amounts of data. As a result, queries and analyses are becoming more complex and
time-consuming. Therefore, there is a crucial need for new algorithms and techniques to store
and manipulate these data.
Parallel and distributed data warehouse architectures have evolved to support
online queries on massive data in a short time. Database compression can be used for
scalable storage and faster data access. In this thesis, we present a compression-based
distributed data warehouse architecture that stores warehouse data and supports
online queries efficiently. We achieve a compression factor of 25-30 compared to a
conventional SQL server data warehouse.
The main computational component of a data warehouse is the generation and
querying of the data cube. Our algorithm generates the data cube directly from the compressed
form of the data in parallel. The size of the data cube is reduced by a factor of 30-45 compared
to existing methods. The response time is also significantly improved. These
improvements are achieved through the elimination of suffix and prefix redundancy, the virtual
nature of the data cube, direct addressability of the compressed form of the data, and parallel
computation. Experimental evaluation shows improved performance over existing systems.
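
As a rough illustration of what building a data cube directly from compressed data can mean, the sketch below dictionary-encodes two dimension columns and then aggregates over every subset of dimensions using only the integer codes. It is a minimal sketch under stated assumptions, not the algorithm of this thesis: dictionary encoding stands in for the compression scheme, the computation is sequential rather than parallel, and no prefix or suffix redundancy is eliminated. All names (`dictionary_encode`, `build_cube`) and the sample sales data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def dictionary_encode(column):
    """Replace each distinct value with a small integer code."""
    codes, dictionary = [], {}
    for value in column:
        # A new value receives the next unused code; a seen value reuses its code.
        codes.append(dictionary.setdefault(value, len(dictionary)))
    # Decoding table: position i holds the original value for code i.
    decode = [None] * len(dictionary)
    for value, code in dictionary.items():
        decode[code] = value
    return codes, decode


def build_cube(encoded_dims, measure):
    """Compute SUM(measure) for every subset of dimensions, reading codes only."""
    names = list(encoded_dims)
    cube = {}
    for k in range(len(names) + 1):
        for group in combinations(names, k):
            totals = defaultdict(int)
            for row, amount in enumerate(measure):
                # The group-by key is built from codes; no value is decompressed.
                key = tuple(encoded_dims[d][row] for d in group)
                totals[key] += amount
            cube[group] = dict(totals)
    return cube


# Hypothetical fact table: (region, product, amount)
regions = ["east", "west", "east", "west"]
products = ["pen", "pen", "ink", "pen"]
amounts = [10, 20, 5, 7]

region_codes, region_names = dictionary_encode(regions)
product_codes, product_names = dictionary_encode(products)
cube = build_cube({"region": region_codes, "product": product_codes}, amounts)
```

Each entry of `cube` maps a tuple of dimension names to its aggregated cells, keyed by codes. Original values are looked up in the decoding tables (`region_names`, `product_names`) only when a result is shown to the user.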