Abstract:
Lossless data compression is potentially attractive in database applications, both for
reducing storage and for improving performance. Existing compression architectures work
well for small, memory-resident databases. Other techniques use disk-based compression
and can therefore support large databases. However, all of these systems can execute only
a limited set of queries. Moreover, they cannot perform queries involving multiple
tables. We have developed a disk-based compression architecture that uses dictionary-based
compression. Each column is stored separately in compressed form. String data are
compressed, whereas numeric data may or may not be compressed, at the discretion of
the database designer. We have compared our system with the widely used Microsoft SQL
Server. Experimental results show that the proposed system requires 10 to 20 times
less space. Because the system is column-oriented, schema evolution is easy. We have also
defined a number of query operators on the compressed database. We have implemented
natural join, selection with range predicates, set operations and all aggregation functions.
These complex queries have not been explored in existing compression-based systems.
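To make the column layout and two of these operators concrete, here is a minimal Python sketch. It assumes each string column is encoded with its own sorted (order-preserving) dictionary; the abstract does not say whether dictionaries are sorted or shared between tables, and the names `DictColumn`, `range_select` and `natural_join` are hypothetical. Sorting the dictionary turns a range predicate on strings into a contiguous range of integer codes, and the join matches rows on codes after translating between the two dictionaries once per distinct value.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DictColumn:
    """Hypothetical layout (assumption): a string column stored as a sorted
    dictionary of distinct values plus one integer code per row."""
    dictionary: list[str]  # distinct values in sorted order
    codes: list[int]       # codes[i] indexes dictionary for row i

    @classmethod
    def encode(cls, values: list[str]) -> "DictColumn":
        dictionary = sorted(set(values))
        code_of = {v: i for i, v in enumerate(dictionary)}
        return cls(dictionary, [code_of[v] for v in values])

    def decode(self, row: int) -> str:
        return self.dictionary[self.codes[row]]


def range_select(col: DictColumn, low: str, high: str) -> list[int]:
    """Row ids with low <= value <= high, evaluated on codes only.

    Because the dictionary is sorted, the value range maps to a contiguous
    code range, so no row value has to be decompressed."""
    lo_code = bisect_left(col.dictionary, low)
    hi_code = bisect_right(col.dictionary, high)  # exclusive upper bound
    return [row for row, c in enumerate(col.codes) if lo_code <= c < hi_code]


def natural_join(left: DictColumn, right: DictColumn) -> list[tuple[int, int]]:
    """(left_row, right_row) pairs whose join-attribute values are equal.

    The columns have separate dictionaries, so each right dictionary entry
    is translated to the matching left code (or None) once; rows are then
    matched with a hash table keyed on left codes."""
    left_code_of = {v: i for i, v in enumerate(left.dictionary)}
    translate = [left_code_of.get(v) for v in right.dictionary]
    rows_by_code: defaultdict[int, list[int]] = defaultdict(list)
    for row, c in enumerate(left.codes):
        rows_by_code[c].append(row)
    pairs = []
    for r_row, r_code in enumerate(right.codes):
        l_code = translate[r_code]
        if l_code is not None:
            pairs.extend((l_row, r_row) for l_row in rows_by_code[l_code])
    return pairs
```

For example, `DictColumn.encode(["red", "green", "red", "blue"])` stores the dictionary `["blue", "green", "red"]` with codes `[2, 1, 2, 0]`, and `range_select` on that column with bounds `"c"` and `"h"` returns `[1]` without decoding any row.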
Except for selection queries, our system outperforms Microsoft SQL Server in query time.
The performance of selection queries could be improved by introducing indices. The system
is also well suited to parallel computation, since the compressed columns can be
distributed to separate processors.
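As a rough illustration of distributing compressed columns to separate workers, the following hypothetical sketch computes SUM over several dictionary-encoded numeric columns, one column per task. It assumes the numeric columns were compressed (the designer may choose not to), and the names `compressed_sum` and `parallel_sums` are illustrative only. It uses Python's `ThreadPoolExecutor` for portability; for CPU-bound work in CPython, `ProcessPoolExecutor` offers the same `submit`/`result` interface and would be the choice for actually running on separate processors.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def compressed_sum(dictionary: list[float], codes: list[int]) -> float:
    """SUM over a dictionary-encoded numeric column (assumed layout).

    Each distinct value is multiplied by its occurrence count, so every
    dictionary entry is read once per distinct value, not once per row."""
    counts = Counter(codes)
    return sum(dictionary[code] * n for code, n in counts.items())


def parallel_sums(columns: dict[str, tuple[list[float], list[int]]],
                  max_workers: int = 4) -> dict[str, float]:
    """Compute SUM for every column, submitting one column per worker task.

    Columns are independent, so no coordination is needed beyond
    collecting each task's result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(compressed_sum, d, c)
                   for name, (d, c) in columns.items()}
        return {name: f.result() for name, f in futures.items()}
```

Because each column is stored and compressed independently, a worker needs only its own column's dictionary and codes, which is what makes this per-column split straightforward.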