Abstract:
The emergence of e-applications has been creating extremely high volumes of data
that reach the terabyte threshold. Many organizations are producing data that double
every year. Conventional data management systems are costly in terms of storage space
and processing speed, and sometimes they are unable to handle such huge amounts of data. New
algorithms and techniques need to be developed to store and manipulate these data. Database
compression can be used for scalable storage and faster data access.
We propose a compression-based data management system architecture that can be
used to handle terabyte-level relational data. Existing compression schemes, e.g.
HIBASE or the Three Layer Database Compression Architecture, work well for memory-resident
data and provide good performance. These are low-cost solutions for high-performance
data management systems but are not scalable enough to manage terabyte-level data.
We have developed a disk-based columnar multi-block vector structure (CMBVS)
that can be used to store relational data in a compressed representation with direct
addressability. Parallel data access can be achieved by distributing the vector structure across
multiple servers to improve scalability.
The lowest layer of the model is the block structure, which stores the compressed
representation of data. The next higher level is the vector structure, which relates the block
structure to an attribute of the relational data model. The structures are capable of carrying
out queries directly on the compressed form of the data, which reduces query time drastically. We
have compared our system with a conventional relational DBMS. The experimental results
show that our system is about twenty-five times more efficient in storage cost and twenty-seven to
seventy-seven times faster in retrieval time than the conventional
systems.
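
The two layers described above (blocks of compressed data, and a vector that ties those blocks to one attribute) can be illustrated with a minimal in-memory sketch. It assumes dictionary encoding, in which each distinct value is replaced by a small integer code; the abstract does not state the exact encoding, so this is an assumption. The names ColumnVector, BLOCK_SIZE, get and select_eq are hypothetical and are not taken from the system itself. The sketch keeps everything in memory and does not model the disk layout or the distribution across servers.

```python
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical number of codes per block (tiny, for illustration)


class ColumnVector:
    """One attribute stored as a dictionary plus a list of fixed-size code blocks."""

    def __init__(self) -> None:
        self.code_of: Dict[str, int] = {}   # value -> integer code
        self.value_of: List[str] = []       # integer code -> value
        self.blocks: List[List[int]] = []   # compressed codes, split into blocks
        self.count = 0                      # number of rows stored so far

    def append(self, value: str) -> None:
        """Encode a value and store its code in the last block."""
        code = self.code_of.get(value)
        if code is None:
            # First time this value is seen: assign the next free code.
            code = len(self.value_of)
            self.code_of[value] = code
            self.value_of.append(value)
        if self.count % BLOCK_SIZE == 0:
            # The previous block is full (or none exists yet): start a new one.
            self.blocks.append([])
        self.blocks[-1].append(code)
        self.count += 1

    def get(self, row: int) -> str:
        """Direct addressability: locate a row by arithmetic, without scanning."""
        if not 0 <= row < self.count:
            raise IndexError(row)
        block, offset = divmod(row, BLOCK_SIZE)
        return self.value_of[self.blocks[block][offset]]

    def select_eq(self, value: str) -> List[int]:
        """Equality query evaluated on the codes, without decoding the rows."""
        code = self.code_of.get(value)
        if code is None:
            return []  # value never stored, so no row can match
        return [
            b * BLOCK_SIZE + i
            for b, block in enumerate(self.blocks)
            for i, c in enumerate(block)
            if c == code
        ]


city = ColumnVector()
for v in ["Dhaka", "London", "Dhaka", "Paris", "London", "Dhaka"]:
    city.append(v)
```

In this sketch the query constant is translated to its code once, and the scan then compares integers only, which is the sense in which a query runs directly on the compressed form. Because every block holds the same number of codes, a row number maps to a block and an offset by division, which gives direct addressability.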