Abstract:
Efficient storage and query processing of data spanning multiple natural languages are of
crucial importance in today's globalized world. As the Internet has become the primary medium
for information access and commerce, multilingual data management is vital to making
information available in the native languages of Internet users.
The requirements of multilingual applications include better searching and browsing
capabilities in languages other than English, access to information stored in different
languages, faster globalization of businesses, and the implementation of e-Commerce and
e-Governance modules.
While existing database systems provide some means of storing and querying multilingual
data, they suffer from redundancy proportional to the number of languages supported. In our
research, we propose a multilingual data management system that stores data in an
information-theoretic, encoded form with minimum redundancy. Multilingual data is stored
in a language-independent way, which simplifies database evolution. Queries are performed
directly on the encoded data with the help of a translator, and the result is obtained by
decompressing it using the corresponding language dictionaries for text data, or without a
dictionary for other data. Schema evolution is simple and database consistency is easier to
maintain, both of which are difficult in existing systems. Query performance is also
significantly faster than in existing systems.
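The idea of language-independent encoded storage can be illustrated with a minimal sketch.
It is not the system's actual implementation: the dictionaries, the integer codes, the sample
rows, and the function names translate and query are all hypothetical, chosen only to show
how a query term might be translated into a code, matched against encoded data, and decoded
into the requested language.

```python
from typing import Dict, List, Tuple

# Hypothetical per-language dictionaries mapping a language-independent
# integer code to its surface form in each language.
DICTIONARIES: Dict[str, Dict[int, str]] = {
    "en": {1: "book", 2: "pen"},
    "fr": {1: "livre", 2: "stylo"},
}

# Encoded rows (item code, price). The text value is stored once as a code,
# regardless of how many languages are supported; the numeric price needs
# no dictionary at all.
ROWS: List[Tuple[int, float]] = [(1, 12.5), (2, 1.2), (1, 8.0)]


def translate(term: str, lang: str) -> int:
    """Map a query term in the given language to its code (assumed translator)."""
    for code, word in DICTIONARIES[lang].items():
        if word == term:
            return code
    raise KeyError(f"{term!r} not found in the {lang} dictionary")


def query(term: str, query_lang: str, result_lang: str) -> List[Tuple[str, float]]:
    """Match on encoded data only, then decode results into result_lang."""
    code = translate(term, query_lang)
    matches = [row for row in ROWS if row[0] == code]  # comparison on codes
    decode = DICTIONARIES[result_lang]
    return [(decode[item_code], price) for item_code, price in matches]


if __name__ == "__main__":
    # A French query answered in English from the same encoded rows.
    print(query("livre", "fr", "en"))
```

In this sketch, supporting an additional language only adds one dictionary and leaves the
encoded rows unchanged, which is the sense in which redundancy does not grow with the number
of languages.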
Our algorithms for handling multilingual data have been evaluated on both synthetic data
generated by a data generation program and real data sets. We have compared the
performance of our system with that of existing systems. The experiments show that
the proposed approach outperforms the existing systems in both storage and query
processing speed.