dc.description.abstract |
The amount of information available in electronic form is growing exponentially,
making it increasingly difficult to find the desired information. Information retrieval is
primarily concerned with the storage and retrieval of information. With the growth of
the World Wide Web, information retrieval systems have gained importance, since they
are often the only way to find the few documents actually relevant to a specific query
among the vast quantities of text available. Although information retrieval systems
mainly deal with natural language, linguistic methods are rarely used. The mechanical
cutting off of inflectional and derivational suffixes to better match index terms to query
terms is called stemming. Since most research on information retrieval is done for
English, which has relatively weak morphology, this mechanical approach is not regarded
as problematic. However, stemming and more linguistically motivated methods show a
positive impact on retrieval performance for languages such as Dutch, German, Italian,
or Bangla, which are morphologically richer than English.
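Stemming by suffix stripping can be illustrated with a minimal sketch. The suffix list below is a small hypothetical sample of common Bangla inflectional endings (plural, classifier, and objective markers such as গুলো, দের, টি, কে); it is not the rule set of around 150 rules described in this work, and the `stem` function and `MIN_STEM_LEN` threshold are illustrative assumptions.

```python
# Minimal longest-match suffix-stripping stemmer (illustrative sketch only).
# SUFFIXES is a hypothetical sample of common Bangla inflectional endings,
# NOT the stemming rule set used in this work.
SUFFIXES = ["গুলো", "গুলি", "দের", "টি", "টা", "রা", "কে"]

# Hypothetical threshold: do not strip if fewer than this many code points
# (not graphemes) would remain, to limit over-stemming of short words.
MIN_STEM_LEN = 2


def stem(word):
    """Strip the longest matching suffix, keeping a minimum stem length."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ["ছেলেদের", "ছেলেরা", "বইটি", "বইগুলো", "ঘর"]:
        print(w, "->", stem(w))
```

Unconditional stripping like this over-stems some words (for example, a word that merely happens to end in কে), so a practical rule set would need extra conditions on each rule rather than a flat suffix list.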
Bangla has many word variants with similar meanings, so stemming based on
morphological analysis is required to find the common root of such words. The
performance of an IR system is also affected by synonyms, a problem that is even worse
in Bangla than in English. Moreover, existing Bangla text databases contain both Unicode
and non-Unicode text, which makes it difficult to search such a database uniformly.
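Dictionary-based synonym handling can be sketched as query expansion: each query term is looked up in a synonym dictionary, and documents containing the term or any of its synonyms are retrieved. The synonym groups (জল/পানি for "water", সূর্য/রবি/ভানু for "sun"), the whitespace tokenizer, and the function names below are hypothetical illustrations, not the dictionary or database design used in this work.

```python
from collections import defaultdict

# Hypothetical synonym dictionary: each group lists words treated as equivalent.
SYNONYM_GROUPS = [
    ["জল", "পানি"],          # water
    ["সূর্য", "রবি", "ভানু"],   # sun
]

# Map every word to the index of its synonym group.
WORD_TO_GROUP = {w: gid for gid, group in enumerate(SYNONYM_GROUPS) for w in group}


def expand(term):
    """Return the term together with all of its dictionary synonyms."""
    gid = WORD_TO_GROUP.get(term)
    return {term} if gid is None else set(SYNONYM_GROUPS[gid])


def build_index(docs):
    """Build an inverted index (term -> set of document ids) by splitting on whitespace."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every query term or one of its synonyms."""
    result = None
    for term in query.split():
        hits = set()
        for variant in expand(term):
            hits |= index.get(variant, set())
        result = hits if result is None else result & hits
    return result or set()


if __name__ == "__main__":
    index = build_index({1: "জল", 2: "পানি", 3: "রবি"})
    print(search(index, "পানি"))   # documents 1 and 2
```

Expanding at query time keeps the index unchanged when the dictionary grows; the alternative of replacing synonyms with a canonical word at indexing time makes lookups cheaper but requires re-indexing whenever the dictionary changes.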
We have developed an efficient information retrieval system that uses morphological
analysis to stem words. The experimental results show that up to 20% better precision
and 14% better recall can be achieved for Bangla by using around 150 non-intuitive
stemming rules. We have also developed a dictionary-based synonym handling technique
that stores synonyms and takes them into account when accessing the database. Finally,
we have developed a technique to access the database irrespective of the encoding of
the text. |
en_US |