DSpace Repository

Efficient information retrieval system for Bangla text database

dc.contributor.advisor Latiful Hoque, Dr. Abu Sayed Md.
dc.contributor.author Amir Sharif, Mohammad
dc.date.accessioned 2015-12-14T09:59:40Z
dc.date.available 2015-12-14T09:59:40Z
dc.date.issued 2006-11
dc.identifier.uri http://lib.buet.ac.bd:8080/xmlui/handle/123456789/1521
dc.description.abstract The amount of information available in electronic form is growing exponentially, making it increasingly difficult to find the desired information. Information retrieval is primarily concerned with the storage and retrieval of information. Thus, along with the growth of the World Wide Web, information retrieval systems gain importance, since they are often the only way to find the few documents actually relevant to a specific query in the vast quantities of text available. Although information retrieval systems mainly deal with natural language, linguistic methods are rarely used. The mechanical cutting off of inflectional and derivational suffixes to better match index terms to query terms is called stemming. Since most research on information retrieval is done for English, which has a relatively weak morphology, this is not regarded as problematic for stemming. Stemming and more linguistically motivated methods show a positive impact on retrieval performance for languages such as Dutch, German, Italian, or Bangla, which are morphologically richer than English. Bangla has many word variants with similar meanings, so stemming with morphological analysis is required to find the root of such words. Synonyms also degrade the performance of an IR system, and this problem is even worse in Bangla than in English. Existing Bangla text databases contain both Unicode and non-Unicode texts, which makes it difficult to search a database uniformly across both types of text. We have developed an efficient information retrieval system that uses morphological analysis to stem words. The experimental results show that up to 20% better precision and 14% better recall can be achieved for Bangla by using around 150 non-intuitive stemming rules. We have developed a dictionary-based synonym handling technique that stores the synonyms and takes them into account when accessing the database.
We have developed a technique to access the database irrespective of the type of encoding of the text. en_US
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering, BUET en_US
dc.subject Information retrieval en_US
dc.title Efficient information retrieval system for Bangla text database en_US
dc.type Thesis-MSc en_US
dc.contributor.id 040305031 en_US
dc.identifier.accessionNumber 103109
dc.contributor.callno 025.524/AMI/2006 en_US

