Efficient information retrieval system for Bangla text database

dc.contributor.advisor	Latiful Hoque, Dr. Abu Sayed Md.
dc.contributor.author	Amir Sharif, Mohammad
dc.date.accessioned	2015-12-14T09:59:40Z
dc.date.available	2015-12-14T09:59:40Z
dc.date.issued	2006-11
dc.identifier.uri	http://lib.buet.ac.bd:8080/xmlui/handle/123456789/1521
dc.description.abstract	The amount of information available in electronic form is growing exponentially, making it increasingly difficult to find the desired information. Information retrieval is primarily concerned with the storage and retrieval of information. Thus, along with the growth of the World Wide Web, information retrieval systems gain importance, since they are often the only way to find the few documents actually relevant to a specific query in the vast quantities of text available. Although information retrieval systems mainly deal with natural language, linguistic methods are rarely used. The mechanical cutting off of inflectional and derivational suffixes to better match index terms to query terms is called stemming. Since most research on information retrieval is done for English, which has a relatively weak morphology, this is not regarded as problematic for stemming. Stemming and more linguistically motivated methods show a positive impact on retrieval performance for languages such as Dutch, German, Italian, or Bangla, which are morphologically richer than English. Bangla has many variant word forms with similar meaning, so stemming is required to find the root of such words through morphological analysis. An IR system's performance is also affected by synonyms, and this problem is even worse in Bangla than in English. Existing Bangla text databases contain both Unicode and non-Unicode texts, and it is difficult to search a database uniformly when it holds both types of text. We have developed an efficient information retrieval system that uses morphological analysis to stem words. The experimental results show that up to 20% better precision with 14% better recall can be achieved for Bangla by using around 150 non-intuitive stemming rules. We have developed a dictionary-based synonym handling technique to store the synonyms and access the database with the synonyms taken into account. We have developed a technique to access the database irrespective of the type of encoding of the text.	en_US
dc.language.iso	en	en_US
dc.publisher	Department of Computer Science and Engineering, BUET	en_US
dc.subject	Information retrieval	en_US
dc.title	Efficient information retrieval system for Bangla text database	en_US
dc.type	Thesis-MSc	en_US
dc.contributor.id	040305031	en_US
dc.identifier.accessionNumber	103109
dc.contributor.callno	025.524/AMI/2006	en_US
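
The stemming and synonym-handling ideas summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the suffix list, the synonym group, the sample documents and every function name below are hypothetical, chosen only for illustration, and the roughly 150 stemming rules mentioned in the abstract are not reproduced here. Unicode NFC normalization is used as a simple stand-in for consistent text handling; it does not convert legacy non-Unicode Bangla encodings.

```python
import unicodedata
from collections import defaultdict

# Illustrative Bangla inflectional suffixes only (assumption, not the thesis's rules).
SUFFIXES = [unicodedata.normalize("NFC", s)
            for s in ["গুলো", "দের", "ের", "রা", "কে", "টি"]]

# Hypothetical synonym dictionary: each set groups words treated as equivalent.
SYNONYM_GROUPS = [{"বই", "পুস্তক"}]


def stem(word, suffixes=SUFFIXES, min_stem_len=2):
    """Strip the longest matching suffix, keeping at least min_stem_len code points."""
    word = unicodedata.normalize("NFC", word)
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def synonym_map(groups):
    """Map every stemmed word to the set of stems in its synonym group."""
    mapping = {}
    for group in groups:
        stems = {stem(w) for w in group}
        for s in stems:
            mapping[s] = stems
    return mapping


def build_index(docs):
    """Build an inverted index from stem to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[stem(token)].add(doc_id)
    return index


def search(query, index, synonyms):
    """Return ids of documents matching any query stem or one of its synonyms."""
    results = set()
    for token in query.split():
        root = stem(token)
        for term in {root} | synonyms.get(root, set()):
            results |= index.get(term, set())
    return results


# Hypothetical three-document collection.
docs = {
    1: "ছেলেরা বইগুলো আনে",
    2: "পুস্তক ভালো",
    3: "আকাশ নীল",
}
index = build_index(docs)
synonyms = synonym_map(SYNONYM_GROUPS)

print(stem("ছেলেরা"))                          # ছেলে
print(sorted(search("বই", index, synonyms)))   # [1, 2]
```

In this sketch a query for "বই" retrieves document 1 through stemming ("বইগুলো" reduces to "বই") and document 2 through the synonym group, which is the kind of recall gain the abstract attributes to stemming and synonym handling. Stripping the longest suffix first and enforcing a minimum stem length are common safeguards against over-stemming, not choices confirmed by the source.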