Modified inverted files and algorithms for phrase query and not query


Paul, Tuhin

URI: http://lib.buet.ac.bd:8080/xmlui/handle/123456789/1559

Date: 2009-12

Abstract:

Inverted files, the equivalent of database indices, are used to speed up searches over both Hyper Text Markup Language (HTML) and eXtensible Markup Language (XML) files on the web. Searching XML files differs from searching HTML files in two ways: inverted files for XML must be compressed because of their large size, and query evaluation against XML files requires keyword search over both the structure and the values. XML queries are often composed of multiple keywords with logical relations among them. XML queries with conjunction, disjunction, ancestor-descendant, and preceding-following relations among multiple keywords have already been evaluated successfully.

Multiple keywords often appear in XML queries as a phrase. Phrase queries within a single XML document have already been evaluated; however, no method exists for evaluating phrase queries over a collection of XML documents, whether large or small. Additionally, many applications require a special type of query in which given keywords or phrases must not be present in the retrieved XML documents. According to our study, no method exists for evaluating these NOT queries either. XML document retrieval is not complete without support for these two important query types, so new solutions are required to process both phrase and NOT queries efficiently.

In this thesis, we introduce methods to evaluate both phrase and NOT queries, proposing the necessary changes to the inverted file structure and the query processing algorithms. We use a pull parser to parse the XML documents. We have developed a prototype query processor that can create inverted files and evaluate all types of queries, including phrase and NOT queries. Experimental results obtained with this prototype query processor show the effectiveness of the proposed query evaluation methods.
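The abstract does not describe the thesis's modified inverted-file layout, so the following is only a generic sketch of the ideas it names, not the author's method: a positional inverted index built with a pull parser (Python's `xml.etree.ElementTree.XMLPullParser`), a phrase query that checks for consecutive word offsets across a collection of documents, and a NOT query computed as a set difference against the whole collection. The function names `build_index`, `phrase_query` and `not_query` are hypothetical. The sketch assumes that a phrase must fall inside the text of a single element, and it ignores mixed content, search over tag names (structure), and index compression.

```python
import re
from collections import defaultdict
from xml.etree.ElementTree import XMLPullParser

# Hypothetical tokenizer: lowercase runs of word characters.
TOKEN = re.compile(r"\w+")


def build_index(docs):
    """Positional inverted index: term -> doc_id -> node_id -> word offsets.

    node_id numbers elements in the order their end events are read;
    offsets are word positions inside that element's own text (elem.text).
    """
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for doc_id, xml_text in docs.items():
        parser = XMLPullParser(events=("end",))
        parser.feed(xml_text)  # a streaming indexer would feed chunks instead
        parser.close()
        # Pull the parse events one at a time and index each element's text.
        for node_id, (_event, elem) in enumerate(parser.read_events()):
            words = TOKEN.findall((elem.text or "").lower())
            for offset, word in enumerate(words):
                index[word][doc_id][node_id].append(offset)
    return index


def phrase_query(index, phrase):
    """Ids of documents where the words of `phrase` occur consecutively
    inside the text of a single element."""
    terms = TOKEN.findall(phrase.lower())
    if not terms or any(t not in index for t in terms):
        return set()
    # Step 1: intersect the document lists of all terms.
    candidates = set.intersection(*(set(index[t]) for t in terms))
    hits = set()
    for doc_id in candidates:
        # Step 2: keep only elements that contain every term.
        nodes = set.intersection(*(set(index[t][doc_id]) for t in terms))
        for node_id in nodes:
            # Step 3: term k must appear at offset start + k.
            starts = set(index[terms[0]][doc_id][node_id])
            for k, t in enumerate(terms[1:], start=1):
                starts &= {p - k for p in index[t][doc_id][node_id]}
            if starts:
                hits.add(doc_id)
                break
    return hits


def not_query(index, all_doc_ids, phrase):
    """NOT query: documents in the collection that do not contain `phrase`."""
    return set(all_doc_ids) - phrase_query(index, phrase)


if __name__ == "__main__":
    docs = {
        "d1": "<book><title>Inverted files for XML</title><note>phrase query</note></book>",
        "d2": "<book><title>XML files and indices</title></book>",
        "d3": "<book><title>Query processing</title><abstract>inverted files are indices</abstract></book>",
    }
    idx = build_index(docs)
    print(sorted(phrase_query(idx, "inverted files")))      # ['d1', 'd3']
    print(sorted(phrase_query(idx, "xml files")))           # ['d2']
    print(sorted(not_query(idx, docs, "inverted files")))   # ['d2']
    # Keyword AND NOT phrase: contains "query" but not "phrase query".
    print(sorted(phrase_query(idx, "query") - phrase_query(idx, "phrase query")))  # ['d3']
```

In this sketch a NOT query needs the full list of document ids, because the documents that do not contain a phrase never appear in that phrase's postings; combining it with a positive query (keyword AND NOT phrase) avoids returning the entire remainder of the collection.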


Files in this item

Name: Full Thesis .pdf

Size: 1.131Mb

Format: PDF


This item appears in the following Collection(s)

Dissertations/Theses - Department of Computer Science and Engineering
Postgraduate dissertations (theses) of Computer Science and Engineering (CSE)
