Abstract:
Machine translation from Bangla to other languages is a promising field, but little
work has been done in this area. Syntax-based machine translation is a suitable
technique for translating from Bangla to other languages, because Bangla grammar is
well structured. This technique has two stages: parsing and generation. Parsing is
the main challenge of a syntax-based machine translator. This thesis presents the
design of a non-ambiguous predictive Bangla grammar, which is used to develop a
predictive parser with error recovery capability.
An analysis of previous work on Bangla grammar shows that previously designed
grammars were non-comprehensive and ambiguous. Because of this ambiguity, they do
not fall into the category of LL(1) grammars, and a predictive parser cannot be
developed without an LL(1) grammar. A non-predictive parser uses backtracking,
which can take exponential time in the worst case. This is impractical for a machine
translation system, which generally deals with a large amount of data. Error
recovery has never been introduced in Bangla parsing. Unlike the grammar of a
programming language handled by a compiler, the grammar of a natural language
reflects only the common patterns of sentences. Designing a grammar that reflects
all sentence patterns makes the complexity of the grammar grow exponentially,
because a single sentence can be written correctly in many different ways. Without
error recovery, parsing stops as soon as an error is detected in the input sentence.
The lack of error recovery is a major hindrance to developing a successful Bangla
parser. Moreover, handling non-dictionary words is a major challenge that was not
solved previously.
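To make the LL(1) requirement concrete, the following is a minimal sketch of how a
conflict in a grammar can be detected. The toy grammars, the tag names (noun, conj,
end) and the function names are hypothetical and are not taken from the thesis. The
sketch assumes an epsilon-free grammar, so only FIRST sets need to be compared.

```python
def first_sets(grammar):
    """Compute FIRST sets for an epsilon-free grammar (assumption of this sketch)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                head = alt[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def ll1_conflicts(grammar):
    """Return (nonterminal, token) pairs where two alternatives start the same way."""
    first = first_sets(grammar)
    conflicts = []
    for nt, alternatives in grammar.items():
        seen = set()
        for alt in alternatives:
            head = alt[0]
            starts = first[head] if head in grammar else {head}
            for tok in sorted(starts):
                if tok in seen:
                    conflicts.append((nt, tok))
                seen.add(tok)
    return conflicts


# Hypothetical ambiguous rule: NP -> noun | NP conj NP
ambiguous = {"NP": [["noun"], ["NP", "conj", "NP"]]}

# Hypothetical rewritten rule: NP -> noun NPTail ; NPTail -> conj NP | end
factored = {
    "NP": [["noun", "NPTail"]],
    "NPTail": [["conj", "NP"], ["end"]],
}

print(ll1_conflicts(ambiguous))
print(ll1_conflicts(factored))
```

In the first grammar both alternatives of NP can begin with a noun, so one token of
lookahead cannot select a rule. The rewritten grammar has no such overlap.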
In this thesis, ambiguity is eliminated from the previous grammar, yielding a
non-ambiguous predictive grammar. This grammar also includes a mechanism for
handling non-dictionary words. It has been extended with some common sentence
patterns, notably additional uses of conjunctives and number handling. A top-down
predictive parser is designed using this grammar. The predictive nature of the
grammar ensures that parsing runs in linear time, so the exponential runtime of
backtracking is avoided and large volumes of data can be parsed. The thesis also
introduces error recovery into Bangla parsing. This feature allows the parser to
continue after an error is detected, which solves the earlier problem of parsing
halting. When an input sentence contains an error, the error recovery routine skips
the error and resynchronizes parsing with the remaining correct portion of the
sentence. To keep error recovery efficient, a heuristic is applied, so an error may
not be recovered correctly in every case. In most cases, however, recovery is
correct and, most importantly, parsing never stops because of an error.
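The idea of a predictive parser that keeps going after an error can be sketched as
follows. This is an illustrative table-driven LL(1) parser with a simple panic-mode
heuristic, not the parser or grammar of the thesis. The three-rule subject-object-verb
grammar, the tag names (adj, noun, verb, unk) and the recovery rules are all
assumptions made for the example.

```python
# Hypothetical toy grammar over part-of-speech tags:
#   S  -> NP VP
#   NP -> adj NP | noun
#   VP -> NP verb | verb
NONTERMINALS = {"S", "NP", "VP"}

# LL(1) parsing table: (nonterminal, lookahead) -> right-hand side
TABLE = {
    ("S", "adj"): ["NP", "VP"],
    ("S", "noun"): ["NP", "VP"],
    ("NP", "adj"): ["adj", "NP"],
    ("NP", "noun"): ["noun"],
    ("VP", "adj"): ["NP", "verb"],
    ("VP", "noun"): ["NP", "verb"],
    ("VP", "verb"): ["verb"],
}

# FOLLOW sets, used here as synchronizing tokens for recovery
FOLLOW = {"S": {"$"}, "NP": {"adj", "noun", "verb"}, "VP": {"$"}}


def parse(tags):
    """Parse a list of tags and return the list of recovered errors."""
    tokens = list(tags) + ["$"]
    stack = ["$", "S"]
    pos = 0
    errors = []
    while stack:
        top = stack[-1]
        look = tokens[pos]
        if top == look:
            # Terminal (or end marker) matches the input: consume it.
            stack.pop()
            pos += 1
        elif top not in NONTERMINALS:
            if top == "$":
                # Sentence is complete but input remains: skip the extra token.
                errors.append(f"unexpected '{look}' at position {pos}")
                pos += 1
            else:
                # Expected terminal is absent: assume it was omitted.
                errors.append(f"missing '{top}' before position {pos}")
                stack.pop()
        elif (top, look) in TABLE:
            # Predictive step: one lookahead token selects the rule.
            stack.pop()
            stack.extend(reversed(TABLE[(top, look)]))
        elif look == "$" or look in FOLLOW[top]:
            # Lookahead can follow this nonterminal: give up on it and resynchronize.
            errors.append(f"missing {top} at position {pos}")
            stack.pop()
        else:
            # Panic mode: skip the offending token and try again.
            errors.append(f"skipped '{look}' at position {pos}")
            pos += 1
    return errors


print(parse(["noun", "noun", "verb"]))   # well-formed sentence
print(parse(["noun", "unk", "verb"]))    # unknown tag in the middle
print(parse(["noun", "noun"]))           # verb missing at the end
```

Every loop iteration either consumes an input token, pops a stack symbol, or expands
a rule selected by a single lookahead token, so the parser never backtracks and always
terminates, even on ill-formed input. As with any heuristic recovery, the reported
error is a guess and may not match what the writer intended.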
This thesis also includes some supporting modules of the Bangla predictive parser,
such as the structure of the lexicon and the strategy for lexical analysis of input
Bangla sentences, which includes dynamic tagging of words with multiple meanings.
A simulation program demonstrates the correctness of the grammar, the parser, and
the error recovery mechanism.