Abstract:
Automated evaluation of writing uses computer programs to assess written work and provide feedback to the writer. This process is commonly used in educational settings to grade essays and give students a better understanding of their writing abilities. The emergence of automated writing evaluation has transformed the traditionally time-consuming and strenuous process of assessing written work, significantly reducing time and effort. Existing works score specific dimensions of an essay, such as grammar, word choice, and coherence, to provide targeted feedback. However, the majority of these systems evaluate the entire essay as a cohesive piece rather than focusing on individual sentences. Sentence-level evaluation can identify precise areas for improvement and provide more detailed, specific feedback that helps writers improve their skills over time. This work proposes a mechanism for evaluating essays at the sentence level. A publicly available essay dataset was collected and then manually evaluated by experts to prepare a dataset with sentence-level scores. The primary goal of this manual assessment is to enrich the dataset with precise and meaningful sentence-level scores, ensuring a comprehensive evaluation of each sentence’s quality and composition. We propose two different models to evaluate essays. The first uses pre-trained language models, namely BigBird, Longformer, and DeBERTa. The second is a neural network architecture based on a multi-head attention mechanism. In the proposed evaluation process, both models calculate a holistic score for each individual sentence in an essay, reflecting how well the sentence contributes to the essay’s overall coherence and meaning. The final score for each essay is obtained by summing the scores of its sentences.
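As a rough illustration of this scoring scheme, the sketch below scores each sentence with a transformer regression head and sums the sentence scores to obtain an essay score. The BigBird checkpoint, the single-output regression head, and the NLTK sentence splitter are illustrative assumptions rather than the exact pipeline used in this work, and the encoder would still need to be fine-tuned on the sentence-scored dataset before its outputs are meaningful.

```python
# Minimal sketch of sentence-level essay scoring, assuming a regression head
# on top of a pre-trained encoder. Checkpoint, splitter, and score scale are
# illustrative assumptions, not the authors' released model.
import torch
import nltk
from transformers import AutoTokenizer, AutoModelForSequenceClassification

nltk.download("punkt", quiet=True)  # sentence tokenizer data

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "google/bigbird-roberta-base", num_labels=1  # single regression output
)
model.eval()


def score_essay(essay: str) -> float:
    """Score each sentence independently, then sum the sentence scores."""
    sentences = nltk.sent_tokenize(essay)
    total = 0.0
    for sentence in sentences:
        inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
        with torch.no_grad():
            sentence_score = model(**inputs).logits.squeeze().item()
        total += sentence_score
    return total
```

In this setup, swapping the encoder (e.g., to Longformer or DeBERTa) only changes the checkpoint name; the per-sentence scoring and summation logic stays the same.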
Among the pre-trained models used for essay-level scoring, the BigBird model outperforms the others, achieving a Mean Squared Error (MSE) of 0.032. In contrast, the second model yields an MSE of 0.0909. Experimental results show that the proposed model scores essays with high accuracy and low average error, indicating that it is robust and dependable in assessing essays and can provide helpful and reliable feedback to writers.
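For reference, MSE here is the standard squared-error metric, presumably computed over the essay scores on the scale used in the experiments:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$

where $y_i$ is the expert-assigned score, $\hat{y}_i$ the predicted score for essay $i$, and $N$ the number of essays in the evaluation set.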