Journal of English for Specific Purposes Praxis

A Comparative Error Analysis of Iranian EFL Learners’ Writing Compositions by Human Evaluators Vs. Perplexity AI Platform

Document Type: Original Article

Authors
1 Department of English Language and Literature, Faculty of Persian Literature and Foreign Languages, Allameh Tabataba’i University, Tehran, Iran
2 Department of English Language, Faculty of Humanities, Imam Khomeini International University, Qazvin, Iran
Abstract
The advent of AI as a supplementary tool for error analysis, and the question of how its analyses differ from those of human raters, remains an underexplored and intriguing research area. This study examined the types of errors found in the written compositions of 16 intermediate-level Iranian EFL learners, both male and female, selected through convenience sampling from two language academies in Rasht. All participants used American English File 3 (2nd Edition) as their main course book, and the researchers also served as their instructors. The students were tasked with writing a response to a letter, based on a model provided in their textbook. A total of 16 writing samples were collected and analyzed using both qualitative and quantitative methods, guided by Keshavarz’s (2013) linguistic error classification framework. The evaluation was conducted by a human rater and the AI tool Perplexity, which was specifically prompted to identify errors according to the same classification system. The results revealed a range of error types with varying frequencies: morphosyntactic errors were the most common, followed by orthographic, lexicosemantic, and phonological errors, respectively. Inter-rater reliability, calculated via Cohen’s Kappa, indicated a substantial level of agreement between the human and AI raters (κ = 0.78, p < 0.001). Teachers can combine the precision and consistency of AI with the subjective and interpretive depth of human assessment to provide more responsive feedback that supports both linguistic accuracy and learner agency. Ultimately, this study advocates integrative approaches to language assessment that harness technological developments under human supervision. Future research could examine learners with different characteristics (e.g., other proficiency levels, cultural backgrounds, and more varied writing tasks), compare Perplexity with other AI tools, and investigate the longitudinal consistency of AI and human raters under different conditions.
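For readers interested in how the reported agreement statistic is obtained, the following is a minimal sketch in Python, assuming scikit-learn is available; the error-category labels and per-item ratings are hypothetical placeholders illustrating the procedure, not the study’s actual data.

# Minimal sketch: human-vs-AI agreement on error categories via Cohen's Kappa.
# The labels are illustrative stand-ins for Keshavarz's (2013) categories;
# the study's real ratings are not reproduced here.
from sklearn.metrics import cohen_kappa_score

# One label per analyzed error: the category each rater assigned.
human_labels = ["morphosyntactic", "orthographic", "lexicosemantic",
                "morphosyntactic", "phonological", "orthographic"]
ai_labels = ["morphosyntactic", "orthographic", "lexicosemantic",
             "orthographic", "phonological", "orthographic"]

# Kappa corrects raw percent agreement p_o for chance agreement p_e:
# kappa = (p_o - p_e) / (1 - p_e)
kappa = cohen_kappa_score(human_labels, ai_labels)
print(f"kappa = {kappa:.2f}")  # the study reports kappa = 0.78, p < .001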
Keywords