International Journal of Information and Education Technology

Editor-In-Chief: Prof. Jon-Chao Hong
Frequency: Monthly
ISSN: 2010-3689 (Online)
E-mail: editor@ijiet.org
Publisher: IACSIT Press

Open Access | CiteScore: 3.2

IJIET 2026 Vol.16(4): 894-901
doi: 10.18178/ijiet.2026.16.4.2561

Evaluating the Evaluators: Metrics for Automated Essay Feedback Generation

Maryam Berijanian1,*, Christopher G. Shaltry2, and Dirk Colbry1
1. Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI, USA
2. College of Human Medicine and College of Osteopathic Medicine, Michigan State University, East Lansing, MI, USA
Email: berijani@msu.edu (M.B.); shaltryc@msu.edu (C.G.S.); colbrydi@msu.edu (D.C.)
*Corresponding author

Manuscript received August 14, 2025; revised October 9, 2025; accepted November 26, 2025; published April 10, 2026

Abstract—Automated Essay Scoring (AES) systems have improved with advances in Natural Language Processing (NLP), but they often prioritize grade prediction over qualitative feedback, which is crucial for student learning. This study evaluates the capabilities of Large Language Models (LLMs) in generating detailed, context-specific feedback, with the goal of improving student understanding. More importantly, it focuses on validating the efficacy of widely used NLP metrics for assessing feedback quality, analyzing their alignment with human judgments. The findings highlight both the strengths and limitations of these metrics in evaluating qualitative feedback within AES contexts. By comparing LLM performance under few-shot learning and fine-tuning conditions, the study identifies both promising directions and persistent challenges in automated feedback generation. Overall, the results emphasize that while LLMs can enhance feedback generation, current metrics remain inadequate for reliably guiding such improvement, underscoring the need for more robust evaluation frameworks.

Keywords—automated essay scoring, textual feedback generation, meta-metrics, human-in-the-loop



Cite: Maryam Berijanian, Christopher G. Shaltry, and Dirk Colbry, "Evaluating the Evaluators: Metrics for Automated Essay Feedback Generation," International Journal of Information and Education Technology, vol. 16, no. 4, pp. 894-901, 2026.


Copyright © 2026 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

