IJIET 2026 Vol.16(4): 894-901
doi: 10.18178/ijiet.2026.16.4.2561
Evaluating the Evaluators: Metrics for Automated Essay Feedback Generation
Maryam Berijanian1,*, Christopher G. Shaltry2, and Dirk Colbry1
1. Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI, USA
2. College of Human Medicine and College of Osteopathic Medicine, Michigan State University, East Lansing, MI, USA
Email: berijani@msu.edu (M.B.); shaltryc@msu.edu (C.G.S.); colbrydi@msu.edu (D.C.)
*Corresponding author
Manuscript received August 14, 2025; revised October 9, 2025; accepted November 26, 2025; published April 10, 2026
Abstract—Automated Essay Scoring (AES) systems have improved with advances in Natural Language Processing (NLP), but they often prioritize grade prediction over qualitative feedback, which is crucial for student learning. This study evaluates the capabilities of Large Language Models (LLMs) in generating detailed, context-specific feedback, with the goal of improving student understanding. More importantly, it focuses on validating the efficacy of widely used NLP metrics for assessing feedback quality, analyzing their alignment with human judgments. The findings highlight both the strengths and limitations of these metrics in evaluating qualitative feedback within AES contexts. By comparing LLM performance under few-shot learning and fine-tuning conditions, the study identifies both promising directions and persistent challenges in automated feedback generation. Overall, the results emphasize that while LLMs can enhance feedback generation, current metrics remain inadequate for reliably guiding such improvement, underscoring the need for more robust evaluation frameworks.
Keywords—automated essay scoring, textual feedback generation, meta-metrics, human-in-the-loop
Copyright © 2026 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
Cite: Maryam Berijanian, Christopher G. Shaltry, and Dirk Colbry, "Evaluating the Evaluators: Metrics for Automated Essay Feedback Generation," International Journal of Information and Education Technology, vol. 16, no. 4, pp. 894-901, 2026.