TY - GEN
T1 - Comparing Emotion Detection Methods in Online Classrooms
T2 - 16th IEEE Global Engineering Education Conference, EDUCON 2025
AU - Parambil, Medha Mohan Ambali
AU - Bouktif, Salah
AU - Gochoo, Munkhjargal
AU - Alnajjar, Fady
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - The COVID-19 pandemic has transformed learning environments, challenging educators to understand students' behaviors during online learning, in particular the emotions associated with students' attention in virtual classrooms. As learning transitions between physical and virtual spaces, interpreting student attention and engagement has become more complex. In response to this challenge, our research investigates the use of GPT-4o, a multimodal large language model, to identify student emotions by analyzing images from diverse learning settings. The study analyzed online classroom images featuring 149 faces using three distinct approaches: a computer vision model (YOLO), the multimodal LLM (GPT-4o), and a human-annotated baseline. The analysis systematically categorized facial expressions into eight emotional categories: Happy, Sad, Angry, Neutral, Contempt, Disgust, Fear, and Surprise. The findings indicate that multimodal LLMs can effectively detect student emotions, achieving an average accuracy of 93.8%, approaching the human baseline accuracy of 97.0%. In contrast, YOLO models achieved an average accuracy of 81.9%, performing well on basic emotions but struggling with subtle expressions. This research contributes to educational practice by providing insights into how multimodal LLMs can assist educators in understanding student emotions in both physical and digital classroom settings.
AB - The COVID-19 pandemic has transformed learning environments, challenging educators to understand students' behaviors during online learning, in particular the emotions associated with students' attention in virtual classrooms. As learning transitions between physical and virtual spaces, interpreting student attention and engagement has become more complex. In response to this challenge, our research investigates the use of GPT-4o, a multimodal large language model, to identify student emotions by analyzing images from diverse learning settings. The study analyzed online classroom images featuring 149 faces using three distinct approaches: a computer vision model (YOLO), the multimodal LLM (GPT-4o), and a human-annotated baseline. The analysis systematically categorized facial expressions into eight emotional categories: Happy, Sad, Angry, Neutral, Contempt, Disgust, Fear, and Surprise. The findings indicate that multimodal LLMs can effectively detect student emotions, achieving an average accuracy of 93.8%, approaching the human baseline accuracy of 97.0%. In contrast, YOLO models achieved an average accuracy of 81.9%, performing well on basic emotions but struggling with subtle expressions. This research contributes to educational practice by providing insights into how multimodal LLMs can assist educators in understanding student emotions in both physical and digital classroom settings.
UR - https://www.scopus.com/pages/publications/105008203405
U2 - 10.1109/EDUCON62633.2025.11016401
DO - 10.1109/EDUCON62633.2025.11016401
M3 - Conference contribution
AN - SCOPUS:105008203405
T3 - IEEE Global Engineering Education Conference, EDUCON
BT - EDUCON 2025 - IEEE Global Engineering Education Conference, Proceedings
PB - IEEE Computer Society
Y2 - 22 April 2025 through 25 April 2025
ER -