Cold-Start Generalization in Educational Interaction Data: Comparing Student-Wise and Question-Wise Splits with Probabilistic Calibration
Abstract
Predictive models in Intelligent Tutoring Systems often suffer performance degradation due to sparse data and the cold-start problem, compounded by the absence of probability calibration in standard evaluations. This study addresses this gap by systematically evaluating the trade-off between discriminative accuracy and probabilistic reliability under student-wise and question-wise splits, using interaction data from the MathE platform spanning eight countries. By comparing identifier-based and metadata-based Logistic Regression models under a Leave-One-Country-Out protocol, we assessed generalization under distribution shift. The results reveal a fundamental dichotomy: while identifier-based models achieve superior discrimination (AUC 0.687) and calibration when historical context is available, they suffer significant performance drops in student cold-start settings and exhibit negative transfer during cross-country deployment. Conversely, metadata-based models are more robust and invariant across demographics. We conclude that relying solely on accuracy metrics masks model uncertainty in new domains, and we recommend a "safe-start" strategy that prioritizes metadata-based features at system initialization, ensuring reliable pedagogical decision-making before personalizing on accumulated user history.
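For concreteness, the sketch below illustrates one way to implement the Leave-One-Country-Out protocol described above with scikit-learn, reporting both AUC (discrimination) and the Brier score (calibration) per held-out country. It is a minimal sketch, not the authors' actual pipeline: the DataFrame and column names (`correct`, `country`, `feature_cols`) are hypothetical placeholders, and `feature_cols` would contain either one-hot identifier features or metadata features depending on the model variant.

```python
# Minimal sketch of a Leave-One-Country-Out evaluation, assuming a pandas
# DataFrame `df` with a binary outcome column `correct`, a grouping column
# `country`, and numeric feature columns. All names are illustrative.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.metrics import roc_auc_score, brier_score_loss

def leave_one_country_out(df, feature_cols, target_col="correct", group_col="country"):
    X = df[feature_cols].to_numpy()
    y = df[target_col].to_numpy()
    groups = df[group_col].to_numpy()
    results = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        model = LogisticRegression(max_iter=1000)
        model.fit(X[train_idx], y[train_idx])
        proba = model.predict_proba(X[test_idx])[:, 1]
        results.append({
            "held_out_country": groups[test_idx][0],
            # AUC captures discriminative accuracy; it requires both classes
            # to be present in the held-out country's data.
            "auc": roc_auc_score(y[test_idx], proba),
            # Brier score captures probabilistic calibration (lower is better).
            "brier": brier_score_loss(y[test_idx], proba),
        })
    return pd.DataFrame(results)
```

Comparing the identifier-based and metadata-based variants then amounts to calling this function twice with different `feature_cols` and contrasting the per-country AUC and Brier columns.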
Article Details

This work is licensed under a Creative Commons Attribution 4.0 International License.