Cold-Start Generalization in Educational Interaction Data: Comparing Student-Wise and Question-Wise Splits with Probabilistic Calibration

Purwadi Purwadi
Othman Bin Mohd
Nor Azman Bin Abu

Abstract

Predictive models in Intelligent Tutoring Systems often degrade under sparse data and the cold-start problem, a difficulty compounded by the absence of probability calibration in standard evaluations. This study addresses that gap by systematically evaluating the trade-off between discriminative accuracy and probabilistic reliability under student-wise and question-wise splits, using interaction data from the MathE platform spanning eight countries. By comparing identifier-based and metadata-based Logistic Regression models under a Leave-One-Country-Out protocol, we assessed generalization under distribution shift. The results reveal a fundamental dichotomy: while identifier-based models achieve superior discrimination (AUC 0.687) and calibration when historical context is available, they suffer significant performance drops in student cold-start settings and exhibit negative transfer during cross-country deployment. Conversely, metadata-based models are more robust and invariant across varying demographics. We conclude that relying solely on accuracy metrics masks model uncertainty in new domains and recommend a "safe-start" strategy that prioritizes metadata-based features at system initialization, ensuring reliable pedagogical decision-making before personalizing on accumulated user history.
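The Leave-One-Country-Out protocol described in the abstract can be sketched as follows: hold out all interactions from one country, train a metadata-based logistic regression on the remaining countries, and report both a discrimination metric (ROC-AUC) and a calibration metric (Brier score) on the held-out country. This is a minimal illustrative sketch; the synthetic data, feature columns, and country-shift term are assumptions for demonstration, not the MathE dataset or the authors' exact pipeline.

```python
# Minimal sketch of a Leave-One-Country-Out (LOCO) evaluation with a
# metadata-based logistic regression, scored on both discrimination
# (ROC-AUC) and calibration (Brier score). Synthetic data is illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
n = 2000
countries = rng.integers(0, 8, size=n)      # 8 countries, as in the study
X = rng.normal(size=(n, 3))                 # hypothetical metadata features
# Simulate correctness labels with a mild country-dependent shift.
logits = X @ np.array([1.0, -0.5, 0.3]) + 0.1 * countries
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

logo = LeaveOneGroupOut()
results = {}
for train_idx, test_idx in logo.split(X, y, groups=countries):
    # Train on all countries except one, evaluate on the held-out one.
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    p = model.predict_proba(X[test_idx])[:, 1]
    held_out = int(countries[test_idx][0])
    results[held_out] = (roc_auc_score(y[test_idx], p),
                         brier_score_loss(y[test_idx], p))

for country, (auc, brier) in sorted(results.items()):
    print(f"country {country}: AUC={auc:.3f}  Brier={brier:.3f}")
```

Reporting the Brier score alongside AUC is the key point: a model can rank students' answers well (high AUC) while emitting poorly calibrated probabilities, which is exactly the failure mode the abstract warns about in cross-country deployment.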

Article Details

How to Cite
Purwadi, P., Othman Bin Mohd, & Nor Azman Bin Abu. (2026). Cold-Start Generalization in Educational Interaction Data: Comparing Student-Wise and Question-Wise Splits with Probabilistic Calibration. International Journal of Machine Learning (IJOML), 1(1), 38–50. https://doi.org/10.66472/ijoml.v1i1.4
Section
Articles