Independent Research
Concept-Level Overfitting in Latent Space during Synthetic Pretraining
Shows that overfitting can occur at the concept level in latent space even when every token sequence in the synthetic pretraining data is unique.
Veridical-QA: Evaluating Truthfulness in LLM Responses
Experiments with BES, CSI, and anti-sycophancy metrics to measure whether model responses are verifiably correct.