Rheeya Uppaal

I am a fourth-year PhD candidate at the University of Wisconsin-Madison, advised by Prof Junjie Hu. My research focuses on improving reliability in language models by developing methods to evaluate, understand, and control model behavior. My work studies reliability at two levels: observable behavior and model internals. On the behavioral side, I design evaluations that detect failures in model outputs, including unsafe generations ['26], unfaithful reasoning ['26] and failures under distribution shift ['24, '26]. On the internal side, I study how models encode reliability-relevant concepts, and develop representation-level interventions to study and improve safety and alignment ['23, '25]. More broadly, my goal is to make language model reliability more systematic by linking behavioral evidence of failure with internal mechanisms for control.

Prior to my PhD, I was a researcher at Goldman Sachs CoreAI, where I worked on information extraction and interpretability methods for text in the financial domain under Dr Vijay Saraswat. I completed my Master's in Computer Science at UMass Amherst, where I worked under the wonderful guidance of Prof Andrew McCallum and Prof Madalina Fiterau.

You can find my single-page Resumé here, or a more detailed CV here.

What's New:

January 2026: Journey Before Destination was accepted as an Oral to EACL 2026! Looking forward to some interesting conversations in Morocco.
March 2025: I gave an invited talk at Cohere for AI’s Research Connections Community on developing robust Model Editing Techniques.
February 2025: ProFS was accepted to ICLR 2025! Excited to have some great conversations in Singapore.
November 2024: PhD Milestone check! I passed my qualifying exam and am now a PhD candidate!
June 2024: Excited to start an internship at Amazon Science, where I'll be working under the guidance of Markus Dreyer and Mohit Bansal.