Giang Nguyen
I build Steerling-8B, the world's first interpretable language model that you can understand, control, and trust.
Email · GitHub · Google Scholar · LinkedIn · Resume ↗
I'm an interpretability researcher at Guide Labs in San Francisco. My work focuses on building AI models you can understand, control, and trust, rather than models that expose only their outputs.
I completed my Ph.D. in Computer Science at Auburn University in 2025, advised by Anh Nguyen and supported by the Presidential Graduate Research Fellowship.
2025 – now
Guide Labs
— Interpretability Researcher, San Francisco
- Training: 8B interpretable language models pretrained on trillions of tokens. [1] [2]
- Explanation: Attribution methods for model behavior at the concept, prompt, and input-to-concept levels. [1]
- Control: Controllable generation for topic control and safety alignment without fine-tuning. [3] [4]
2024
JPMorgan AI Research
— Research Scientist Intern, New York
Research on interpretable LLMs for tabular data.
2021 – 2025
Auburn University
— Ph.D., Computer Science
2018 – 2020
KAIST
— M.Sc., Computer Science, South Korea
news
May 2026
The Attribution Contract: Feature Attribution for Generative Language Models
A position paper arguing that feature attribution in generative language models is under-specified: the same attribution method can answer different questions depending on what you are explaining. Introduces the Attribution Contract, a framework for naming the explanatory setting under which a feature-attribution claim is made.
Feb 2026
Steerling-8B released
The first large-scale language model with interpretability built into the architecture: every generated token traces back to input context, training data, and human-understandable concepts. Self-monitors for memorized content and suppresses it at inference time without retraining. Featured in TechCrunch.
[blog]
[github]
[huggingface]
[techcrunch]
selected publications
Full list on Google Scholar.
2026
Prototype Language Models. Dan Ley, Giang Nguyen, Himabindu Lakkaraju, Julius Adebayo. Preprint coming soon.
[blog]
2026
Scaling Interpretable Language Models to 8 Billion Parameters. Guide Labs Team. Preprint coming soon.
[blog]
TMLR 2025
Interpretable LLM-based Table Question Answering. Nguyen, G., Brugere, I., Sharma, S., Kariyappa, S., Nguyen, A.T., Lecue, F.
[pdf]
ICLR & TMLR 2024
PCNN: Probable-Class Nearest-Neighbor Explanations Improve Fine-Grained Image Classification Accuracy for AIs and Humans. Nguyen, G., Chen, V., Taesiri, M.R., Nguyen, A.
[pdf]
NeurIPS 2023
ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification. Taesiri, M.R., Nguyen, G., Habchi, S., Bezemer, C., Nguyen, A.
[pdf]
NeurIPS 2022
Visual Correspondence-based Explanations Improve AI Robustness and Human-AI Team Accuracy. Nguyen, G.*, Taesiri, M.R.*, Nguyen, A. (*equal contribution)
[pdf]
NeurIPS 2021
The Effectiveness of Feature Attribution Methods and Its Correlation with Automatic Evaluation Scores. Nguyen, G., Kim, D., Nguyen, A.
[pdf]
academic service
Active reviewer for TMLR. Also reviewed for NeurIPS, ICLR, ICML, CVPR, ICCV, ECCV, AAAI, ACL Rolling Review.
© 2026 Giang Nguyen