Giang Nguyen
I build AI models you can look inside.
Email · GitHub · Google Scholar · LinkedIn
2025 – now
Guide Labs
— Interpretability Researcher, San Francisco
- Train: First large-scale inherently interpretable language model, Steerling-8B, pretrained on trillions of tokens.
- Explain: Attributing model behavior at the concept, prompt, and input-to-concept levels.
- Control: Controllable generation for topic steering and safety alignment without fine-tuning.
2024
JPMorgan AI Research
— Research Scientist Intern, New York
Research on interpretable LLMs for tabular data.
2021 – 2025
Auburn University
— Ph.D., Computer Science
2018 – 2020
KAIST
— M.Sc., Computer Science, South Korea
news
May 2026
Feature Attribution for Generative Language Models
My first solo paper. A position paper arguing that feature attribution in generative language models is inherently under-specified: the same attribution method can answer different questions depending on what you are explaining.
[arXiv] [pdf]
Feb 2026
Steerling-8B: The First Inherently Interpretable Language Model
The first large-scale language model with interpretability built into the architecture: every generated token traces back to input context, training data, and human-understandable concepts. Featured in TechCrunch.
[blog] [github] [huggingface] [techcrunch]
selected publications
Full list on Google Scholar.
Prototype Language Models.
[blog]
Scaling Interpretable Language Models to 8 Billion Parameters.
[blog]
Interpretable LLM-based Table Question Answering.
[pdf]
Probable-Class Nearest-Neighbor Explanations.
[pdf]
Visual Correspondence-based Explanations.
[pdf]
The Effectiveness of Feature-Attribution Explanations.
[pdf]
academic service
Active reviewer for TMLR. Also reviewed for NeurIPS, ICLR, ICML, CVPR, ICCV, ECCV, AAAI, and ACL Rolling Review.
© 2026 Giang Nguyen · updates archive