Giang Nguyen

I build AI models you can look inside.

I'm an interpretability researcher at Guide Labs in San Francisco. My work spans training interpretable language models at scale, explaining model behavior at the concept level, and building tools for controllable generation.

I completed my Ph.D. in Computer Science at Auburn University in 2025, advised by Anh Nguyen and supported by the Presidential Graduate Research Fellowship.

Resume available on request.


2025 – now
Guide Labs — Interpretability Researcher, San Francisco
  • Train: First large-scale inherently interpretable language model, Steerling-8B, pretrained on trillions of tokens.
  • Explain: Attributing model behavior at the concept, prompt, and input-to-concept levels.
  • Control: Controllable generation for topic steering and safety alignment without fine-tuning.
2024
JPMorgan AI Research — Research Scientist Intern, New York
Research on interpretable LLMs for tabular data.
2021 – 2025
Auburn University — Ph.D., Computer Science
Thesis: Transforming the black-box decision-making of AI models into explain-then-answer processes
2018 – 2020
KAIST — M.Sc., Computer Science, South Korea
Thesis: Overcoming catastrophic forgetting by deep visualization

news

May 2026
Feature Attribution for Generative Language Models
My first solo paper. A position paper arguing that feature attribution in generative language models is inherently under-specified: the same attribution method can answer different questions depending on what you are explaining.
[arXiv] [pdf]
Feb 2026
Steerling-8B: The First Inherently Interpretable Language Model
The first large-scale language model with interpretability built into the architecture: every generated token traces back to input context, training data, and human-understandable concepts. Featured in TechCrunch.
[blog] [github] [huggingface] [techcrunch]

selected publications

Full list on Google Scholar.

Prototype Language Models. [blog]
Scaling Interpretable Language Models to 8 Billion Parameters. [blog]
Interpretable LLM-based Table Question Answering. [pdf]
Probable-Class Nearest-Neighbor Explanations. [pdf]
Visual Correspondence-based Explanations. [pdf]
The Effectiveness of Feature-Attribution Explanations. [pdf]

academic service

Active reviewer for TMLR. I have also reviewed for NeurIPS, ICLR, ICML, CVPR, ICCV, ECCV, AAAI, and ACL Rolling Review.


© 2026 Giang Nguyen