Giang Nguyen

I build AI models you can look inside.

I'm an interpretability researcher at Guide Labs in San Francisco. My work spans training interpretable language models at scale, explaining model behavior at the concept level, and building tools for controllable generation.

I completed my Ph.D. in Computer Science in Anh Nguyen's lab, supported by the Presidential Graduate Research Fellowship.

Resume available on request.

2025 – now

Guide Labs — Interpretability Researcher, San Francisco

Train: First large-scale inherently interpretable language model, Steerling-8B, pretrained on trillions of tokens.
Explain: Attributing model behavior at the concept, prompt, and input-to-concept levels.
Control: Controllable generation for topic steering and safety alignment without fine-tuning.

2024

JPMorgan AI Research — Research Scientist Intern, New York

Research on interpretable LLMs for tabular data.

2021 – 2025

Auburn University — Ph.D., Computer Science

Thesis: Transforming the black-box decision-making of AI models into explain-then-answer processes

2018 – 2020

KAIST — M.Sc., Computer Science, South Korea

Thesis: Overcoming catastrophic forgetting by deep visualization

news

Jul 2026

Talk: From Post-hoc Tools to Interpretable-by-Design AI Models

Why post-hoc explanations aren't enough, and what interpretable-by-design looks like in practice. Includes a live demo: detecting an age-bias concept in a hiring prompt and steering it away without any retraining.

Jun 2026

Scaling Inherently Interpretable Language Models

116-page technical report on how we scaled interpretable AI to 8B parameters. One finding: interpretability improves with scale, so the more compute you put in, the more interpretable the model becomes. [pdf]

A scientific video presenting the key findings on interpretability scaling: how interpretability scales with compute, what interpretability looks like at 8B parameters, and what this means for building AI you can trust.

May 2026

Demo: Audit-then-control Steerling-8B in real time

Concept-level steering in real time: highlight text, see which internal concepts drove it, steer them, compare outputs side-by-side.

[view on X ↗]

May 2026

Feature Attribution for Generative Language Models

My first solo paper. A position paper arguing that feature attribution in generative language models is inherently under-specified: the same attribution method can answer different questions depending on what you are explaining.
[arXiv] [pdf]

Feb 2026

Steerling-8B: The First Inherently Interpretable Language Model

The first large-scale language model with interpretability built into the architecture: every generated token traces back to the input context, the training data, and human-understandable concepts. Featured in TechCrunch.
[blog] [github] [huggingface] [techcrunch]

Older updates in the archive.

selected publications

Full list on Google Scholar.

Prototype Language Models. [blog]

Scaling Interpretable Language Models to 8 Billion Parameters. [blog]

Step-wise Verifiable Explanations (for Table Question Answering). [pdf]

Probable-Class Nearest-Neighbor Explanations. [pdf]

Visual Correspondence-based Explanations. [pdf]

The Effectiveness of Feature-Attribution Explanations. [pdf]

academic service

Active reviewer for TMLR. Also reviewed for NeurIPS, ICLR, ICML, CVPR, ICCV, ECCV, AAAI, and ACL Rolling Review.