I’m a research scientist at Google NYC. Before that, I was a Ph.D. student and Wallace Memorial Fellow at Princeton University, co-advised by Prof. Kai Li and Prof. Sanjeev Arora.
I work at the intersection of machine learning, systems, and policy. My research explores how and why machine learning systems can go wrong, through the lenses of privacy and copyright violations, as well as security and safety concerns. Recently, I have also become interested in model memorization and its implications for capabilities.
✰ Awards
🎙 Recent Talks
- Open Technical Questions in GenAI Copyright
- Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy (09/24/2024, Unlearning Society @ Google DeepMind)
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning
- Identifying, Understanding, and Mitigating Failure Modes of Safety Alignment in Large Language Models
- Detecting Pretraining Data from Large Language Models
✎ Selected Publications
Please refer to full publications or my Google Scholar profile for the complete list. “(α)” indicates alphabetical author order (with “⁺” marking the lead author), and “*” indicates equal contribution.
Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards
Yangsibo Huang, Milad Nasr, Anastasios Angelopoulos, Nicholas Carlini, Wei-Lin Chiang, Christopher A. Choquette-Choo, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Ken Ziyu Liu, and others
📍 Preprint 2025
MUSE: Machine Unlearning Six-Way Evaluation for Language Models
Weijia Shi*, Jaechan Lee*, Yangsibo Huang*, Sadhika Malladi, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah A. Smith, and Chiyuan Zhang
📍 ICLR 2025
Fantastic Copyrighted Beasts and How (Not) to Generate Them
Luxi He*, Yangsibo Huang*, Weijia Shi*, Tinghao Xie, Haotian Liu, Yue Wang, Luke Zettlemoyer, Chiyuan Zhang, Danqi Chen, and Peter Henderson
📍 ICLR 2025 (Oral Presentation at GenLaw@ICML’24)
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors
Tinghao Xie*, Xiangyu Qi*, Yi Zeng*, Yangsibo Huang*, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, and Prateek Mittal
📍 ICLR 2025
On Memorization of Large Language Models in Logical Reasoning
Chulin Xie, Yangsibo Huang, Chiyuan Zhang, Da Yu, Xinyun Chen, Bill Yuchen Lin, Bo Li, Badih Ghazi, and Ravi Kumar
📍 Preprint 2024
An Adversarial Perspective on Machine Unlearning for AI Safety
Jakub Łucki, Boyi Wei, Yangsibo Huang, Peter Henderson, Florian Tramèr, and Javier Rando
📍 NeurIPS SoLaR 2024 (Best Paper)
(α) Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning
Lynn Chua, Badih Ghazi, Yangsibo Huang⁺, Pritish Kamath, Daogao Liu, Pasin Manurangsi, Amer Sinha, and Chiyuan Zhang
📍 COLM 2024 (Talk at PPML’24)
ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty
Xindi Wu*, Dingli Yu*, Yangsibo Huang*, Olga Russakovsky, and Sanjeev Arora
📍 NeurIPS 2024
A Safe Harbor for AI Evaluation and Red Teaming
Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, and Peter Henderson
📍 ICML 2024 (Oral)
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei*, Kaixuan Huang*, Yangsibo Huang*, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, and Peter Henderson
📍 ICML 2024 & ICLR 2024 Secure and Trustworthy LLMs (Best Paper)
Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation
Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, and Danqi Chen
📍 ICLR 2024 (Spotlight)
Detecting Pretraining Data from Large Language Models
Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, and Luke Zettlemoyer
📍 ICLR 2024 (Oral Presentation at RegML@NeurIPS’23)
Evaluating Gradient Inversion Attacks and Defenses in Federated Learning
Yangsibo Huang, Samyak Gupta, Zhao Song, Kai Li, and Sanjeev Arora
📍 NeurIPS 2021 (Oral)
㋡ Experiences
♥ Service
ꐕ MISC
- In my spare time, I mostly hang out with my four cats 😺😻😼😽.
- I enjoy reading books about psychology.