I’m a research scientist at Google NYC. Before that, I was a Ph.D. student and Wallace Memorial Fellow at Princeton University, co-advised by Prof. Kai Li and Prof. Sanjeev Arora.
I work at the intersection of machine learning, systems, and policy. My research explores how and why machine learning systems may go wrong, through the lenses of privacy and copyright violations as well as security and safety concerns. More recently, I have also become interested in model memorization and its implications for model capabilities.
✰ Awards
🎙 Recent Talks
- Open Technical Questions in GenAI Copyright
- Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy
  - 09/24/2024, Unlearning Society @ Google DeepMind
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning
- Identifying, Understanding, and Mitigating Failure Modes of Safety Alignment in Large Language Models
- Detecting Pretraining Data from Large Language Models
🗞️ News
- 📊 [09/2024] We released ConceptMix, a new benchmark that evaluates how well text-to-image models can generate images that accurately combine multiple visual concepts. Interestingly, we found that image generation models struggle to combine more than 3 visual concepts (e.g., “red,” “fluffy,” “squared,” “smartphone”), and we attribute this to their training data.
- 📃 [08/2024] Collecting, using, and sharing human feedback on models raises new privacy and copyright concerns. We discuss these issues, along with others, in our recent work: The Future of Open Human Feedback.
- 📊 [07/2024] We released a new LLM safety benchmark, SORRY-Bench, designed to systematically evaluate how well LLMs refuse unsafe requests across 45 fine-grained harmful categories.
- 🐱 [06/2024] How easy is it for current image/video generation systems to output copyrighted characters like Mario and Batman (which poses legal risks)? How can we prevent them from doing so? Check out our CopyCat evaluation suite!
- 📃 [03/2024] We released an open letter advocating for A Safe Harbor for AI Evaluation and Red Teaming. It has been signed by 300+ researchers and covered by The Washington Post, VentureBeat, AIPwn, and Computerworld.
- 📙 [01/2024] Our white paper on advancing the deployment of differential privacy in real-world applications was accepted by the Harvard Data Science Review.
✎ Selected Publications
Please refer to the full publications page or my Google Scholar profile for the complete list. “(α)” denotes alphabetical author order (“⁺” marks the lead author), and “*” denotes equal contribution.
On Memorization of Large Language Models in Logical Reasoning
Chulin Xie, Yangsibo Huang, Chiyuan Zhang, Da Yu, Xinyun Chen, Bill Yuchen Lin, Bo Li, Badih Ghazi, and Ravi Kumar
📍 Preprint, 2024
An Adversarial Perspective on Machine Unlearning for AI Safety
Jakub Łucki, Boyi Wei, Yangsibo Huang, Peter Henderson, Florian Tramèr, and Javier Rando
📍 Preprint, 2024 (Oral Presentation at SoLaR@NeurIPS’24)
(α) Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning
Lynn Chua, Badih Ghazi, Yangsibo Huang⁺, Pritish Kamath, Daogao Liu, Pasin Manurangsi, Amer Sinha, and Chiyuan Zhang
📍 COLM, 2024 (Talk at PPML’24)
ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty
Xindi Wu*, Dingli Yu*, Yangsibo Huang*, Olga Russakovsky, and Sanjeev Arora
📍 NeurIPS, 2024
MUSE: Machine Unlearning Six-Way Evaluation for Language Models
Weijia Shi*, Jaechan Lee*, Yangsibo Huang*, Sadhika Malladi, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah A. Smith, and Chiyuan Zhang
📍 Preprint, 2024
Evaluating Copyright Takedown Methods for Language Models
Boyi Wei*, Weijia Shi*, Yangsibo Huang*, Noah A. Smith, Chiyuan Zhang, Luke Zettlemoyer, Kai Li, and Peter Henderson
📍 NeurIPS, 2024
Fantastic Copyrighted Beasts and How (Not) to Generate Them
Luxi He*, Yangsibo Huang*, Weijia Shi*, Tinghao Xie, Haotian Liu, Yue Wang, Luke Zettlemoyer, Chiyuan Zhang, Danqi Chen, and Peter Henderson
📍 Preprint, 2024 (Oral Presentation at GenLaw@ICML’24)
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors
Tinghao Xie*, Xiangyu Qi*, Yi Zeng*, Yangsibo Huang*, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, and Prateek Mittal
📍 Preprint, 2024
A Safe Harbor for AI Evaluation and Red Teaming
Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, and Peter Henderson
📍 ICML, 2024 (Oral)
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei*, Kaixuan Huang*, Yangsibo Huang*, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, and Peter Henderson
📍 ICML & ICLR Secure and Trustworthy LLMs Workshop, 2024 (Best Paper)
Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation
Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, and Danqi Chen
📍 ICLR, 2024 (Spotlight)
Detecting Pretraining Data from Large Language Models
Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, and Luke Zettlemoyer
📍 ICLR, 2024 (Oral Presentation at RegML@NeurIPS’23)
Privacy Implications of Retrieval-Based Language Models
Yangsibo Huang, Samyak Gupta, Zexuan Zhong, Kai Li, and Danqi Chen
📍 EMNLP, 2023
Evaluating Gradient Inversion Attacks and Defenses in Federated Learning
Yangsibo Huang, Samyak Gupta, Zhao Song, Kai Li, and Sanjeev Arora
📍 NeurIPS, 2021 (Oral)
㋡ Experiences
♥ Service
- Area Chair for
- Program Committee member for
- Reviewer for ICML (2022, 2023, 2024), NeurIPS (2021, 2022, 2023), COLM (2024)
ꐕ MISC
- In my spare time, I mostly hang out with my four cats 😺😻😼😽.
- I enjoy reading books about psychology.