Yangsibo Huang

I’m a research scientist at Google NYC. Before that, I was a Ph.D. student and Wallace Memorial Fellow at Princeton University, co-advised by Prof. Kai Li and Prof. Sanjeev Arora.

I work at the intersection of machine learning, systems, and policy. My research explores how and why machine learning systems may go wrong, through the lenses of privacy and copyright violations, as well as security and safety concerns. Recently, I have also become interested in model memorization and its implications for model capabilities.


✰ Awards


🎙 Recent Talks


🗞️ News


✎ Selected Publications

Please refer to the full publication list or my Google Scholar profile. “(α)” indicates alphabetical author order (with “⁺” marking the lead author), and “*” indicates equal contribution.


  1. On Memorization of Large Language Models in Logical Reasoning
    Chulin Xie, Yangsibo Huang, Chiyuan Zhang, Da Yu, Xinyun Chen, Bill Yuchen Lin, Bo Li, Badih Ghazi, and Ravi Kumar
  2. An Adversarial Perspective on Machine Unlearning for AI Safety
    Jakub Łucki, Boyi Wei, Yangsibo Huang, Peter Henderson, Florian Tramèr, and Javier Rando
  3. (α) Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning
    Lynn Chua, Badih Ghazi, Yangsibo Huang⁺, Pritish Kamath, Daogao Liu, Pasin Manurangsi, Amer Sinha, and Chiyuan Zhang
  4. ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty
    Xindi Wu*, Dingli Yu*, Yangsibo Huang*, Olga Russakovsky, and Sanjeev Arora
  5. MUSE: Machine Unlearning Six-Way Evaluation for Language Models
    Weijia Shi*, Jaechan Lee*, Yangsibo Huang*, Sadhika Malladi, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah A Smith, and Chiyuan Zhang
  6. Evaluating Copyright Takedown Methods for Language Models
    Boyi Wei*, Weijia Shi*, Yangsibo Huang*, Noah A Smith, Chiyuan Zhang, Luke Zettlemoyer, Kai Li, and Peter Henderson
  7. Fantastic Copyrighted Beasts and How (Not) to Generate Them
    Luxi He*, Yangsibo Huang*, Weijia Shi*, Tinghao Xie, Haotian Liu, Yue Wang, Luke Zettlemoyer, Chiyuan Zhang, Danqi Chen, and Peter Henderson
  8. SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors
    Tinghao Xie*, Xiangyu Qi*, Yi Zeng*, Yangsibo Huang*, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, and Prateek Mittal
  9. A Safe Harbor for AI Evaluation and Red Teaming
    Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, and Peter Henderson
  10. Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
    Boyi Wei*, Kaixuan Huang*, Yangsibo Huang*, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, and Peter Henderson
  11. Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation
    Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, and Danqi Chen
  12. Detecting Pretraining Data from Large Language Models
    Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, and Luke Zettlemoyer
  13. Privacy Implications of Retrieval-Based Language Models
    Yangsibo Huang, Samyak Gupta, Zexuan Zhong, Kai Li, and Danqi Chen
  14. Evaluating Gradient Inversion Attacks and Defenses in Federated Learning
    Yangsibo Huang, Samyak Gupta, Zhao Song, Kai Li, and Sanjeev Arora


㋡ Experiences


♥ Service


ꐕ MISC
