Full List of Publications and Manuscripts

2025

  1. Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards
    Yangsibo Huang, Milad Nasr, Anastasios Angelopoulos, Nicholas Carlini, Wei-Lin Chiang, Christopher A Choquette-Choo, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Ken Ziyu Liu, and others
  2. On Evaluating the Durability of Safeguards for Open-Weight LLMs
    Xiangyu Qi, Boyi Wei, Nicholas Carlini, Yangsibo Huang, Tinghao Xie, Luxi He, Matthew Jagielski, Milad Nasr, Prateek Mittal, and Peter Henderson
  3. MUSE: Machine Unlearning Six-Way Evaluation for Language Models
    Weijia Shi*, Jaechan Lee*, Yangsibo Huang*, Sadhika Malladi, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah A Smith, and Chiyuan Zhang
  4. Fantastic Copyrighted Beasts and How (Not) to Generate Them
    Luxi He*, Yangsibo Huang*, Weijia Shi*, Tinghao Xie, Haotian Liu, Yue Wang, Luke Zettlemoyer, Chiyuan Zhang, Danqi Chen, and Peter Henderson
  5. SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors
    Tinghao Xie*, Xiangyu Qi*, Yi Zeng*, Yangsibo Huang*, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, and Prateek Mittal

2024

  1. Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice
    A Feder Cooper, Christopher A Choquette-Choo, Miranda Bogen, Matthew Jagielski, Katja Filippova, Ken Ziyu Liu, Alexandra Chouldechova, Jamie Hayes, Yangsibo Huang, Niloofar Mireshghallah, and others
  2. On Memorization of Large Language Models in Logical Reasoning
    Chulin Xie, Yangsibo Huang, Chiyuan Zhang, Da Yu, Xinyun Chen, Bill Yuchen Lin, Bo Li, Badih Ghazi, and Ravi Kumar
  3. Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy
    Yangsibo Huang, Daogao Liu, Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Milad Nasr, Amer Sinha, and Chiyuan Zhang
  4. An Adversarial Perspective on Machine Unlearning for AI Safety
    Jakub ลucki, Boyi Wei, Yangsibo Huang, Peter Henderson, Florian Tramรจr, and Javier Rando
  5. Evaluating Copyright Takedown Methods for Language Models
    Boyi Wei*, Weijia Shi*, Yangsibo Huang*, Noah A Smith, Chiyuan Zhang, Luke Zettlemoyer, Kai Li, and Peter Henderson
  6. (α) Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning
    Lynn Chua, Badih Ghazi, Yangsibo Huang⁺, Pritish Kamath, Daogao Liu, Pasin Manurangsi, Amer Sinha, and Chiyuan Zhang
  7. ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty
    Xindi Wu*, Dingli Yu*, Yangsibo Huang*, Olga Russakovsky, and Sanjeev Arora
  8. AI Risk Management Should Incorporate Both Safety and Security
    Xiangyu Qi*, Yangsibo Huang*, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi, Boyi Wei, Tinghao Xie, Danqi Chen, Pin-Yu Chen, Jeffrey Ding, Ruoxi Jia, Jiaqi Ma, Arvind Narayanan, Weijie J Su, Mengdi Wang, Chaowei Xiao, Bo Li, Dawn Song, Peter Henderson, and Prateek Mittal
  9. A Safe Harbor for AI Evaluation and Red Teaming
    Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, and Peter Henderson
  10. Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
    Boyi Wei*, Kaixuan Huang*, Yangsibo Huang*, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, and Peter Henderson
  11. (α) Advancing Differential Privacy: Where We Are Now and Future Directions for Real-World Deployment
    Rachel Cummings, Damien Desfontaines, David Evans, Roxana Geambasu, Yangsibo Huang, Matthew Jagielski, Peter Kairouz, Gautam Kamath, Sewoong Oh, Olga Ohrimenko, Nicolas Papernot, Ryan Rogers, Milan Shen, Shuang Song, Weijie Su, Andreas Terzis, Abhradeep Thakurta, Sergei Vassilvitskii, Yu-Xiang Wang, Li Xiong, Sergey Yekhanin, Da Yu, Huanyu Zhang, and Wanrong Zhang
  12. Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation
    Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, and Danqi Chen
  13. Detecting Pretraining Data from Large Language Models
    Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, and Luke Zettlemoyer
  14. (α) LabelDP-Pro: Learning with Label Differential Privacy via Projections
    Badih Ghazi, Yangsibo Huang⁺, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, and Chiyuan Zhang

2023

  1. (α) Sparsity-Preserving Differentially Private Training
    Badih Ghazi, Yangsibo Huang⁺, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, and Chiyuan Zhang
  2. Privacy Implications of Retrieval-Based Language Models
    Yangsibo Huang, Samyak Gupta, Zexuan Zhong, Kai Li, and Danqi Chen
  3. kNN-Adapter: Efficient Domain Adaptation for Black-Box Language Models
    Yangsibo Huang, Daogao Liu, Zexuan Zhong, Weijia Shi, and Yin Tat Lee

2022

  1. Recovering Private Text in Federated Learning of Language Models
    Samyak Gupta*, Yangsibo Huang*, Zexuan Zhong, Tianyu Gao, Kai Li, and Danqi Chen
  2. A Dataset Auditing Method for Collaboratively Trained Machine Learning Models
    Yangsibo Huang, Chun-Yin Huang, Xiaoxiao Li, and Kai Li

2021

  1. Evaluating Gradient Inversion Attacks and Defenses in Federated Learning
    Yangsibo Huang, Samyak Gupta, Zhao Song, Kai Li, and Sanjeev Arora
  2. EMA: Auditing Data Removal from Trained Models
    Yangsibo Huang, Xiaoxiao Li, and Kai Li
  3. DeepMC: a deep learning method for efficient Monte Carlo beamlet dose calculation by predictive denoising in magnetic resonance-guided radiotherapy
    Ryan Neph, Qihui Lyu, Yangsibo Huang, You Ming Yang, and Ke Sheng

2020

  1. TextHide: Tackling Data Privacy in Language Understanding Tasks
    Yangsibo Huang, Zhao Song, Danqi Chen, Kai Li, and Sanjeev Arora
  2. InstaHide: Instance-hiding Schemes for Private Distributed Learning
    Yangsibo Huang, Zhao Song, Kai Li, and Sanjeev Arora
  3. Privacy-preserving learning via deep net pruning
    Yangsibo Huang, Yushan Su, Sachin Ravi, Zhao Song, Sanjeev Arora, and Kai Li

2019

  1. Deep Q learning Driven CT Pancreas Segmentation with Geometry-aware U-Net
    Yunze Man, Yangsibo Huang, Junyi Feng, Xi Li, and Fei Wu