Yangsibo Huang

Welcome! I am a Ph.D. candidate and Wallace Memorial Fellow at Princeton University. I am very fortunate to be co-advised by Prof. Kai Li and Prof. Sanjeev Arora. My research lies at the intersection of machine learning, systems, and policy, with a focus on auditing and improving machine learning systems’ compliance with policies, from the perspectives of:

Privacy: I explore privacy risks and mitigation in distributed training [NeurIPS’21, NeurIPS’22, EMNLP-Findings’20, ICML’20] and retrieval-based language models [EMNLP’23]. I improve the efficiency [NeurIPS’23] and accuracy [ICLR’24] of differentially private training. My work has been deployed inside Google AI and Meta AI, and has resulted in an invited chapter in the textbook Federated Learning and a white paper on Differential Privacy.
Safety: I demonstrate that safety alignment in existing large language models is brittle at the level of both behavior [ICLR’24] and knowledge [Preprint’24]. I am also co-organizing the Princeton AI Alignment and Safety Seminar alongside Sadhika Malladi.
Data usage: I build tools to audit data usage in large language models [ICLR’24] and medical image analysis [IEEE TMI’22].

I also believe in the power of community efforts to enhance the trustworthiness and transparency of machine learning systems. Recently, we (with researchers from 13 institutes) advocated for A Safe Harbor for AI Evaluation and Red Teaming, encouraging AI companies to provide legal and technical protections for good-faith research on their AI models. We also released an open letter (signed by 300+ researchers and reported by The Washington Post, VentureBeat, AIPwn, and Computerworld).

I did my undergrad in Computer Science at Zhejiang University. I also spent a great semester at Harvard Medical School under the supervision of Prof. Quanzheng Li.

News

[03/2024] We (with researchers from 13 institutes) advocate for A Safe Harbor for AI Evaluation and Red Teaming, encouraging AI companies to provide legal and technical protections for good-faith research on their AI models. We also released an open letter, which has been signed by 300+ researchers.
[02/2024] I am co-organizing the Princeton AI Alignment and Safety Seminar with Sadhika Malladi. Please join our mailing list to get notified of speakers and livestream links!
[01/2024] Our survey on advancing Differential Privacy’s deployment in real-world applications got accepted by Harvard Data Science Review.
[01/2024] Our work on jailbreaking open-source LLMs by simply exploiting generation configurations got accepted at ICLR 2024 as a spotlight.
[01/2024] Our work on detecting pre-training data from LLMs (with collaborators at UW) got accepted at ICLR 2024.
[11/2023] I contributed a chapter to the textbook Federated Learning: Theory and Practice (published by Elsevier).
[10/2023] Our study on privacy risks in retrieval-based language models got accepted at EMNLP 2023.
[09/2023] Sparsity-Preserving Differentially Private Training of Large Embedding Models got accepted at NeurIPS 2023 and was highlighted in a Google AI blog post.
[08/2023] Humbled to be selected as a Rising Star in EECS this year!
[04/2023] Humbled to be a recipient of the Wallace Memorial Fellowship for 2023-2024.
[08/2022] Our proposal on data auditing for ML models was selected as a winner of Meta’s Privacy-Enhancing Technologies call for proposals.

Selected Publications and Manuscripts

Please refer to publications or my Google Scholar profile for the full list. ("(α)" stands for alphabetical order)

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, and Peter Henderson

ICLR Secure and Trustworthy LLMs 2024 (Best Paper) Paper Code Website

Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation

Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, and Danqi Chen

ICLR 2024 (spotlight) Paper Code Poster Website Dataset

Detecting Pretraining Data from Large Language Models

Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, and Luke Zettlemoyer

ICLR 2024 (Oral at NeurIPS RegML) Paper Code Website Dataset

(α) LabelDP-Pro: Learning with Label Differential Privacy via Projections

Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, and Chiyuan Zhang

ICLR 2024 Paper

(α) Sparsity-Preserving Differentially Private Training of Large Embedding Models

Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, and Chiyuan Zhang

NeurIPS 2023 Paper Poster Website

Privacy Implications of Retrieval-Based Language Models

Yangsibo Huang, Samyak Gupta, Zexuan Zhong, Kai Li, and Danqi Chen

EMNLP 2023 Paper Code

Recovering Private Text in Federated Learning of Language Models

Samyak Gupta, Yangsibo Huang, Zexuan Zhong, Tianyu Gao, Kai Li, and Danqi Chen

NeurIPS 2022 Paper Code Poster

A Dataset Auditing Method for Collaboratively Trained Machine Learning Models

Yangsibo Huang, Chun-Yin Huang, Xiaoxiao Li, and Kai Li

IEEE Transactions on Medical Imaging 2022 Paper Code

Evaluating Gradient Inversion Attacks and Defenses in Federated Learning

Yangsibo Huang, Samyak Gupta, Zhao Song, Kai Li, and Sanjeev Arora

NeurIPS 2021 (Oral) Paper Code Poster

Service

Program Committee member for the Workshop on Privacy Regulation and Protection in Machine Learning (co-located with ICLR 2024)
Program Committee member for the Workshop on Federated Learning and Analytics in Practice (co-located with ICML 2023)
Program Committee member for the Workshop on Federated Learning for Data Mining (co-located with KDD 2023)
Program Committee member for the Workshop on Interpretable Machine Learning in Healthcare (co-located with ICML 2021, 2022)
Program Committee member for the Workshop on Computer Vision for Automated Medical Diagnosis (co-located with ICCV 2021)
Reviewer for ICML (2022, 2023, 2024), NeurIPS (2021, 2022, 2023)
Reviewer for IEEE Transactions on Medical Imaging (TMI)

Contact me

You are welcome to reach out to me via email.

MISC

In my spare time, I mainly stay with my four cats 😺😻😼😽.