Hello! I’m Saffron. I’m a technologist, researcher and writer.
I’m a research scientist on the Societal Impacts team at Anthropic, where we tackle big questions about how AI will change society and use these insights to guide responsible AI development.
I previously co-founded the Collective Intelligence Project (CIP), which works to make AI development more democratic and to use AI to strengthen democracy. For this work, my co-founder and I were named to the 2024 TIME 100 Most Influential People in AI list, and our work was featured twice in the New York Times. I now advise CIP, continuing to support its work on improving how society makes decisions about transformative technologies.
Before that, I was a research engineer at DeepMind, working on a hodgepodge of things including language models, human-AI interaction, conceptual reasoning, value alignment, and multi-agent RL.
My bylines have appeared in WIRED, The New Statesman, Reboot, and elsewhere. You can also find my thoughts on Substack and Twitter.
More stuff about me: I co-founded Kernel Magazine. I have a degree in Applied Mathematics-Computer Science from Harvard, with minors in Government and German. I'm from New Zealand. And sometimes I do photography.
Selected Research
- Collective Constitutional AI - S Huang*, D Siddarth*, L Lovitt*, ..., D Ganguli* (*joint co-authors) (FAccT 2024). Featured in the New York Times. Blog post.
- Beyond Static AI Evaluations: Advancing Human Interaction Evaluations for LLM Harms and Risks - L Ibrahim, S Huang, ... (arXiv 2024)
- Using the Veil of Ignorance to align AI systems with principles of justice - L Weidinger, K McKee, ... (PNAS 2023)
- Generative AI and the Digital Commons - S Huang, D Siddarth (2022)
- Red Teaming Language Models with Language Models - E Perez, S Huang, ... (EMNLP 2022)
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher - JW Rae, S Borgeaud, ... (2021)
See more research → (or view on Google Scholar)
Selected Writing
See more writing →
Talks / Podcasts
Extended Publication List
- Clio: Privacy-Preserving Insights into Real-World AI Use - Anthropic Societal Impacts Team (arXiv 2024). Introduces a system for analyzing AI assistant usage patterns across millions of conversations while preserving user privacy.
- How large language models can reshape collective intelligence - Nature Human Behaviour (2024).
- Evaluating feature steering: A case study in mitigating social biases - Anthropic Societal Impacts Team (arXiv 2024).
- How will advanced AI systems impact democracy? - arXiv (2024).
- Collective Constitutional AI - S Huang*, D Siddarth*, L Lovitt*, ..., D Ganguli* (*joint co-authors) (FAccT 2024). Featured in the New York Times. Blog post.
- Beyond Static AI Evaluations: Advancing Human Interaction Evaluations for LLM Harms and Risks - L Ibrahim, S Huang, ... (arXiv 2024)
- Using the Veil of Ignorance to align AI systems with principles of justice - L Weidinger, K McKee, ... (PNAS 2023)
- Generative AI and the Digital Commons - S Huang, D Siddarth (2022)
- Red Teaming Language Models with Language Models - E Perez, S Huang, ... (EMNLP 2022)
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher - JW Rae, S Borgeaud, ... (2021)
- Improving language models by retrieving from trillions of tokens - S Borgeaud, A Mensch, ... (ICML 2022)
- Bi-Level Multi-Agent Reinforcement Learning for Intervening in Intertemporal Social Dilemmas - S Huang (2020)