Aidan Ewart
Updated 26/11/2024
Hi! I'm an undergrad studying maths at the University of Bristol, and I do research into ensuring the safety of ML systems in my free time. My main research interests at the moment revolve around making red-teaming cheaper in a bunch of different ways.
I've previously interned at Haize Labs, which works on automated red-teaming for frontier language models, and been a Research Scholar at MATS under Stephen Casper and Alex Turner, working on adversarial robustness.
To contact me, feel free to DM me on Twitter/X!
Select Publications
Sparse Autoencoders Find Highly Interpretable Features in Language Models
Hoagy Cunningham*, Aidan Ewart*, Logan Riggs*, Robert Huben, Lee Sharkey
Demonstrates an unsupervised method for finding human-understandable decompositions of LM activations.
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
Aidan Ewart*, Abhay Sheshadri*, Phillip Guo, Aengus Lynch, Cindy Wu, Vivek Hebbar, Henry Sleight, Asa Cooper Stickland, Ethan Perez, Dylan Hadfield-Menell, Stephen Casper
Develops a new method for adversarially training LMs.
Eight Methods to Evaluate Robust Unlearning in LLMs
Aengus Lynch*, Phillip Guo*, Aidan Ewart*, Stephen Casper, Dylan Hadfield-Menell
Rigorously evaluates the machine unlearning done in Eldan and Russinovich (2023).
Select Interesting/Funny Projects
A Type Theory/Theorem Prover
A type theory which is technically usable as a proof checker, although I wouldn't recommend it.
A Compiler for a Functional Programming Language
Compiles a high-level functional language to C using continuation passing style.
A Theorem-Proving 'Extension' for Lua
An almost-usable proof assistant!
A Compiler for AQA Pseudocode to AQA Assembly
There was a question on my A-level asking me to manually compile to a simplified ARM assembly, so obviously the correct response was to write a compiler targeting it instead.