Home @ Aidan's Website

Hi! I’m an undergrad studying maths at the University of Bristol, and I work on making AI systems safe in my free time.

Previously, I interned at Haize Labs and was a Research Scholar at MATS, in both cases working mostly on adversarial robustness.

The λ-Calculus, in Set Theory, in Coq (04-06-2020)

Formalizing different logics within set theory and type theory.

Select Publications

Sparse Autoencoders Find Highly Interpretable Features in Language Models (ICLR 2024)

Hoagy Cunningham*, Aidan Ewart*, Logan Riggs*, Robert Huben, Lee Sharkey
Demonstrates an unsupervised method for finding human-understandable decompositions of LM activations.

Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs (SoLaR @ NeurIPS 2024)

Aidan Ewart*, Abhay Sheshadri*, Phillip Guo, Aengus Lynch, Cindy Wu, Vivek Hebbar, Henry Sleight, Asa Cooper Stickland, Ethan Perez, Dylan Hadfield-Menell, Stephen Casper
Develops a new method for adversarially training LMs.

Eight Methods to Evaluate Robust Unlearning in LLMs (Preprint)

Aengus Lynch*, Phillip Guo*, Aidan Ewart*, Stephen Casper, Dylan Hadfield-Menell
Develops methodology and techniques for adversarially evaluating unlearning in LLMs.

Interesting/Funny Projects

A Compiler for a Functional Programming Language (Jul 2022)

Compiles a high-level functional language to C using continuation-passing style.

A Theorem-Proving 'Extension' for Lua (Apr 2022)

An almost-usable proof assistant!

A Compiler for AQA Pseudocode to AQA Assembly (Mar 2021)

There was a question in my A-level asking me to manually compile to a simplified ARM assembly, so obviously the correct response was to write a compiler targeting it instead.

A Type Theory/Theorem Prover (Jan 2021)

A type theory which is technically usable as a proof checker, although I wouldn't recommend it.