About Me

I am a researcher and engineer transitioning from building decentralized systems to working on AI safety, with a focus on mechanistic interpretability (MechInterp). This move is a natural evolution of my core technical interests.

My academic journey has been dedicated to the formal understanding of computational systems. I hold a Master's degree in Computer Science, with a thesis on modal logics, and a PhD in formal program verification, where I focused on constructing mathematical proofs of software correctness. I see a powerful parallel between ensuring a traditional program is safe and the central challenge of MechInterp: reverse-engineering the internal 'circuits' of neural networks to verify their behavior.

Previously, I applied these principles of security and rigor at scale while working on the core infrastructure of Polkadot. There, I had the privilege of working alongside Ethereum co-founder Gavin Wood to build one of the world's most complex blockchain protocols. That experience of deconstructing and securing large-scale systems directly informs my approach to understanding the intricate architectures of modern AI models.

Research Interests

My primary interest lies in identifying and understanding the 'circuits' within transformer models responsible for specific behaviors. I am particularly interested in:

  • Induction heads and the mechanisms of in-context learning.
  • Techniques for automated circuit discovery and verification.
  • Superposition and the challenges of polysemantic neurons.

Contact

You can reach me via email or follow my work on GitHub.

© 2025 Marcio Diaz.