BluelightAI Research Fellowship

At BluelightAI we believe that understanding how AI models work will be a key factor in ensuring that these models benefit humanity. We’re using topological data analysis and mechanistic interpretability to get insights into models’ internal functioning, and building tools to leverage those insights in real-world scenarios. Some things we’ve been working on recently include training cross-layer transcoders for Qwen 3, using CLT/SAE features to train interpretable classifiers, and using SAE features to investigate patterns in model performance.
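To give a flavor of what "using SAE features to train interpretable classifiers" can look like, here is a toy sketch. The data is a synthetic stand-in for which SAE features fire on each prompt (a real workflow would record activations from a trained sparse autoencoder); the L1 penalty keeps the classifier sparse, so its surviving weights point directly at the features responsible for the behavior.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for SAE feature activity: rows are prompts,
# columns indicate whether each of 50 sparse features fired.
n_samples, n_features = 200, 50
X = (rng.random((n_samples, n_features)) < 0.1).astype(float)

# Pretend features 3 and 17 drive the behavior we want to detect.
y = ((X[:, 3] > 0) | (X[:, 17] > 0)).astype(int)

# An L1-penalized linear classifier zeroes out irrelevant features,
# leaving an interpretable model: large weights = responsible features.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(X, y)

# The two largest-magnitude coefficients recover the causal features.
top = sorted(np.argsort(-np.abs(clf.coef_[0]))[:2])
print("most influential features:", top)
```

Because each weight attaches to a single named feature, the resulting classifier can be read off directly, unlike a probe on raw activations.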

We’re excited to open a number of research fellowship positions for students, postdocs, and others interested in going deeper into mechanistic interpretability and TDA. These will be high-velocity, fully remote collaborations with BluelightAI team members to make discoveries about how AI models work, and we welcome applicants from anywhere in the world. If you have experience with TDA, LLM training, or mechanistic interpretability, we’d love to work with you!

Scope:

  • Research projects lasting 1-3 months that leverage topological data analysis and mechanistic interpretability to answer questions about large language models. We expect projects to involve 10-20 hours of work per week.

What we’ll provide:

  • A one-time research stipend of $5,000

  • At least weekly one-on-one mentorship meetings to advance the research project

  • Access to compute resources

  • Early access to Cobalt (our TDA toolkit) and our mechanistic interpretability tooling

  • Each project will at minimum produce a blog post that will be shared on our website and LessWrong, and we anticipate that many projects will develop into publishable research papers.

Application process:

You’ll need to provide a CV/resume, a personal statement, a short (1-2 paragraph) research project proposal (see the project ideas below for inspiration), and references to any related work you’ve done previously (including informal work like blog posts). We’ll hold one or two interviews, and if you’re selected, we’ll flesh out a research plan together and begin as soon as possible. Applications are reviewed on a rolling basis, and acceptances will depend on applicant strength and our current capacity. We expect to have 3-4 participants in our first batch of fellowships.

Project ideas

Here are a few things we’ve been thinking about that might serve as inspiration for your proposals:

  • Use Cobalt to develop a thorough taxonomy of features for one of our cross-layer transcoder models

  • Identify how different LLMs differ in “vibes” or capabilities using libraries of features from SAEs or other interpreter models

  • Develop generalizations of sparse autoencoders that take into account feature geometry

  • Help automate circuit tracing by developing techniques to automatically group features into “supernodes” of related features

  • Improve feature autointerpretation by incorporating information about related features

  • Fine-tune SAEs or CLTs on domain-specific data to investigate particular behaviors in more detail

  • Mechanistically investigate prompt injections or jailbreaks

  • Search for feature manifolds like those uncovered in “When Models Manipulate Manifolds”

  • Analyze how the latent space evolves over the course of a reasoning model’s chain of thought

  • Investigate components of models with some degree of built-in sparsity (e.g. expert routing, MLP gating, LoRA adapters) to identify interpretable patterns

  • Investigate a specific model capability of interest, e.g. basic arithmetic, tracking parts of speech, maintaining indentation/nesting state in code

  • Develop techniques for engineering new features and injecting them into AI models

Apply here

Questions? Email jakob@bluelightai.com
