Mechanistic Interpretability in Practice: Applying TDA to Breast Cancer

The paper we shared last week demonstrates how topological data analysis (TDA) methods can be used for feature engineering and feature selection on data sets with many features, i.e. many columns in the data matrix.

In our Less Wrong post we pointed out that when a data set is “wide”, i.e. has a large number of features, it is useful to compress the feature set into a graph structure in which each node corresponds to a set of features.  This gives an overview of a feature set that would otherwise be very hard to survey.  Each data point can also be treated as a function on the nodes of the graph, so that individual data points, or collections of data points, can be examined and compared via “graph heat maps”, i.e. colorings of the nodes by the function values.
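To make the idea concrete, here is a minimal sketch in Python. It is not the Cobalt implementation or the method used in the paper; the clustering method, thresholds, and data sizes are illustrative assumptions. It groups correlated features into graph nodes and then colors each node by one data point’s mean value over that node’s features, which is the “graph heat map” picture described above.

```python
# Minimal sketch (illustrative only): compress a wide feature set into a graph
# whose nodes are clusters of correlated features, then color each node by one
# data point's mean value over that node's features ("graph heat map").
import numpy as np
import networkx as nx
from sklearn.cluster import AgglomerativeClustering  # sklearn >= 1.2 for `metric`

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))          # 200 samples x 500 features ("wide" data)

# 1. Cluster features by correlation distance; each cluster becomes a graph node.
corr = np.corrcoef(X.T)                  # 500 x 500 feature-feature correlations
dist = 1.0 - np.abs(corr)
labels = AgglomerativeClustering(
    n_clusters=20, metric="precomputed", linkage="average"
).fit_predict(dist)
nodes = {k: np.where(labels == k)[0] for k in range(20)}

# 2. Connect nodes whose feature clusters are correlated on average.
G = nx.Graph()
G.add_nodes_from(nodes)
for i in nodes:
    for j in nodes:
        if i < j:
            strength = np.abs(corr[np.ix_(nodes[i], nodes[j])]).mean()
            if strength > 0.05:          # illustrative threshold; real data has structure
                G.add_edge(i, j, weight=strength)

# 3. One data point becomes a function on the nodes: its mean value over each
#    node's features.  Coloring the nodes by this function is the heat map.
sample = X[0]
node_values = {k: sample[idx].mean() for k, idx in nodes.items()}
nx.set_node_attributes(G, node_values, "heat")

# 4. Drawing with these node colors renders the graph heat map (needs matplotlib).
import matplotlib.pyplot as plt
nx.draw(G, node_color=[node_values[k] for k in G.nodes], cmap="coolwarm", with_labels=True)
plt.show()
```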

Mechanistic interpretability is the unifying theme here: both the paper and our Less Wrong post use feature compression and graph-based visualization to turn high-dimensional, opaque systems into interpretable structures that reveal underlying relationships.  The paper studies a particular wide data set in which the features correspond to genes and the entries are gene expression levels.  Breast cancer is divided into a number of subtypes, and the paper illustrates both how to refine the quantitative distinctions between the existing groups and how to recognize and characterize within-group variation.
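Continuing the sketch above (and reusing its `X` and `nodes`), a hedged illustration of how group-level comparison might look: average the node function over two subgroups and rank the nodes, i.e. feature sets, by the gap. The subgroup split and the idea of ranking by absolute difference are made-up illustrative choices, not the paper’s analysis.

```python
# Compare two hypothetical subgroups (e.g. two subtypes) by averaging each
# group's node function and ranking nodes (feature sets) by the gap.
group_a = X[:100]                         # stand-in subgroup A
group_b = X[100:]                         # stand-in subgroup B

def node_function(samples, nodes):
    """Mean value of each node's features, averaged over a set of samples."""
    return {k: samples[:, idx].mean() for k, idx in nodes.items()}

heat_a = node_function(group_a, nodes)
heat_b = node_function(group_b, nodes)

# Nodes with the largest gap point to the feature sets (here: gene sets)
# that most distinguish the two groups.
ranked = sorted(nodes, key=lambda k: abs(heat_a[k] - heat_b[k]), reverse=True)
print("most discriminative nodes:", ranked[:5])
```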

This kind of feature analysis applies equally well to features coming out of neural networks, and in particular to feature sets extracted from LLMs by sparse autoencoders (SAEs) and cross-layer transcoders (CLTs).  We demonstrated this in the post above, and also in another Less Wrong post.

Ready to try it? Go to BluelightAI.com/Cobalt
