As I enjoy watching models explain their reasoning through work I’ve given them, I keep thinking about the recent paper _Chain-of-Thought Is Not Explainability_ and how often we should remind people that rationalization is not always insight:
> verbalised chains are frequently unfaithful, diverging from the true hidden computations that drive a model’s predictions, and giving an incorrect picture of how models arrive at conclusions
I posted this in July 2025 during week 2679.