As I enjoy watching models explain their reasoning through work I’ve given them, I keep thinking about the recent paper _Chain-of-Thought Is Not Explainability_ and how often we should remind people that rationalization is not always insight:
> verbalised chains are frequently unfaithful, diverging from the true hidden computations that drive a model’s predictions, and giving an incorrect picture of how models arrive at conclusions
I posted this in July 2025 during week 2679.