Large Language Models (LLMs) often “think” in long chains of reasoning: verbose step-by-step outputs known as chain-of-thought (CoT). However, not every sentence in those CoTs actually matters for getting to the correct answer. In many cases, only a few key steps (what we call “thought anchors”) significantly influence the final result. The rest might be redundant or “fluff”. ThoughtTrim is a project born from this realization: What if we could trim down a model’s reasoning to just the essential parts? This would make the AI’s reasoning process more efficient (saving time and tokens) and potentially more interpretable and reliable.
The Idea of ThoughtTrim
ThoughtTrim uses a simple yet clever approach: run the AI’s reasoning and identify which sentences are most important. We do this by measuring the impact of each sentence on the final answer. Concretely, for each sentence in the chain-of-thought, we compare the AI’s answer distribution with that sentence vs. without it. If removing a sentence changes the answer (or the confidence in answers) a lot, that sentence has high importance (high “counterfactual influence”). If the answer stays pretty much the same, the sentence was likely not critical. The technical tool we use is KL divergence – essentially a way to quantify how different the answer probabilities are in those two scenarios. That gives us a ranked list of sentences by importance (the “anchors”).
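To make the scoring concrete, here is a minimal Python sketch of that per-sentence KL comparison. It assumes a hypothetical helper `answer_distribution(question, sentences)` that prompts the model with the question plus the given reasoning sentences and returns a probability for each answer choice; the actual prompting and model calls are omitted, and the function names are illustrative rather than the project's exact code.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two answer distributions over the same choices."""
    return sum(p[a] * math.log((p[a] + eps) / (q[a] + eps)) for a in p)

def sentence_importance(question, cot_sentences):
    """Score each chain-of-thought sentence by how much removing it
    shifts the model's answer distribution (higher KL = more important)."""
    # answer_distribution is a hypothetical helper: it queries the model and
    # returns {answer_choice: probability} for the given reasoning sentences.
    full_dist = answer_distribution(question, cot_sentences)
    scored = []
    for i, sentence in enumerate(cot_sentences):
        ablated = cot_sentences[:i] + cot_sentences[i + 1:]
        ablated_dist = answer_distribution(question, ablated)
        scored.append((kl_divergence(full_dist, ablated_dist), i, sentence))
    # Highest-influence sentences come first; these are the "thought anchors".
    return sorted(scored, reverse=True)
```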
How Trimming Works
Once we have the importance ranking, we can start pruning the less important sentences from the reasoning chain. We then feed the trimmed chain-of-thought back into the model and see if it still arrives at the correct answer. By gradually increasing the strictness of trimming (removing more and more, keeping only higher-ranked anchors), we test the limits of how much we can cut while retaining accuracy.
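Below is a rough sketch of that trimming sweep, building on the scoring sketch above. It assumes a hypothetical `model_answer(question, sentences)` helper that returns the model's final answer given a (possibly trimmed) chain of thought; the keep fractions shown are illustrative, not the exact schedule we used.

```python
def trimming_sweep(question, cot_sentences, correct_answer,
                   keep_fractions=(1.0, 0.75, 0.5, 0.25, 0.1), ranking=None):
    """Prune the lowest-importance sentences at increasing strictness and
    record whether the trimmed chain of thought still yields the right answer."""
    if ranking is None:
        ranking = sentence_importance(question, cot_sentences)  # most important first
    results = {}
    for frac in keep_fractions:
        k = max(1, int(len(ranking) * frac))
        kept_idx = {i for _, i, _ in ranking[:k]}
        # Rebuild the trimmed chain in the original sentence order.
        trimmed = [s for i, s in enumerate(cot_sentences) if i in kept_idx]
        # model_answer is a hypothetical helper: it prompts the model with the
        # question plus the trimmed reasoning and returns its final answer.
        results[frac] = model_answer(question, trimmed) == correct_answer
    return results
```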
We tried this on a set of 100 biology questions (from the MMLU-Pro benchmark). For each question, the model (Qwen-1.5B, a smaller LLM) generated a detailed chain-of-thought and an answer. We then applied ThoughtTrim to see which parts of the reasoning were truly needed. The results were fascinating:
- For some questions, ThoughtTrim could remove 60-90% of the tokens in the explanation and the model still got the answer right. In other words, much of the model's thinking out loud was unnecessary filler!
- Other questions were more fragile; even a small removal caused the answer to flip. Those cases are interesting too, because they tell us the model might be just barely reasoning correctly, and any change risks an error.
- On average, accuracy dropped gently as we removed more and more of the fluff. Importantly, removing the least important parts first performed much better than random removal: in a control test where we jumbled the importance rankings (to simulate random trimming), accuracy dropped off far faster. This showed that our importance metric was meaningful and that we really were identifying the expendable parts of the thought process. (A sketch of this control follows the list.)
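The shuffled-ranking control is a small variation on the sketches above: rank the sentences as before, shuffle the ranking so trimming becomes effectively random, and run the same sweep. Again, the helper names are illustrative.

```python
import random

def shuffled_ranking_sweep(question, cot_sentences, correct_answer, seed=0):
    """Control run: same sweep as above, but with the importance ranking
    shuffled so the kept sentences are effectively chosen at random."""
    ranking = sentence_importance(question, cot_sentences)
    random.Random(seed).shuffle(ranking)
    return trimming_sweep(question, cot_sentences, correct_answer, ranking=ranking)
```

Comparing the accuracy curves from `trimming_sweep` and `shuffled_ranking_sweep` across questions gives the importance-ranked vs. random-trimming comparison described above.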
Why This Matters
ThoughtTrim is more than a neat efficiency hack. It has implications for AI safety and transparency:
- Efficiency: In production, if an AI can reason with fewer tokens, it’s faster and cheaper (important for large models where each token costs compute). Imagine a future where an AI not only gives you an answer but also a slimmed-down reasoning or explanation, saving on resources.
- Determinism: LLMs can be unpredictable in how they reason. By enforcing a focus on critical steps, we might guide models to reason in more reliable ways. One vision is to train future models to naturally generate only the important steps – making them less verbose and less prone to getting lost in their own reasoning.
- Interpretability: A trimmed chain-of-thought that’s still correct is easier for a human to read and verify. It’s like getting a concise proof of a math problem rather than a 5-page ramble. This could help in debugging model errors or understanding model decisions.
Next Steps & Future Work
ThoughtTrim was initially developed during a weekend AI research sprint, so there’s plenty of room to expand:
- We want to test it on bigger models and more diverse problems. Does a huge model like GPT-4 also have so much fluff in its reasoning? Possibly, and trimming it could yield big gains.
- We’re exploring using ThoughtTrim in training: e.g. reinforcement learning where the model gets rewarded for following an “anchor plan” (the key steps) and penalized for unnecessary detours. Over time, the model could learn to think more like an expert reasoner, focusing only on what matters.
- Another idea is integrating ThoughtTrim into AI agents – for example, if an AI is performing a multi-step task (like a web assistant agent), ThoughtTrim could serve as a watchdog to cut off unproductive tangents and keep the agent on track.
- Lastly, from a safety standpoint, trimming could remove parts of reasoning that might be risky or disallowed, while keeping the solution. For instance, if a chain-of-thought veered into sensitive content but that wasn’t needed for the final answer, trimming might make the output safer. This is speculative but worth researching.
Conclusion
The ThoughtTrim project showed that brevity can be the soul of wit – even for AI. By identifying and keeping only the “thought anchors,” we can make AI reasoning more efficient and possibly more trustworthy. Watching the model still get answers right with a fraction of its original reasoning is like watching a student solve a problem in 2 steps instead of 10 – it’s a sign of mastery. We’re excited to push this idea further and see how it can contribute to building AI systems that are not just smart, but also efficient and safe.
(Authored by Josh Rauvola and Andrew Briand as part of Apart Research’s hackathon, 2025.)