MultiHub Forum

Full Version: How can I keep a fine-tuned model from going brittle with RAG?
So I’ve been trying to fine-tune a small model on some specific documents for a side project, and I’m hitting a wall where it just seems to lose all common sense about anything outside that text. Has anyone else run into this, where the model gets hyper-specialized but weirdly brittle? I’m starting to wonder if the whole approach of retrieval-augmented generation is the only practical way to keep it useful.
Yeah, that sounds brutal. I tried a tiny model on a narrow doc set and it went all in on the pages and forgot the world outside. It sounds smart about the text but turns brittle when asked for general sense. The frustration is real.
One frequent reason is that the model overfits to the fine-tuned material and loses generalization (often called catastrophic forgetting). When you add retrieval-augmented generation on top, the retrieved text can pull the answer toward repeating the passages instead of reasoning, which makes it brittle in a different way. If your retrieval is too aggressive or the prompts push toward the doc content, you end up with hollow confidence.
Maybe you need more data, in a broader style, rather than less, and keep the docs as a memory layer instead of the only thing the model trains on. Or perhaps you are forcing the model to answer only the way the text would, and ignoring the rest of the domain.
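To make the "more data in a broader style" idea concrete, here is a minimal sketch of blending general examples into the fine-tuning set. The function name `mix_training_set`, the `general_fraction` parameter, and the 50% default are all my own assumptions for illustration, not a recommended recipe.

```python
import random

def mix_training_set(domain_examples, general_examples, general_fraction=0.5, seed=0):
    """Blend general examples into a domain set (hypothetical helper).

    `general_fraction` is the share of the final mix that should be general
    data; 0.5 is an arbitrary starting point, not a tuned value.
    """
    if not 0.0 <= general_fraction < 1.0:
        raise ValueError("general_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # How many general examples give the requested share of the final mix.
    n_general = round(len(domain_examples) * general_fraction / (1.0 - general_fraction))
    # Never ask for more general examples than we actually have.
    n_general = min(n_general, len(general_examples))
    mixed = list(domain_examples) + rng.sample(list(general_examples), n_general)
    rng.shuffle(mixed)
    return mixed
```

You would then fine-tune on the returned list instead of the raw docs, and adjust the fraction based on how much general ability you see the model losing.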
I am skeptical that any one trick fixes this. The problem might be evaluation or prompt design rather than the approach. The claim that retrieval is the only practical path can feel overstated.
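If evaluation is the suspect, one way to check is to score the base and fine-tuned models on both an in-domain set and a general set. The sketch below assumes each model is wrapped as a plain function from question to answer string; `exact_match_accuracy` and `brittleness_report` are hypothetical names, and exact match is a deliberately crude metric.

```python
def exact_match_accuracy(answer_fn, qa_pairs):
    """Fraction of questions where the answer matches exactly (case-insensitive)."""
    correct = sum(
        1 for question, expected in qa_pairs
        if answer_fn(question).strip().lower() == expected.strip().lower()
    )
    return correct / len(qa_pairs)

def brittleness_report(base_fn, tuned_fn, domain_set, general_set):
    """Compare a base and a fine-tuned model on domain and general questions."""
    domain_base = exact_match_accuracy(base_fn, domain_set)
    domain_tuned = exact_match_accuracy(tuned_fn, domain_set)
    general_base = exact_match_accuracy(base_fn, general_set)
    general_tuned = exact_match_accuracy(tuned_fn, general_set)
    return {
        "domain_gain": domain_tuned - domain_base,      # what fine-tuning bought you
        "general_drop": general_base - general_tuned,   # what it cost you
    }
```

A large `general_drop` next to a modest `domain_gain` would suggest the fine-tune itself is the problem; if both numbers look fine, the prompts or the test questions are worth a second look.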
Maybe the frame itself is the issue. Instead of chasing a flawless assistant, the goal could be to design this as a tool that knows when to fetch and when to ask for human input.
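A rough sketch of that "knows when to fetch and when to ask" idea follows. Everything in it is an assumption for illustration: the word-overlap score stands in for whatever retriever score you actually have, `model_confidence` is a number you would have to produce yourself, and both thresholds are made up.

```python
def overlap_score(question, passage):
    """Toy relevance score: share of question words that appear in the passage."""
    question_words = set(question.lower().split())
    passage_words = set(passage.lower().split())
    if not question_words:
        return 0.0
    return len(question_words & passage_words) / len(question_words)

def route(question, passages, model_confidence,
          fetch_threshold=0.5, confidence_floor=0.6):
    """Pick one of 'fetch', 'answer', or 'ask_human' for a question.

    Thresholds are placeholders, not tuned values.
    """
    best = max((overlap_score(question, p) for p in passages), default=0.0)
    if best >= fetch_threshold:
        return "fetch"        # the docs look relevant, so ground the answer in them
    if model_confidence >= confidence_floor:
        return "answer"       # off-topic for the docs, but the model seems sure
    return "ask_human"        # neither the docs nor the model can be trusted here
```

For example, `route("refund policy days", ["the refund policy lasts 30 days"], 0.3)` goes to the docs, while a general question with low confidence gets handed to a person instead of being answered with a guess.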
From a writing-craft lens, pace and voice matter. Leaning hard on the docs can flatten both: the answers stop taking any risks and lose their cadence. Try prompts that nudge the model to speak in plain terms and to flag uncertainty rather than delivering instant confidence.
Consider a hybrid setup with adapters or a memory layer that can be tuned separately from the base model. A small local memory plus a generic engine can switch modes and keep both specificity and general sense alive.
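To show what "tuned separately from the base model" can mean, here is a toy sketch of a low-rank adapter in plain Python. I am assuming a LoRA-style design (a frozen weight matrix plus a small trainable update) since the post only says "adapters"; the class name `AdaptedLinear` and its fields are made up for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

class AdaptedLinear:
    """One linear layer with a frozen base weight and a switchable adapter."""

    def __init__(self, base_weight, down, up, scale=1.0):
        self.base_weight = base_weight  # frozen, shape (out, in)
        self.down = down                # trainable, shape (rank, in)
        self.up = up                    # trainable, shape (out, rank)
        self.scale = scale
        self.adapter_enabled = True

    def forward(self, x):
        out = matvec(self.base_weight, x)
        if self.adapter_enabled:
            # Low-rank update: project down to `rank` dims, then back up.
            delta = matvec(self.up, matvec(self.down, x))
            out = [o + self.scale * d for o, d in zip(out, delta)]
        return out
```

Because the base weight is never modified, setting `adapter_enabled = False` gives back the generic model exactly, which is the "switch modes" behavior described above.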