How we build a hallucination-resistant RAG assistant over legislation
By Digit Steam Innovations
Some domains have no tolerance for a confident wrong answer. Legislation is one of them: an invented clause or a misread cross-reference isn't a quirk, it's a liability. When we build a retrieval-augmented generation (RAG) assistant over legal or regulated text, the whole design is organised around one rule: the assistant may only say what the source actually says, and it must show where it got it.
Why a basic chatbot fails here
Point a generic LLM at a pile of legal PDFs and you get fluent, plausible answers — some of which are wrong. Legal text is dense, heavily cross-referenced, and full of near-duplicate phrasing, so naive similarity search retrieves the wrong passage often enough to matter. And the model, asked to be helpful, will fill gaps. For a public-facing or staff-facing legal assistant that's unacceptable.
Step 1 — Hybrid retrieval, not just vectors
We combine two kinds of search and fuse the results. Keyword search (BM25) nails exact terms — article numbers, defined terms, statute names — where embeddings get fuzzy. Vector search catches meaning when the user's wording differs from the law's. We merge both with reciprocal rank fusion, so a passage that's strong on either signal surfaces. For legal text this single change removes a large class of “retrieved the wrong section” errors.
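A minimal Python sketch of the fusion step follows. It assumes a plain Okapi BM25 scorer with typical parameter values (k1 = 1.5, b = 0.75) and the commonly used RRF constant k = 60. The function names, sample passages and the vector ranking are hypothetical. In a real system the vector ranking would come from an embedding index, not a hard-coded list.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of word characters, so "Article 12:" -> ["article", "12"].
    return re.findall(r"\w+", text.lower())


def bm25_rank(query: str, docs: dict[str, str], k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Return document IDs ordered by Okapi BM25 score, best first."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for toks in tokenized.values() for term in set(toks))

    scores: dict[str, float] = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # IDF variant with "1 +" inside the log so it never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalisation (b).
            tf_part = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avg_len))
            score += idf * tf_part
        scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each document scores the sum of 1 / (k + rank) over all lists."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical sample passages and query.
docs = {
    "art12": "Article 12: the data controller shall notify the authority within 72 hours.",
    "art13": "Article 13: the processor shall assist the controller with notifications.",
    "art40": "Article 40: codes of conduct may be drawn up by associations.",
}
query = "article 12 notify authority"
keyword_ranking = bm25_rank(query, docs)
# Hypothetical ranking from a vector index; in practice this comes from embedding search.
vector_ranking = ["art40", "art12", "art13"]
print(reciprocal_rank_fusion([keyword_ranking, vector_ranking]))  # ['art12', 'art40', 'art13']
```

RRF looks only at ranks, not raw scores, so BM25 scores and cosine similarities never need to be calibrated against each other. That is what makes it a convenient way to merge the two signals. Here "art12" wins because it ranks well on both lists, even though the vector search alone put it second.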
Step 2 — Understand the query, then rerank
- Query understanding: an LLM step rewrites and expands the question (synonyms, the formal legal term, the likely article) before retrieval.
- Reranking: the top candidates are re-scored by a stronger model against the actual question, so the few passages handed to the answer step are the genuinely relevant ones — not just the closest vectors.
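The two stages above can be sketched in Python as follows. This illustrates the shape of the steps under stated assumptions and is not production code. A small hypothetical glossary stands in for the LLM rewrite step, and a toy word-overlap function stands in for the stronger reranking model (typically a cross-encoder). `expand_query`, `overlap_score`, `rerank` and `LEGAL_GLOSSARY` are hypothetical names.

```python
import re
from typing import Callable

# Hypothetical glossary standing in for the LLM rewrite step: everyday
# words mapped to the formal terms a statute is likely to use.
LEGAL_GLOSSARY: dict[str, list[str]] = {
    "boss": ["employer"],
    "fired": ["dismissal", "termination of employment"],
}


def words(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def expand_query(question: str, glossary: dict[str, list[str]] = LEGAL_GLOSSARY) -> str:
    """Append the formal term(s) for every glossary word found in the question."""
    extras = [term for w in words(question) for term in glossary.get(w, [])]
    return " ".join([question, *extras])


def overlap_score(query: str, passage: str) -> float:
    """Toy relevance score: share of query words that appear in the passage.
    A real reranker would be a cross-encoder or LLM scoring the pair jointly."""
    q, p = set(words(query)), set(words(passage))
    return len(q & p) / len(q) if q else 0.0


def rerank(
    query: str,
    candidates: dict[str, str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> list[str]:
    """Re-score each (query, passage) pair and keep only the top_k passage IDs."""
    scored = {doc_id: score_fn(query, text) for doc_id, text in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_k]


question = "Can my boss have me fired without notice?"
expanded = expand_query(question)
print(expanded)
# Can my boss have me fired without notice? employer dismissal termination of employment

# Hypothetical candidates, e.g. the top results from hybrid retrieval.
candidates = {
    "s1": "An employer may terminate employment by dismissal only after written notice.",
    "s2": "Annual leave accrues at two days per month of employment.",
    "s3": "The employer shall keep records on working hours.",
}
print(rerank(expanded, candidates, overlap_score, top_k=1))  # ['s1']
```

The important part is the interface. `rerank` accepts any `score_fn`, so the toy scorer can be replaced with a real cross-encoder without touching the rest of the pipeline.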
Step 3 — A multi-layer anti-hallucination verifier
This is the part that makes it safe. Before any answer reaches the user, it passes through verification layers that check the generated answer against the retrieved source: is every claim supported by the cited text? Are the citations real and on-point? If a claim can't be grounded, the assistant doesn't guess — it narrows the answer, asks to clarify, or says it doesn't know. Every answer carries citations back to the exact passages, so a human can verify in seconds.
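To make the idea concrete, here is a deliberately simplified Python sketch of a layered check. It assumes the answer step emits each claim together with the ID of the passage it cites. The three layers shown are illustrative stand-ins chosen for this example: the citation exists, every number in the claim appears in the cited passage, and enough of the claim's content words appear there too. The stop-word list and the 0.8 threshold are arbitrary, and all names are hypothetical. A production verifier would use an entailment model or an LLM judge for the final layer.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately tiny stop-word list; a real one would be language-specific.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "is", "be", "shall", "must", "within"}


@dataclass
class Claim:
    text: str
    cited_id: str  # ID of the retrieved passage the answer step says supports this claim


def content_words(text: str) -> set[str]:
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS}


def numbers(text: str) -> set[str]:
    return set(re.findall(r"\d+(?:\.\d+)?", text))


def is_supported(claim: Claim, passages: dict[str, str], min_overlap: float = 0.8) -> bool:
    passage = passages.get(claim.cited_id)
    # Layer 1: the citation must point at a passage that was actually retrieved.
    if passage is None:
        return False
    # Layer 2: every number in the claim (deadlines, amounts, article numbers)
    # must appear verbatim in the cited passage.
    if not numbers(claim.text) <= numbers(passage):
        return False
    # Layer 3: crude grounding proxy, the share of the claim's content words found
    # in the passage. A production verifier would use an NLI model or LLM judge here.
    claim_words = content_words(claim.text)
    if not claim_words:
        return False
    return len(claim_words & content_words(passage)) / len(claim_words) >= min_overlap


def verify(claims: list[Claim], passages: dict[str, str]) -> str:
    """Keep only grounded claims, each with its citation; refuse if none survive."""
    grounded = [c for c in claims if is_supported(c, passages)]
    if not grounded:
        return "I don't know: the retrieved sources don't support an answer."
    return " ".join(f"{c.text} [{c.cited_id}]" for c in grounded)


passages = {"art12": "The controller shall notify the authority within 72 hours of a breach."}
claims = [
    Claim("The controller must notify the authority within 72 hours.", "art12"),
    Claim("Fines can reach 10 million euros.", "art83"),  # cites a passage never retrieved
]
print(verify(claims, passages))
# The controller must notify the authority within 72 hours. [art12]
```

The number check matters for legal text. A claim that says "24 hours" where the source says "72 hours" shares almost every word with the passage, so a pure overlap test would let it through.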
Step 4 — Language and domain tuning
Doing this in Greek (or any non-English legal corpus) adds another layer: embeddings, tokenisation and reranking all have to handle the language well, and the legal vocabulary has to be respected exactly. We tune retrieval and prompting for the specific corpus rather than relying on general-purpose defaults.
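One example of language-specific preprocessing is shown below. It is a Python sketch under our own assumptions, not a prescribed recipe. It folds Greek text for the keyword index so that accents, capitalisation and final sigma don't block matches between a query and a statute. `normalise_for_index` and `index_tokens` are hypothetical names.

```python
import re
import unicodedata


def normalise_for_index(text: str) -> str:
    """Fold text for keyword matching only (never for display or citations).

    NFD splits an accented letter into base letter + combining mark, e.g.
    "ά" -> "α" + U+0301; dropping the marks removes tonos and dialytika.
    casefold() then lowercases and maps final sigma "ς" to "σ".
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped).casefold()


def index_tokens(text: str) -> list[str]:
    # \w is Unicode-aware for str patterns in Python 3, so Greek letters match.
    return re.findall(r"\w+", normalise_for_index(text))


print(index_tokens("Άρθρο 5 του νόμου"))  # ['αρθρο', '5', 'του', 'νομου']
print(normalise_for_index("ΝΟΜΟΣ") == normalise_for_index("νόμος"))  # True
```

Folding is applied only to index and query tokens. The original text is kept for display and citations, because accents can carry meaning (πότε means "when", ποτέ means "never") and quoted legal wording has to stay exact.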
The takeaway
Trustworthy RAG over high-stakes text isn't one model and a prompt. It's a pipeline: hybrid retrieval, query understanding, reranking, and a verifier that refuses to guess when an answer can't be grounded in the source. The same architecture applies wherever accuracy is non-negotiable: legal, compliance, medical, financial, internal policy. If you have a body of documents where a wrong answer is costly, this is how you build an assistant you can actually trust.
FAQ
Can you guarantee zero hallucinations?
No one honestly can. What we can do is make unsupported answers very unlikely and always verifiable: the assistant answers from retrieved source text, cites it, and is built to say “I don’t know” rather than guess. The verifier catches the cases a plain chatbot would invent.
Does this work for non-legal documents?
Yes. The same hybrid-retrieval + rerank + verifier pipeline applies to any high-stakes corpus — compliance manuals, contracts, medical or financial guidance, internal policy.