
Agile Buddha

Demystifying Agile, Getting to its Core


14 Years of Writing. One AI System. Here’s What It Found.

by ShriKant Vashishtha

There is no shortage of articles telling you what RAG is.

This is not one of them.

This is what happens when you actually build one — over real content, with real questions, and an honest evaluation of what works and what does not.

What Is RAG and Why Should You Care?

RAG stands for Retrieval Augmented Generation.

The idea is simple. Before answering your question, the AI searches your documents, finds the most relevant pieces, and uses those to construct a grounded answer — not a generic one.

Think of it as the difference between asking a colleague who has read your internal policy documents versus asking someone who has only read the internet.

Most enterprise AI assistants being deployed right now use this approach. Internal knowledge bots, policy assistants, code documentation tools — RAG is underneath most of them.
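The retrieve-then-generate loop can be sketched in a few lines. This is a deliberately minimal illustration: real systems use vector embeddings and an LLM, while here plain bag-of-words cosine similarity stands in for the embedding model, and the document names and texts are invented for the example.

```python
# Minimal sketch of RAG retrieval: score every document against the
# query, return the best matches, and (in a real system) paste them
# into the LLM prompt as grounding context.
import math
from collections import Counter

# Toy corpus -- in practice these would be chunked blog posts.
documents = {
    "estimation.md": "story points measure relative effort not hours",
    "bdd.md": "behaviour driven development starts with concrete examples",
    "standup.md": "the daily scrum is a planning event not a status report",
}

def score(query, text):
    """Cosine similarity over word counts (a stand-in for embeddings)."""
    q, t = Counter(query.lower().split()), Counter(text.lower().split())
    dot = sum(q[w] * t[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * \
           math.sqrt(sum(v * v for v in t.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=1):
    """Return the k most relevant document names for the query."""
    ranked = sorted(documents, key=lambda d: score(query, documents[d]),
                    reverse=True)
    return ranked[:k]

# The retrieved chunks would then prefix the LLM prompt, e.g.
# "Answer using ONLY the following context: ..."
print(retrieve("what do story points measure"))  # ['estimation.md']
```

The "augmented" part is simply that the model answers from the retrieved text rather than from its general training data.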

What I Built

I wanted to understand what enterprises actually face when they deploy these systems. The only honest way to do that was to build one myself and stress-test it.

So I built a RAG system over 14 years of my own content — 111 blog posts from agilebuddha.in and my book on Agile estimation. Then I ran real questions against it and documented every failure.

The technology itself took eight focused sessions of 60-90 minutes each. It is not rocket science.

What came after building it — that is where it gets interesting.

What I Found

Finding 1: The hardest part is not the technology. It is the content.

I assumed that organised content — including a published book — would be sufficient from a data quality perspective.

I was wrong.

I had written extensively about Behaviour Driven Development. When I asked my system about Specification by Example — BDD’s synonym — it returned nothing.

One concept. Two names. Zero retrieval.

The system was not broken. The content had not anticipated how someone else would ask the question.

This is not a RAG problem. This is a documentation culture problem.
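The failure mode is easy to reproduce: if the corpus only ever says "BDD", a query for its synonym shares no vocabulary with it. A hypothetical mitigation is to expand queries through a synonym map before retrieval; the corpus text and map entries below are illustrative, not from my actual system.

```python
# Why one concept with two names can return nothing: a lexical
# match on "Specification by Example" finds zero shared tokens
# in a corpus that only ever says "behaviour driven development".
corpus = "behaviour driven development bdd starts from concrete examples"

# Hypothetical synonym map, maintained by whoever curates the corpus.
SYNONYMS = {
    "specification by example": "behaviour driven development bdd",
}

def overlap(query, text):
    """Count distinct words shared between query and document."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def expand(query):
    """Append known aliases so synonyms hit the same documents."""
    q = query.lower()
    for phrase, alias in SYNONYMS.items():
        if phrase in q:
            q += " " + alias
    return q

print(overlap("specification by example", corpus))          # 0: zero retrieval
print(overlap(expand("specification by example"), corpus))  # 4: now it matches
```

Embedding-based retrieval narrows this gap but does not close it; the durable fix is writing content that names the concept the way readers search for it.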

Finding 2: Implicit knowledge lives in heads, not documents.

The most surprising moment of the entire build came when I asked a basic question I expected the system to answer easily.

It said it did not know.

The question was not obscure. It was the kind of thing a beginner asks first. But nobody had written it down — because everyone who knew the answer considered it obvious.

This is the silent failure mode of enterprise knowledge systems. The questions that matter most to new employees, customers, and AI systems — they are often the least documented. Because the people with the answers stopped seeing them as questions worth answering.

Finding 3: 58% of questions returned weak retrieval — despite 111 posts and a full book.

That number stopped me.

Not because the technology failed. Because the knowledge was not there in the right depth, the right structure, or the right accessibility.

Volume of content is not the same as depth of knowledge.

Most organisations measure their knowledge management by document count. RAG exposes that quantity and depth are completely different things. Thousands of Confluence pages may represent genuine expertise in only a handful of areas.
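A weak-retrieval rate like my 58% can be measured with nothing more than a retrieval log and a similarity threshold: flag every query whose best match scores below the cutoff. The threshold, queries, and scores below are illustrative, not my actual evaluation data.

```python
# Sketch of computing a weak-retrieval rate from a retrieval log.
# A query is "weak" if its best match falls below a similarity cutoff.
WEAK_THRESHOLD = 0.5  # illustrative; tune against human judgments

# (query, best_similarity) pairs, as a retrieval log might record them.
results = [
    ("how to split user stories", 0.82),
    ("specification by example", 0.12),
    ("cio ai governance checklist", 0.31),
    ("planning poker basics", 0.77),
]

weak = [q for q, s in results if s < WEAK_THRESHOLD]
rate = 100 * len(weak) / len(results)
print(f"{rate:.0f}% weak retrieval")  # 50% weak retrieval
```

The flagged queries are the real output: each one points at a topic the corpus does not cover in enough depth.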

Finding 4: The corpus becomes a mirror.

When I asked about CIO AI governance, the system returned my one AI article three times.

Because that was all there was.

RAG does not just retrieve knowledge. It makes the shape of your knowledge visible — including its edges and gaps.

Deploying RAG in an enterprise is an involuntary knowledge audit. You will find out quickly what you actually know versus what you assumed you had documented.

What This Means for Enterprises

RAG implementation is less about technology and more about organisational culture and information architecture.

If your information is fragmented, inconsistently structured, and built from the content creator’s perspective rather than the user’s — your RAG system will reflect that faithfully.

The initial RAG deployment is not the end of the journey. From there, the real work begins: understanding what people are actually searching for, identifying the gaps, and building or restructuring content to serve real questions — not assumed ones.

Before you invest in the model or the pipeline, ask yourself one honest question:

If a new employee joined tomorrow with access only to your documents and no access to any person — how well would they understand how your organisation actually works?

That answer tells you your RAG readiness more accurately than any vendor assessment.

The Code

Full codebase, step-by-step instructions, and documented observations are here:
👉 github.com/vashishthask/building-rag-on-your-own-content

What would you find if you built RAG over your organisation’s internal knowledge?


Copyright © 2026 · Malonus Consulting LLP
