RAG Knowledge Base Assistant
An internal assistant that answers policy and technical questions, with citations.
Employees get instant answers; average response time dropped from 4 hours to 1.5 hours.
The Problem
A 200-person company was drowning in internal support requests. IT and HR teams received 300+ tickets monthly asking about policies, procedures, and technical configurations. Knowledge was scattered across Notion, Google Docs, and tribal knowledge.
Response times averaged 4 hours, and 60% of questions were repetitive. The team needed a self-service solution that actually worked.
Technical Approach
Built a Retrieval-Augmented Generation (RAG) system that acts as an AI-powered knowledge base assistant.
System Architecture
┌─────────────┐
│ Data Sources│
│ (Notion, │
│ Docs, PDFs)│
└──────┬──────┘
│
▼
┌─────────────┐
│ Ingestion │
│ Pipeline │ ← Playwright scraper + PDF parser
└──────┬──────┘
│
▼
┌─────────────┐
│ Chunking + │
│ Embeddings │
└──────┬──────┘
│
▼
┌─────────────┐
│ Weaviate │
│Vector Store │
└──────┬──────┘
│
▼
┌─────────────┐
│ Hybrid │
│ Search │ ← BM25 + Vector + Reranking
└──────┬──────┘
│
▼
┌─────────────┐
│ GPT-4 │
│ Generator │ ← With citations
└──────┬──────┘
│
▼
┌─────────────┐
│ Answer │
│ + Sources │
└─────────────┘
Key Features
- Automated Ingestion: Playwright-based scraper pulls from Notion, Google Docs, PDFs
- Hybrid Search: Combines keyword (BM25) and semantic (vector) search for better recall
- Answer Reranking: Cross-encoder model reranks results before generation
- Citations: Every answer includes source documents with page numbers
- Guardrails: Refuses to answer when confidence is low or the question is out of scope (the citation and guardrail flow is sketched after this list)
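A minimal sketch of that citation and guardrail flow, assuming reranked chunks arrive with a source title, an optional page number, and a cross-encoder score. The 0.35 threshold, the prompt wording, and the injected `call_llm` wrapper are illustrative placeholders, not the production values:

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_THRESHOLD = 0.35  # assumed cutoff; in practice tuned against the eval set

@dataclass
class Chunk:
    text: str
    source: str          # document title
    page: Optional[int]  # page number for PDF sources
    score: float         # cross-encoder relevance score

def answer_with_citations(question: str, chunks: list[Chunk], call_llm) -> str:
    # Guardrail: refuse when nothing relevant was retrieved or the best
    # reranked chunk falls below the confidence cutoff.
    if not chunks or max(c.score for c in chunks) < CONFIDENCE_THRESHOLD:
        return "I can't answer that confidently from the knowledge base. Please open a ticket."

    # Number the sources so the model can cite them inline as [1], [2], ...
    context = "\n\n".join(
        f"[{i}] ({c.source}{f', p. {c.page}' if c.page else ''}) {c.text}"
        for i, c in enumerate(chunks, 1)
    )
    prompt = (
        "Answer the question using ONLY the numbered sources below and cite them "
        "inline as [n]. If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)  # injected wrapper around the chat-completion call
```

Passing the LLM call in as a parameter keeps the guardrail and prompt assembly testable without network access.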
Implementation Details
Chunking Strategy
- Semantic chunking based on document structure (headings, sections), as sketched below
- 500-800-token chunks with a 100-token overlap
- Metadata preserved on every chunk (title, last updated, author, department)
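A simplified sketch of that chunking strategy, assuming Markdown-style headings mark section boundaries and using whitespace tokens as a rough stand-in for the real tokenizer; the example document and metadata values are made up:

```python
import re

MIN_TOKENS, MAX_TOKENS, OVERLAP = 500, 800, 100

def semantic_chunks(doc_text: str, metadata: dict) -> list[dict]:
    """Split on headings, then pack sections into 500-800 token chunks
    with a 100-token overlap, carrying metadata on every chunk."""
    sections = re.split(r"\n(?=#{1,4} )", doc_text)   # split before Markdown headings
    chunks: list[dict] = []
    buffer: list[str] = []
    for section in sections:
        buffer.extend(section.split())                # whitespace "tokens" (approximation)
        while len(buffer) >= MIN_TOKENS:
            body, buffer = buffer[:MAX_TOKENS], buffer[MAX_TOKENS:]
            chunks.append({"text": " ".join(body), **metadata})
            buffer = body[-OVERLAP:] + buffer         # overlap: repeat the chunk's tail
    if buffer:
        chunks.append({"text": " ".join(buffer), **metadata})  # final partial chunk
    return chunks

# Metadata travels with each chunk so filters and citations keep working downstream.
chunks = semantic_chunks("# VPN Policy\nUse the corporate VPN when off-site...\n",
                         {"title": "VPN Policy", "department": "IT",
                          "last_updated": "2024-03-01", "author": "IT Ops"})
```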
Retrieval Pipeline
- Query understanding and expansion
- Parallel BM25 and vector search
- Reciprocal Rank Fusion (RRF) for result merging (see the sketch after this list)
- Cross-encoder reranking of top 10 candidates
- Top 3-5 chunks sent to LLM
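A minimal sketch of the RRF merge step referenced above, assuming the two retrievers return ranked lists of chunk IDs; k=60 is the value commonly used in the RRF literature, not necessarily what ran in production:

```python
from collections import defaultdict

def rrf_merge(bm25_ids: list[str], vector_ids: list[str], k: int = 60) -> list[str]:
    """Merge two ranked result lists with Reciprocal Rank Fusion.

    Each chunk scores sum(1 / (k + rank)) across the lists it appears in,
    so items ranked highly by either retriever float to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked in (bm25_ids, vector_ids):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# The fused top 10 go to the cross-encoder; the top 3-5 survivors go to the LLM.
fused = rrf_merge(["a", "b", "c"], ["c", "a", "d"])
print(fused[:10])
```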
Quality Assurance
- Eval set: 50 golden questions with expert answers
- Metrics: Answer relevance, citation accuracy, hallucination rate
- A/B testing: new prompt and retrieval strategies tested against the baseline (an eval-harness sketch follows this list)
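A sketch of how a harness like that can be wired up. The `golden.jsonl` format and the injected `rag_answer` and `judge_relevance` callables are hypothetical stand-ins for the real pipeline entry point and the LLM-as-judge scorer:

```python
import json
from typing import Callable

def citation_accuracy(cited: list[str], expected: list[str]) -> float:
    # Fraction of the expected source documents that the answer actually cited.
    return 1.0 if not expected else sum(s in cited for s in expected) / len(expected)

def run_eval(rag_answer: Callable[[str], tuple[str, list[str]]],
             judge_relevance: Callable[[str, str], float],
             path: str = "golden.jsonl") -> dict:
    """Score every golden question and report averaged metrics."""
    relevance, citations = [], []
    with open(path) as f:
        for line in f:
            case = json.loads(line)  # {"question", "expected_answer", "expected_sources"}
            answer, cited_sources = rag_answer(case["question"])
            relevance.append(judge_relevance(answer, case["expected_answer"]))  # 0-1 score
            citations.append(citation_accuracy(cited_sources, case["expected_sources"]))
    n = len(relevance)
    return {"cases": n,
            "answer_relevance": sum(relevance) / n,
            "citation_accuracy": sum(citations) / n}
```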
Business Impact
After 6 months:
- 45% ticket deflection - employees find answers instantly
- 60% faster resolution - remaining tickets are answered faster with better context
- $85K annual savings - reduced support-team workload
- 98% accuracy on the eval set - answer quality stays high
Lessons Learned
What Worked
- Hybrid search outperformed vector-only by 23% on our eval set
- Citations build trust - users validate answers against sources
- Semantic chunking worked better than fixed-size chunks for our docs
What Didn’t
- Notion API rate limits - had to implement caching and incremental updates (sketched after this list)
- PDF tables - required custom parsing logic for structured data
- Generic embeddings fell short - fine-tuned embeddings improved relevance by 15%
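One way to implement that caching and incremental-update workaround, assuming each source page exposes an ID and a last-edited timestamp (Notion's API returns `last_edited_time` on page objects); the cache file path and function names are illustrative:

```python
import json
from pathlib import Path

CACHE_PATH = Path("ingest_cache.json")  # assumed location for the sync state

def load_cache() -> dict:
    return json.loads(CACHE_PATH.read_text()) if CACHE_PATH.exists() else {}

def pages_to_reingest(pages: list[dict]) -> list[dict]:
    """Return only pages whose last_edited_time changed since the previous run.

    `pages` is assumed to be lightweight metadata ({"id", "last_edited_time"})
    fetched in bulk, so full page content is only requested for changed pages
    and rate limits are hit far less often.
    """
    cache = load_cache()
    changed = [p for p in pages if cache.get(p["id"]) != p["last_edited_time"]]
    cache.update({p["id"]: p["last_edited_time"] for p in pages})
    CACHE_PATH.write_text(json.dumps(cache))
    return changed
```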
Future Improvements
- Add conversational follow-up questions
- Implement feedback loop for continuous learning
- Expand to external customer support
- Multi-modal support for images and diagrams
This system transformed internal knowledge management from a bottleneck into a competitive advantage.
Technical Architecture
- Ingestion pipeline (web, PDFs)
- Hybrid search (BM25 + vector)
- Answer re-ranking
- JSON logging (sketched below)
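A minimal sketch of the JSON logging piece, assuming one structured record per query is what gets written; the field names are illustrative, not the production schema:

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("rag")
logging.basicConfig(level=logging.INFO, format="%(message)s")

def log_query(question: str, chunk_ids: list[str], answer: str, latency_ms: float) -> None:
    # One JSON object per line so logs can be grepped or loaded for offline analysis.
    logger.info(json.dumps({
        "request_id": str(uuid.uuid4()),
        "ts": time.time(),
        "question": question,
        "retrieved_chunks": chunk_ids,
        "answer_chars": len(answer),   # log the length, not the full text, to keep records small
        "latency_ms": round(latency_ms, 1),
    }))
```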