Engineering · Advanced · RAG · Search · Retrieval

RAG in Production

What Actually Breaks at Scale

RAG is easy to demo and brutal to operate. This workshop covers the failure modes nobody mentions in tutorials — chunking, retrieval drift, freshness, evaluation, and cost.

Duration: 2 sessions × 90 min
Mode: Live online
Audience: Developers · AI engineers · Backend engineers
Schedule: Next cohort opens soon · join the waitlist

01 · What you'll learn

Concrete outcomes by the end

Choose chunking strategies that survive real corpora
Pick the right retrieval pattern: vector, BM25, hybrid, or rerank
Build a retrieval eval set — and the metrics that matter
Handle freshness, deletions, and multi-tenant isolation
Cut RAG cost 5–10× without losing answer quality
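
To make "the metrics that matter" concrete, here is a minimal sketch of two common retrieval metrics, recall@k and mean reciprocal rank (MRR), computed over a small golden set. The function names, the data layout (query mapped to a set of relevant document IDs), and the sample IDs are all illustrative assumptions, not the workshop's own material.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    golden: Dict[str, Set[str]],
    results: Dict[str, List[str]],
    k: int = 5,
) -> Dict[str, float]:
    """Average recall@k and MRR over every query in the golden set."""
    if not golden:
        return {"recall_at_k": 0.0, "mrr": 0.0}
    recalls = [recall_at_k(results.get(q, []), rel, k) for q, rel in golden.items()]
    rrs = [reciprocal_rank(results.get(q, []), rel) for q, rel in golden.items()]
    n = len(golden)
    return {"recall_at_k": sum(recalls) / n, "mrr": sum(rrs) / n}


# Hypothetical golden set: query -> IDs of documents a human judged relevant.
golden_set = {"q1": {"a", "b"}, "q2": {"c"}}
# Hypothetical retriever output: query -> ranked document IDs.
retrieved = {"q1": ["a", "x", "b"], "q2": ["y", "c"]}

print(evaluate(golden_set, retrieved, k=2))
```

Re-running the same golden set after every index or chunking change gives a simple regression check: a drop in either number flags a retrieval change worth inspecting before it ships.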

02 · Agenda

What we cover

Ingestion + chunking
Session 1
Token-aware chunking, structural chunking, hybrid splits. When to recompute chunks and embeddings.
Retrieval patterns
Session 1
Vector, BM25, hybrid, rerank. Choosing per query class.
Eval + drift
Session 2
Build a retrieval-eval golden set, detect drift, and ship updates without regressions.
Cost + ops
Session 2
Caching, prefiltering, cheaper embeddings, and tenant isolation.
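
As one illustration of the hybrid pattern named in the agenda, the sketch below merges a vector ranking and a BM25 ranking with reciprocal rank fusion (RRF). This is a generic technique, not necessarily the approach taught in the sessions; the function name and document IDs are hypothetical, and `k = 60` is only a commonly used default.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs from two retrievers for the same query.
vector_hits = ["d1", "d2", "d3"]
bm25_hits = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```

RRF uses only ranks, so it avoids having to normalise BM25 scores against cosine similarities, which live on different scales. A reranker, if used, would typically run on the top of the fused list.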

03 · Who should attend

The right audience

Developers · AI engineers · Backend engineers

04 · Prerequisites

Come prepared

Comfortable with Python
You have or are planning a RAG system
Familiar with at least one vector DB

05 · Speaker

Hosted by

Pankaj Kharkwal

Founder, Pankh AI

Pankaj builds production AI systems for businesses and runs Pankh AI. He has shipped agents, RAG pipelines, and observability stacks for companies that needed AI to work in production, not just in a demo.

06 · Outcomes

Why people attend

You leave this workshop with a concrete artefact you built live and a playbook you can put to use the following week. The cohort chat stays open so you can ask follow-up questions while you ship.

07 · FAQ