Ops · Intermediate · Observability · Monitoring · AIOps

AI Observability & AIOps

Monitoring AI Systems Before They Fail

Most AI systems silently degrade — drift, regressions, cost spikes, latency spikes. This workshop is the AIOps playbook for catching them before customers do.

Duration: 2 hours
Mode: Live online
Audience: CTOs · Engineers · Infra engineers
Schedule: Quarterly · next dates announced via newsletter

01 · What you'll learn

Concrete outcomes by the end

Wire the four signals every AI system needs (cost, latency, quality, drift)
Detect quality regressions before users complain
Catch cost runaways inside the same hour they start
Build a per-request audit trail without leaking PII
Page on the right thing — not the noise
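
As a rough illustration of the outcomes above, here is a minimal sketch of a per-request audit record that carries all four signals while keeping raw identifiers and email addresses out of the log. It is not the workshop's own material: the `AuditRecord` fields, the `build_record` helper, the salt value, and the email-only redaction pattern are all assumptions made for the example.

```python
import hashlib
import json
import re
from dataclasses import asdict, dataclass

# Hypothetical pattern: it only catches email addresses. Real PII coverage
# (names, phone numbers, account IDs) needs more than one regex.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(user_id: str, salt: str) -> str:
    """Replace a raw user ID with a truncated, salted SHA-256 digest."""
    digest = hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()
    return digest[:16]


def redact(text: str) -> str:
    """Mask email addresses before the text is written anywhere."""
    return EMAIL_RE.sub("[EMAIL]", text)


@dataclass
class AuditRecord:
    request_id: str
    user_ref: str         # pseudonymized, never the raw ID
    prompt_preview: str   # redacted and truncated
    cost_usd: float       # signal 1: cost
    latency_ms: float     # signal 2: latency
    quality_score: float  # signal 3: quality, e.g. from an online eval
    drift_score: float    # signal 4: drift, e.g. distance from a baseline


def build_record(request_id, user_id, prompt, cost_usd, latency_ms,
                 quality_score, drift_score, salt="demo-salt"):
    """Assemble one audit line; the salt here is a placeholder, not a secret."""
    return AuditRecord(
        request_id=request_id,
        user_ref=pseudonymize(user_id, salt),
        prompt_preview=redact(prompt)[:80],
        cost_usd=cost_usd,
        latency_ms=latency_ms,
        quality_score=quality_score,
        drift_score=drift_score,
    )


record = build_record(
    request_id="req-001",
    user_id="user-42",
    prompt="Summarise the ticket from alice@example.com about billing",
    cost_usd=0.0031,
    latency_ms=840.0,
    quality_score=0.92,
    drift_score=0.05,
)
line = json.dumps(asdict(record))
print(line)
```

Redacting before the record is built, rather than at query time, means the raw values never reach storage. In practice the salt would come from a secret store rather than a default argument.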

02 · Agenda

What we cover

The four signals
Hour 1
Cost, latency, quality, drift — how to instrument each without instrumentation regret.
Quality + drift
Hour 1
Online and offline evals, golden-set regression, drift detection.
Cost + ops
Hour 2
Per-tenant cost ledgers, anomaly alerts, budget gates.
Alerting that doesn't burn
Hour 2
Page on outcomes, not metrics. SLO patterns for AI systems. Runbook templates.
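
To make the "Cost + ops" agenda item concrete, the following is one possible sketch of a per-tenant cost ledger with a budget gate and a simple anomaly check. The `CostLedger` class, the hourly budget, the 24-hour lookback, and the z-score threshold are illustrative assumptions, not the method taught in the session.

```python
from collections import defaultdict
from statistics import mean, pstdev


class CostLedger:
    """Tracks spend per tenant per hour (hours are plain integers here)."""

    def __init__(self, hourly_budget_usd: float):
        self.hourly_budget_usd = hourly_budget_usd
        # tenant -> hour -> accumulated spend in USD
        self.spend = defaultdict(lambda: defaultdict(float))

    def record(self, tenant: str, hour: int, cost_usd: float) -> None:
        self.spend[tenant][hour] += cost_usd

    def allow(self, tenant: str, hour: int, estimated_cost_usd: float) -> bool:
        """Budget gate: refuse a request that would exceed the hourly budget."""
        projected = self.spend[tenant][hour] + estimated_cost_usd
        return projected <= self.hourly_budget_usd

    def is_anomalous(self, tenant: str, hour: int, lookback: int = 24,
                     z_threshold: float = 3.0, min_sigma: float = 0.01) -> bool:
        """Flag the hour if its spend sits far above the recent hourly mean."""
        history = [self.spend[tenant].get(h, 0.0)
                   for h in range(hour - lookback, hour)]
        baseline = mean(history)
        # Floor the spread so perfectly flat history does not divide by zero.
        sigma = max(pstdev(history), min_sigma)
        z = (self.spend[tenant].get(hour, 0.0) - baseline) / sigma
        return z > z_threshold


ledger = CostLedger(hourly_budget_usd=1.00)

# Hypothetical data: both tenants spend about 0.10 USD per hour for a day.
for h in range(24):
    ledger.record("acme", h, 0.10)
    ledger.record("quiet", h, 0.10)

# In hour 24, "acme" runs away while "quiet" stays on its usual level.
ledger.record("acme", 24, 0.95)
ledger.record("quiet", 24, 0.10)

print(ledger.is_anomalous("acme", 24))   # runaway tenant
print(ledger.is_anomalous("quiet", 24))  # steady tenant
print(ledger.allow("acme", 24, 0.10))    # would push acme past its budget
```

The gate and the alert answer different questions: the gate blocks the next request that would exceed a hard limit, while the anomaly check flags unusual spend that may still be under budget. Evaluating both per hour is what lets a runaway surface within the hour it starts.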

03 · Who should attend

The right audience

CTOs · Engineers · Infra engineers · Founders

04 · Prerequisites

Come prepared

You operate at least one AI service in production
Familiar with logging / metrics tooling (any vendor)

05 · Speaker

Hosted by

Pankaj Kharkwal

Founder, Pankh AI

Pankaj builds production AI systems for businesses and runs Pankh AI. He has shipped agents, RAG pipelines, and observability stacks for companies that needed AI to actually work — not just demo.

06 · Outcomes

Why people attend

After this workshop you leave with a concrete artefact you built live and a playbook you can put to use the following week. The cohort chat stays open so you can ask follow-up questions while you ship.

07 · FAQ