A team's AI ops agent has access to logs, metrics, deploys, traces. Six months in, MTTR is unchanged and 'responder trusted the summary' shows up in three of the last ten post-mortems. More access did not make the summaries more correct.
Eleven all-caps 'do not touch production' messages. An agent that touched it anyway. 1,206 dropped executive accounts. Adding a twelfth message would not have helped.
An assistant doesn't see pg_stat_user_indexes. It doesn't see the existing index list, the write/read ratio, or any record that an identical index was tried last quarter and quietly dropped. It sees the schema and the query in front of it - the same evidence every redundant and unused index in your catalog was shipped on.
Try making Claude Code stop saying 'honestly'. Put it in CLAUDE.md, write a skill, set a system reminder, drop it into project memory - the model still says 'honestly'. That's the smallest reproducible demonstration that the prompt sits on top of a system you cannot actually override.
Two weeks into the sprint, the velocity report is being prepared. The team committed to 42 points. They closed 28. The other 14 are the work that actually arrived. The estimate was a polite fiction the whole time.
Sprint after sprint, the team commits to 40 points and delivers 35. The diagnosis is always 'unforeseen work.' The data is silent because nobody re-points the tickets against what actually happened.
Six months of rotation. The runbook count hasn't grown. The same three SMEs get pinged for the same three classes of incident. Zero improvement tickets filed this quarter. The rotation is on the calendar; the silo-breaking is not happening.
Six months in, p_future holds 800M rows because the growth projection didn't survive the workload, and every ALTER to fix it needs a maintenance window nobody wants to schedule. The boundary management is two lines of DDL; the harder part is picking a partition key that doesn't leak into application code.
It's 3pm. An engineer reads back this morning's declaration, names what's stalled, gets a concrete fix from a teammate two minutes later, and goes back to their desk with three hours of work day left. A morning standup could not have produced that conversation.
Squawk catches the locking ALTER. pgTAP catches the missing UNIQUE. Testcontainers with a prod-shaped snapshot catches the migration that takes 40 minutes against real volume. Five categories of database test, each invisible to the others, each addressed by tools that have been stable for years and that almost no team has assembled into one suite.