Covering Index Traps: When Adding One Column Breaks Your Query

A query runs in 0.4 ms for a year — an index-only scan with heap fetches at zero. A feature request adds one column to the SELECT list. The next day the same query takes 1243 ms. Nothing else changed. The index is the same, the filter is the same, the data is the same. The select list just stopped being covered.

TL;DR
An index-only scan is the fastest way a relational database can answer a query — the engine reads the index and never touches the table. Adding a single column to the SELECT list that isn’t in the index silently breaks that optimization, and the query that ran in a millisecond now takes seconds on the same data. The fix isn’t “never SELECT extra columns” — it’s knowing that the SELECT list is part of the query’s performance contract with the index.

Here’s a query that ran in production for a year with sub-millisecond latency:

SELECT status, created_at FROM orders WHERE customer_id = 42;

The orders table has a composite index on (customer_id, status, created_at). Every column the query needs — customer_id for the filter, status and created_at for the output — is in that index. The database reads the index, returns the results, and never touches the table. This is an index-only scan: one of the most significant optimizations a relational engine makes, and the mechanism behind “covering” queries.

Then a feature request: “show the order total on this page.” The change looks trivial.

SELECT status, created_at, total_cents FROM orders WHERE customer_id = 42;

One column added. The query is still correct. The index still matches the filter. But total_cents isn’t in the index — so for every matching row, the engine now follows a pointer back to the table to fetch that one extra column. On a table with millions of rows, that’s a random I/O per match. The query that was 0.4 ms is now 1243 ms.

The obvious fix is “just don’t add columns to queries.” That doesn’t work — features need data. The slightly-less-obvious fix is “always project the minimum columns,” which is fine as advice and ignored in practice because most ORMs default to SELECT *. The actual fix is to treat the SELECT list as part of the query’s performance contract with the index, and to know what that contract is before changing it.

What’s actually happening

The execution plan tells the whole story:

-- Before: index-only scan
EXPLAIN ANALYZE SELECT status, created_at FROM orders WHERE customer_id = 42;
-- Index Only Scan using idx_orders_cust_status_created on orders
-- Heap Fetches: 0
-- Execution Time: 0.4 ms

-- After: index scan + table lookups
EXPLAIN ANALYZE SELECT status, created_at, total_cents FROM orders WHERE customer_id = 42;
-- Index Scan using idx_orders_cust_status_created on orders
-- Execution Time: 1243.7 ms

Same index. Same filter. Same rows returned. The only difference is the select list, and it moves the query from a pure index walk to an index walk plus one random I/O per matching row.
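The same before/after flip can be reproduced in miniature with SQLite, whose EXPLAIN QUERY PLAN output labels covering scans explicitly. This is a sketch, not the production setup from the article: SQLite, not PostgreSQL, with the article’s table and index names reused for illustration.

```python
# SQLite demo of the covered vs. uncovered plan flip. Illustrative only:
# the article's system is PostgreSQL, but the mechanics are the same.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        status TEXT,
        created_at TEXT,
        total_cents INTEGER
    )
""")
con.execute("""
    CREATE INDEX idx_orders_cust_status_created
        ON orders (customer_id, status, created_at)
""")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail)
    return con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]

covered = plan("SELECT status, created_at FROM orders WHERE customer_id = 42")
uncovered = plan("SELECT status, created_at, total_cents FROM orders WHERE customer_id = 42")

print(covered)    # e.g. SEARCH orders USING COVERING INDEX idx_orders_cust_status_created (customer_id=?)
print(uncovered)  # same index, but no COVERING: a table fetch per matching row
```

One added column and the plan’s COVERING marker disappears, which is exactly the signal worth asserting on in a test.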

Buffer pool pollution compounds the damage. When the engine fetches full rows from the table instead of reading compact index entries, it loads entire data pages into the buffer pool. Those pages — carrying every column of every matched row, most of which the query doesn’t need — evict pages that other queries do need. On a busy system with a finite buffer pool, one query losing its covering index degrades performance for unrelated queries across the database. The slow query you noticed is rarely the only thing getting slower.

Nothing in the query results tells you. The rows come back correctly. The response looks the same. A SELECT COUNT(*) returns the same count. The only place the degradation is visible is in the execution plan — and nobody checks the execution plan when the feature ships.

ORM defaults
Most ORMs emit SELECT * unless explicitly told otherwise. ActiveRecord needs .select(:id, :status); Django needs .only('id', 'status'); SQLAlchemy needs explicit column specification; Prisma needs an explicit select block. On a high-traffic table, a one-line change to project only the needed columns is one of the highest-leverage optimizations available. Worth checking what your ORM actually generates on the query paths that matter — the generated SQL is the contract, not the method call.

The fix: match the index or extend it

There are two workable fixes when a query loses its covering index, and they trade different costs:

Project only what the index covers. If the new column isn’t worth fetching from the table on every row, don’t fetch it. Split the query: one covered query for the list view, a targeted lookup for the detail row the user actually wants. Most feature requests that “need” an extra column on a list page are actually fine with lazy-loading the value on click.
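The split-query pattern can be sketched in a few lines. Again this uses SQLite as a self-contained stand-in, with the article’s names; `order_total` is a hypothetical helper, not code from the article.

```python
# Split-query sketch: a covered list query plus a targeted detail lookup.
# SQLite stand-in for the article's PostgreSQL setup.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, customer_id INTEGER,
        status TEXT, created_at TEXT, total_cents INTEGER);
    CREATE INDEX idx_orders_cust_status_created
        ON orders (customer_id, status, created_at);
    INSERT INTO orders VALUES
        (1, 42, 'shipped', '2024-01-01', 1999),
        (2, 42, 'pending', '2024-01-02', 4500);
""")

# List view: every projected column lives in the index (in SQLite the
# rowid rides along in the leaf entries), so the scan stays covered.
list_rows = con.execute(
    "SELECT id, status, created_at FROM orders WHERE customer_id = 42"
).fetchall()

# Detail view: one targeted primary-key lookup, only when the user clicks.
def order_total(order_id):
    (total,) = con.execute(
        "SELECT total_cents FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    return total
```

The list page pays zero heap fetches; the single detail lookup pays one, on demand, instead of one per row on every page load.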

Extend the index to include the new column. If the column is genuinely needed on every row, add it to the index — either as an additional indexed column or (in PostgreSQL) as an INCLUDE clause that adds the value to the leaf pages without making it part of the B-tree ordering:

-- PostgreSQL: add total_cents as a non-key included column
CREATE INDEX idx_orders_cust_status_created_total
    ON orders (customer_id, status, created_at)
    INCLUDE (total_cents);

INCLUDE is the right tool when you need the column covered but don’t want it affecting the sort order or filter path. The trade-off is write cost: the index is now larger, and every update to total_cents has to update the index entry. On a write-heavy table that’s meaningful; on a read-heavy table it’s usually negligible compared to the read speedup.

MySQL (InnoDB) doesn’t support INCLUDE but has a natural equivalent: every secondary index already contains the primary key at its leaves, and you can extend the secondary index to cover additional columns by adding them as regular key columns. The planner is smart enough to use the covered form when the column is present.
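SQLite happens to behave like InnoDB here: no INCLUDE clause, so the extra column is appended as a trailing key column. A minimal sketch of the extended index restoring the covering scan:

```python
# Appending total_cents as a trailing key column (the MySQL-style fix)
# makes the wider query covered again. SQLite used for illustration.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, customer_id INTEGER,
        status TEXT, created_at TEXT, total_cents INTEGER)
""")
con.execute("""
    CREATE INDEX idx_orders_cust_status_created_total
        ON orders (customer_id, status, created_at, total_cents)
""")

detail = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT status, created_at, total_cents "
    "FROM orders WHERE customer_id = 42"
).fetchall()[0][3]
print(detail)  # the plan reports a COVERING INDEX scan again
```

The trade-off from the INCLUDE discussion applies unchanged: the index is larger, and every write to total_cents now touches it.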

When covering isn’t the right call

Covering indexes aren’t a universal good. Three cases where chasing a covering index is the wrong move:

Low-selectivity filters. If customer_id = 42 matches 80% of the table, the planner won’t use the index at all — a sequential scan is cheaper. Index-only scans matter when the filter is selective. On a low-selectivity predicate, covering changes nothing.

Write-heavy tables. Every index slows writes. A table taking 50,000 inserts per second with five secondary indexes already pays a real cost for every index entry. Adding a covering variant of an existing index to shave read latency from 15 ms to 3 ms is a bad trade if the table is write-dominated — the write penalty compounds on every row, and only the reads benefit.

Rapidly changing projections. If the feature team is adding and removing columns from the list view every sprint, chasing the covering index is a losing game. Freeze the list-view columns as a contract, document them in the schema, and let the index match that contract — or don’t bother indexing for coverage at all.

One more column, silently uncovered

The archetypal AI-generated version of this bug is a one-line change that adds a column to the SELECT list. A feature request says “show the order total on the list page”; the assistant reads the existing query, adds total_cents to the projection, and returns the patch. The query still runs, the list page still renders, and the p99 quietly moves from 0.4 ms to 1200 ms because the index-only scan became a heap-fetch scan and nobody noticed until the dashboard did.

Coverage checking itself isn’t hard reasoning — given a query and an index definition, working out whether the SELECT list stays inside the index is a short syntactic check any capable model can do. The catalog exposes the ingredients: PostgreSQL’s pg_index separates key columns from INCLUDE ones via indnkeyatts vs indnatts, MySQL’s information_schema.STATISTICS lists all columns per index. The signal is there. What fails in practice is subtler. The relevant index often isn’t in the prompt’s context window — schema-aware tools pull catalog metadata, but whether idx_orders_cust_status_created lands in the retrieved context for “add total_cents to the list view” depends on retrieval heuristics, not the model’s capability. Even when the index definition is available, the default behavior for “modify this query” is to modify the query; re-verifying that the projection stays covered is a step the assistant rarely takes unsolicited. And only the planner’s actual choice is authoritative — static analysis gets most of the way, but nothing short of EXPLAIN tells you which index the query will use under real statistics.
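The syntactic check described above fits in a few lines. This is a hypothetical helper, not code from any catalog tool, and it deliberately stops where the paragraph says static analysis stops: EXPLAIN against live statistics is still the only authority on which index actually gets used.

```python
# Hypothetical coverage check: does every column the query touches exist
# in the index? Catches the obvious uncovered case; nothing more.
def is_covered(select_cols, where_cols, key_cols, include_cols=()):
    """True if the SELECT list and WHERE columns all fit in the index."""
    available = set(key_cols) | set(include_cols)
    return set(select_cols) | set(where_cols) <= available

key = ("customer_id", "status", "created_at")

print(is_covered(["status", "created_at"], ["customer_id"], key))                    # True
print(is_covered(["status", "created_at", "total_cents"], ["customer_id"], key))     # False
print(is_covered(["status", "total_cents"], ["customer_id"], key, ("total_cents",))) # True
```

Wired into code review or CI with the real catalog columns (indnkeyatts vs indnatts in PostgreSQL), this is the check that would have flagged the total_cents patch before it shipped.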

The fix at the schema level is what makes the coverage relationship legible to the next reader, human or model: name indexes after the query they support (idx_orders_list_view tells you what depends on it), document INCLUDE columns in the index comment, and put a comment on the query itself pointing at the index. None of this is novel advice. It becomes load-bearing once an assistant is routinely modifying queries — the explicit link between query and covering index is the signal that tells the assistant (and the human reviewer) “this change has an index implication” rather than silently shipping the uncovered patch.

The bigger picture

The SELECT list is a performance contract in most code reviewers’ blind spot. WHERE clauses get scrutinized because they’re obviously performance-relevant. JOINs get scrutinized because cardinality mistakes are visible. The SELECT list gets waved through because “it’s just what we display” — and then a one-column addition drops a query from 0.4 ms to 1243 ms with no code-review signal to catch it.

EXPLAIN ANALYZE is the only authority here. Reading execution plans isn’t glamorous, but it’s the difference between a query that works and a query that works at scale — and between a select-list change that’s free and one that silently broke the optimization the index existed to enable. On the queries that carry the most traffic, the execution plan belongs in code review alongside the query itself.

SELECT insights FROM experience WHERE downtime = 0; -- Ruslan Tolkachev