Here are two queries that return the exact same rows:
-- Version A: function wrapped around the column
SELECT * FROM events WHERE YEAR(created_at) = 2025;

-- Version B: the same condition as a range on the column
SELECT * FROM events
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01';
On a 10,000-row events table, both run in under a millisecond and nobody notices the difference. On a 200-million-row events table with an index on created_at, version A does a sequential scan and takes 45 seconds; version B does an index range scan and takes 12 milliseconds. Neither query is wrong. They don’t even disagree about the answer. One just does the same work in a way the planner can’t optimize.
The obvious fix is “rewrite every function-wrapped predicate as a range.” That works for the date-extraction case and a few others. For WHERE LOWER(email) = '[email protected]', the rewrite needs to know whether the column’s collation is case-insensitive — and if it isn’t, there’s no direct equivalent, only a functional index or a schema change. The fix depends on why the function is there, and “why” usually points back at something in the schema that’s pretending to be something it isn’t.
What SARGable means in practice
An index on created_at is a sorted structure: the engine can jump to any date range in O(log n) time by walking the B-tree. For the planner to use that index on a predicate, the predicate has to be expressible as “the column is in this range” — a direct comparison between the column and a constant or parameter.
created_at >= '2025-01-01' meets that contract. The planner translates it to “walk the index to the first entry ≥ 2025-01-01, read forward from there.” That’s a range scan.
YEAR(created_at) = 2025 doesn’t meet the contract. The value being compared isn’t created_at; it’s the output of YEAR() applied to created_at. The index on created_at doesn’t know the output of YEAR() for any row without computing it. So the planner falls back to evaluating the function on every row — a sequential scan — and only then filtering.
Common forms of the same mistake:
WHERE LOWER(email) = '[email protected]'   -- no direct rewrite; depends on collation
WHERE DATE(created_at) = '2025-06-01'     -- rewrite: created_at >= '2025-06-01' AND created_at < '2025-06-02'
WHERE price * 1.1 > 100                   -- rewrite: price > 100 / 1.1
WHERE CAST(account_id AS CHAR) = '12345'  -- rewrite: account_id = 12345
Three of the four non-SARGable forms have clean rewrites. The first one — LOWER(email) — depends on collation, which is where a lot of real-world cases live.
The collation case
WHERE LOWER(email) = '[email protected]' is almost always a tell that the email column has a case-sensitive collation and the application is hiding it at query time. Two real fixes, one cosmetic fix:
Fix the column. If the data should be matched case-insensitively, give the column a case-insensitive collation. In PostgreSQL that’s CITEXT or a nondeterministic ICU collation (one created with deterministic = false); in MySQL it’s a _ci collation (which is usually the default anyway). Once the column’s collation handles the case folding, WHERE email = '[email protected]' is SARGable and fast. This is the right fix when case-insensitivity is a property of the data.
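A sketch of the column-level fix, with assumed table and column names (users.email):

```sql
-- PostgreSQL: CITEXT makes comparisons case-insensitive at the type level.
CREATE EXTENSION IF NOT EXISTS citext;
ALTER TABLE users ALTER COLUMN email TYPE CITEXT;

-- MySQL: switch the column to a case-insensitive collation.
ALTER TABLE users
  MODIFY email VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

-- Either way, the plain predicate is now SARGable:
-- SELECT * FROM users WHERE email = '[email protected]';
```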
Add a functional (expression) index. If you can’t change the column’s collation — there’s a case-sensitive comparison elsewhere in the schema that depends on the current behavior — index the expression itself:
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

-- now this predicate can use the index:
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
This works, with caveats. The index’s storage and write costs are real. The predicate has to match the indexed expression exactly — LOWER(email) is indexed, but UPPER(email) isn’t, and the planner won’t translate between them. Every non-SARGable expression you want fast needs its own index.
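The exact-match caveat, concretely. Assuming an expression index on LOWER(email) exists (names hypothetical):

```sql
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
-- matches the indexed expression: index scan

SELECT * FROM users WHERE UPPER(email) = '[email protected]';
-- different expression; the planner won't translate UPPER into LOWER: seq scan

SELECT * FROM users WHERE LOWER(TRIM(email)) = '[email protected]';
-- also a different expression, even though LOWER appears in it: seq scan
```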
Cosmetic fix: case-fold at write time. Store the email as already-lowercased. WHERE email = '[email protected]' is now SARGable directly, no expression index needed. This usually requires application changes — whoever’s writing has to remember to case-fold — which is why the functional index is more popular even though it’s heavier. Where business logic lives covers the general shape of this decision; case-folding at the database with a generated column (GENERATED ALWAYS AS (LOWER(email)) STORED) is often the cleanest answer when the application can’t be trusted to normalize consistently.
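The generated-column variant, sketched in MySQL syntax with assumed names (PostgreSQL 12+ supports STORED generated columns with the same shape):

```sql
-- The database normalizes; the application can't forget to.
ALTER TABLE users
  ADD COLUMN email_lower VARCHAR(255)
    GENERATED ALWAYS AS (LOWER(email)) STORED;

CREATE INDEX idx_users_email_lower ON users (email_lower);

-- Queries filter on the normalized column directly:
SELECT * FROM users WHERE email_lower = '[email protected]';
```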
Implicit type conversions are the subtler version
The function isn’t always in the query. Sometimes the planner is adding one:
-- account_id is VARCHAR; the literal is a number
SELECT * FROM accounts WHERE account_id = 12345;
MySQL will silently cast every account_id value to a number for comparison — a per-row function call that kills index usage just as effectively as an explicit CAST(). PostgreSQL is stricter and usually errors, but can still do implicit conversions between compatible types that undermine indexes.
The fix is matching types in both directions: the column type should be what the column is (a numeric ID should be BIGINT, not VARCHAR), and the query should write the literal in the column’s type (WHERE account_id = '12345' if the column is genuinely a string). Either fix works; matching the column type to the data’s real shape is usually the durable answer.
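Both directions of the fix, sketched with assumed names (accounts.account_id currently VARCHAR):

```sql
-- Durable fix: make the column's type match the data (PostgreSQL syntax).
ALTER TABLE accounts
  ALTER COLUMN account_id TYPE BIGINT USING account_id::BIGINT;

-- Query-side fix: write the literal in the column's current type,
-- so no per-row conversion is needed and the index stays usable.
SELECT * FROM accounts WHERE account_id = '12345';
```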
This is also where mixed PK strategies show up — joining a BIGINT id to a UUID id doesn’t just return wrong results; on MySQL the comparison coerces the string side to a number, which is the same implicit-function problem dressed up as a join.
When non-SARGable is acceptable
Not every non-SARGable predicate is a bug. Three cases where it’s fine:
Small tables. A 5,000-row lookup table with a function-wrapped predicate scans in microseconds. The planner isn’t going to use an index on that size anyway. WHERE UPPER(code) = 'NY' on a 50-row states table is not worth worrying about.
One-off analytical queries. A one-time data extract that scans a large table is going to scan it regardless. If the query will never run again, the function call isn’t the bottleneck — the table size is — and adding a functional index to optimize one query isn’t worth the write cost on every future insert.
When the function genuinely can’t be avoided. Some predicates legitimately need to compute. WHERE haversine_distance(lat, lng, user_lat, user_lng) < 10 on a geospatial query can’t be rewritten as a simple range; you need a spatial index (PostGIS, MySQL spatial extensions) to make it SARGable in the geometric sense. The fix is a different kind of index, not a rewrite.
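A PostGIS sketch of the “different kind of index,” assuming places.geom is a geography column (table, column, and coordinates are hypothetical; ST_DWithin on geography takes a distance in meters):

```sql
-- A GiST index makes distance predicates index-assisted.
CREATE INDEX idx_places_geom ON places USING GIST (geom);

-- Index-aware distance filter instead of a per-row haversine call:
SELECT * FROM places
WHERE ST_DWithin(
  geom,
  ST_SetSRID(ST_MakePoint(-73.99, 40.73), 4326)::geography,
  10000  -- meters
);
```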
The bigger picture
Non-SARGable predicates are easy to write, and they come from somewhere — almost always a schema decision that’s being papered over at query time. LOWER(email) hides a collation mismatch. CAST(price AS INT) hides a type that should have been NUMERIC from the start. DATE(created_at) hides the fact that the query is asking a date-range question but is written as an equality because that reads more naturally. Every one of these is a query-level workaround for a schema-level issue, and every one of them costs an index when the table grows large enough to care.
EXPLAIN ANALYZE is the diagnostic. If the plan shows a sequential scan on a predicate that should hit an index, the predicate is almost certainly non-SARGable — look at what’s wrapping the column. Fix the schema if you can, add a functional index if you can’t, and treat non-SARGable predicates on hot paths as latent performance bugs, not style issues.