Featured image of post Non-SARGable Predicates: How a Function in WHERE Kills Your Index

Non-SARGable Predicates: How a Function in WHERE Kills Your Index

`WHERE YEAR(created_at) = 2025` scans every row in the table. `WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'` does an index range scan. Both return identical rows; one is orders of magnitude faster. The difference is a single function call that the query planner can't see past.

TL;DR
A predicate is SARGable — Search ARGument able — if the database can use an index to evaluate it. Wrapping a column in a function makes the predicate non-SARGable: the engine has to compute the function on every row before it can filter, which means a full table scan no matter what indexes exist. The fix isn’t always to rewrite the predicate — sometimes the column’s type or collation is wrong and the code is masking it — but every non-SARGable predicate on a hot path is a performance bug waiting for the table to grow.

Here are two queries that return the exact same rows:

1
2
3
4
5
6
7
-- Version A
SELECT id, status FROM events
WHERE YEAR(created_at) = 2025;

-- Version B
SELECT id, status FROM events
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01';

On a 10,000-row events table, both run in under a millisecond and nobody notices the difference. On a 200-million-row events table with an index on created_at, version A does a sequential scan and takes 45 seconds; version B does an index range scan and takes 12 milliseconds. Neither query is wrong. They don’t even disagree about the answer. One just does the same work in a way the planner can’t optimize.

The obvious fix is “rewrite every function-wrapped predicate as a range.” That works for the date-extraction case and a few others. For WHERE LOWER(email) = '[email protected]', the rewrite needs to know whether the column’s collation is case-insensitive — and if it isn’t, there’s no direct equivalent, only a functional index or a schema change. The fix depends on why the function is there, and “why” usually points back at something in the schema that’s pretending to be something it isn’t.

What SARGable means in practice

An index on created_at is a sorted structure: the engine can jump to any date range in O(log n) time by walking the B-tree. For the planner to use that index on a predicate, the predicate has to be expressible as “the column is in this range” — a direct comparison between the column and a constant or parameter.

created_at >= '2025-01-01' meets that contract. The planner translates it to “walk the index to the first entry ≥ 2025-01-01, read forward from there.” That’s a range scan.

YEAR(created_at) = 2025 doesn’t meet the contract. The value being compared isn’t created_at; it’s the output of YEAR() applied to created_at. The index on created_at doesn’t know the output of YEAR() for any row without computing it. So the planner falls back to evaluating the function on every row — a sequential scan — and only then filtering.

Common forms of the same mistake:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
-- Non-SARGable: function on column → full scan
WHERE LOWER(email) = '[email protected]'
WHERE DATE(created_at) = '2025-01-15'
WHERE CAST(price AS INT) > 100
WHERE CONCAT(first_name, ' ', last_name) = 'Alice Smith'

-- SARGable equivalents
WHERE email = '[email protected]'              -- if collation is case-insensitive
WHERE created_at >= '2025-01-15' AND created_at < '2025-01-16'
WHERE price > 100                              -- fix the type at the schema level
WHERE first_name = 'Alice' AND last_name = 'Smith'

Three of the four non-SARGable forms have clean rewrites. The first one — LOWER(email) — depends on collation, which is where a lot of real-world cases live.

The collation case

WHERE LOWER(email) = '[email protected]' is almost always a tell that the email column has a case-sensitive collation and the application is hiding it at query time. Two real fixes, one cosmetic fix:

Fix the column. If the data should be matched case-insensitively, give the column a case-insensitive collation. In PostgreSQL that’s CITEXT or a COLLATE "und-x-icu" with the ICU provider; in MySQL it’s a _ci collation (which is usually the default anyway). Once the column’s collation handles the case folding, WHERE email = '[email protected]' is SARGable and fast. This is the right fix when case-insensitivity is a property of the data.

Add a functional (expression) index. If you can’t change the column’s collation — there’s a case-sensitive comparison elsewhere in the schema that depends on the current behavior — index the expression itself:

1
2
3
4
5
6
-- PostgreSQL: functional index
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
-- Now WHERE LOWER(email) = '...' uses the index

-- MySQL 8.0+: expression index (requires the same constant-folding fix)
ALTER TABLE users ADD INDEX idx_email_lower ((LOWER(email)));

This works, with caveats. The index’s storage and write cost is real. The predicate has to match the indexed expression exactly — LOWER(email) is indexed, but UPPER(email) isn’t, and the planner won’t translate between them. Every non-SARGable expression you want fast needs its own index.

Cosmetic fix: case-fold at write time. Store the email as already-lowercased. WHERE email = '[email protected]' is now SARGable directly, no expression index needed. This usually requires application changes — whoever’s writing has to remember to case-fold — which is why the functional index is more popular even though it’s heavier. Where business logic lives covers the general shape of this decision; case-folding at the database with a generated column (GENERATED ALWAYS AS (LOWER(email)) STORED) is often the cleanest answer when the application can’t be trusted to normalize consistently.

Implicit type conversions are the subtler version

The function isn’t always in the query. Sometimes the planner is adding one:

1
2
-- account_id is VARCHAR, literal is numeric
WHERE account_id = 12345

MySQL will silently cast every account_id value to a number for comparison — a per-row function call that kills index usage just as effectively as an explicit CAST(). PostgreSQL is stricter and usually errors, but can still do implicit conversions between compatible types that undermine indexes.

The fix is matching types in both directions: the column type should be what the column is (a numeric ID should be BIGINT, not VARCHAR), and the query should write the literal in the column’s type (WHERE account_id = '12345' if the column is genuinely a string). Either fix works; matching the column type to the data’s real shape is usually the durable answer.

This is also where mixed PK strategies show up — joining a BIGINT id to a UUID id doesn’t just return wrong results; on MySQL it coerces one side to a string, which is the same implicit-function problem dressed up as a join.

When non-SARGable is acceptable

Not every non-SARGable predicate is a bug. Three cases where it’s fine:

Small tables. A 5,000-row lookup table with a function-wrapped predicate scans in microseconds. The planner isn’t going to use an index on that size anyway. WHERE UPPER(code) = 'NY' on a 50-row states table is not worth worrying about.

One-off analytical queries. A one-time data extract that scans a large table is going to scan it regardless. If the query will never run again, the function call isn’t the bottleneck — the table size is — and adding a functional index to optimize one query isn’t worth the write cost on every future insert.

When the function genuinely can’t be avoided. Some predicates legitimately need to compute. WHERE haversine_distance(lat, lng, user_lat, user_lng) < 10 on a geospatial query can’t be rewritten as a simple range; you need a spatial index (PostGIS, MySQL spatial extensions) to make it SARGable in the geometric sense. The fix is a different kind of index, not a rewrite.

The bigger picture

Non-SARGable predicates are easy to write, and they come from somewhere — almost always a schema decision that’s being papered over at query time. LOWER(email) hides a collation mismatch. CAST(price AS INT) hides a type that should have been NUMERIC from the start. DATE(created_at) hides the fact that the query is answering a date-range question but written in a way that reads more naturally as an equality. Every one of these is a query-level workaround for a schema-level issue, and every one of them costs an index when the table grows large enough to care.

EXPLAIN ANALYZE is the diagnostic. If the plan shows a sequential scan on a predicate that should hit an index, the predicate is almost certainly non-SARGable — look at what’s wrapping the column. Fix the schema if you can, add a functional index if you can’t, and treat non-SARGable predicates on hot paths as latent performance bugs, not style issues.

SELECT insights FROM experience WHERE downtime = 0; -- Ruslan Tolkachev