<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Query-Performance on EXPLAIN ANALYZE</title><link>https://explainanalyze.com/categories/query-performance/</link><description>Recent content in Query-Performance on EXPLAIN ANALYZE</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Fri, 18 Jul 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://explainanalyze.com/categories/query-performance/index.xml" rel="self" type="application/rss+xml"/><item><title>Covering Index Traps: When Adding One Column Breaks Your Query</title><link>https://explainanalyze.com/p/covering-index-traps-when-adding-one-column-breaks-your-query/</link><pubDate>Fri, 18 Jul 2025 00:00:00 +0000</pubDate><guid>https://explainanalyze.com/p/covering-index-traps-when-adding-one-column-breaks-your-query/</guid><description>&lt;img src="https://explainanalyze.com/" alt="Featured image of post Covering Index Traps: When Adding One Column Breaks Your Query" /&gt;&lt;div class="tldr-box"&gt;
 &lt;strong&gt;TL;DR&lt;/strong&gt;
 &lt;div&gt;An index-only scan is the fastest way a relational database can answer a query: the engine reads the index and never touches the table. Adding a single column to the SELECT list that isn&amp;rsquo;t in the index silently breaks the optimization, and the query that ran in a millisecond now takes seconds. The SELECT list is part of the query&amp;rsquo;s performance contract with the index.&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Here&amp;rsquo;s a query that ran in production for a year with sub-millisecond latency:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The &lt;code&gt;orders&lt;/code&gt; table has a composite index on &lt;code&gt;(customer_id, status, created_at)&lt;/code&gt;. Every column the query needs (&lt;code&gt;customer_id&lt;/code&gt; for the filter, &lt;code&gt;status&lt;/code&gt; and &lt;code&gt;created_at&lt;/code&gt; for the output) is in that index. The database reads the index, returns the results, and never touches the table. This is an index-only scan: one of the most significant optimizations a relational engine makes, and the mechanism behind &amp;ldquo;covering&amp;rdquo; queries.&lt;/p&gt;
&lt;p&gt;Then a feature request: &amp;ldquo;show the order total on this page.&amp;rdquo; The change looks trivial.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;total_cents&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;One column added. The query is still correct. The index still matches the filter. &lt;code&gt;total_cents&lt;/code&gt; isn&amp;rsquo;t in the index, so for every matching row, the engine now follows a pointer back to the table to fetch that one extra column. On a table with millions of rows, that&amp;rsquo;s a random I/O per match. The query that was 0.4 ms is now 1243 ms.&lt;/p&gt;
&lt;p&gt;The obvious fix is &amp;ldquo;just don&amp;rsquo;t add columns to queries.&amp;rdquo; That doesn&amp;rsquo;t work; features need data. The slightly-less-obvious fix is &amp;ldquo;always project the minimum columns,&amp;rdquo; which is fine as advice and ignored in practice because every ORM defaults to &lt;code&gt;SELECT *&lt;/code&gt;. The actual fix is to treat the SELECT list as part of the query&amp;rsquo;s performance contract with the index, and to know what that contract is before changing it.&lt;/p&gt;
&lt;h2 id="whats-actually-happening"&gt;What&amp;rsquo;s actually happening
&lt;/h2&gt;&lt;p&gt;The execution plan tells the whole story:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Before: index-only scan
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ANALYZE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Index Only Scan using idx_orders_cust_status_created on orders
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Heap Fetches: 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Execution Time: 0.4 ms
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- After: index scan + table lookups
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ANALYZE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;total_cents&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Index Scan using idx_orders_cust_status_created on orders
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Execution Time: 1243.7 ms
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Same index. Same filter. Same rows returned. The only difference is the select list, and it moves the query from a pure index walk to an index walk plus one random I/O per matching row.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Buffer pool pollution compounds the damage.&lt;/strong&gt; When the engine fetches full rows from the table instead of reading compact index entries, it loads entire data pages into the buffer pool. Those pages (carrying every column of every matched row, most of which the query doesn&amp;rsquo;t need) evict pages that other queries do need. On a busy system with a finite buffer pool, one query losing its covering index degrades performance for unrelated queries across the database. The slow query you noticed is rarely the only thing getting slower.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Nothing in the query results tells you.&lt;/strong&gt; The rows come back correctly. The response looks the same. A &lt;code&gt;SELECT COUNT(*)&lt;/code&gt; returns the same count. The only place the degradation is visible is in the execution plan, and nobody checks the execution plan when the feature ships.&lt;/p&gt;
&lt;div class="note-box"&gt;
 &lt;strong&gt;ORM defaults&lt;/strong&gt;
 &lt;div&gt;Most ORMs emit &lt;code&gt;SELECT *&lt;/code&gt; unless explicitly told otherwise. ActiveRecord needs &lt;code&gt;.select(:id, :status)&lt;/code&gt;; Django needs &lt;code&gt;.only('id', 'status')&lt;/code&gt;; SQLAlchemy needs explicit column specification; Prisma needs an explicit &lt;code&gt;select&lt;/code&gt; block. On a high-traffic table, a one-line change to project only the needed columns is one of the highest-leverage optimizations available. Worth checking what your ORM actually generates on the query paths that matter; the generated SQL is the contract, not the method call.&lt;/div&gt;
&lt;/div&gt;

&lt;h2 id="the-fix-match-the-index-or-extend-it"&gt;The fix: match the index or extend it
&lt;/h2&gt;&lt;p&gt;There are two workable fixes when a query loses its covering index, and they trade different costs:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Project only what the index covers.&lt;/strong&gt; If the new column isn&amp;rsquo;t worth fetching from the table on every row, don&amp;rsquo;t fetch it. Split the query: one covered query for the list view, a targeted lookup for the detail row the user actually wants. Most feature requests that &amp;ldquo;need&amp;rdquo; an extra column on a list page are actually fine with lazy-loading the value on click.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Extend the index to include the new column.&lt;/strong&gt; If the column is genuinely needed on every row, add it to the index, either as an additional indexed column or (in PostgreSQL) as an &lt;code&gt;INCLUDE&lt;/code&gt; clause that adds the value to the leaf pages without making it part of the B-tree ordering:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- PostgreSQL: add total_cents as a non-key included column
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;INDEX&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;idx_orders_cust_status_created_total&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;INCLUDE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_cents&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;code&gt;INCLUDE&lt;/code&gt; is the right tool when you need the column covered but don&amp;rsquo;t want it affecting the sort order or filter path. The trade-off is write cost: the index is now larger, and every update to &lt;code&gt;total_cents&lt;/code&gt; has to update the index entry. On a write-heavy table that&amp;rsquo;s meaningful; on a read-heavy table it&amp;rsquo;s usually negligible compared to the read speedup.&lt;/p&gt;
&lt;p&gt;MySQL (InnoDB) doesn&amp;rsquo;t support &lt;code&gt;INCLUDE&lt;/code&gt; but has a natural equivalent: every secondary index already contains the primary key at its leaves, and you can extend the secondary index to cover additional columns by adding them as regular key columns. The planner is smart enough to use the covered form when the column is present.&lt;/p&gt;
&lt;h2 id="when-covering-isnt-the-right-call"&gt;When covering isn&amp;rsquo;t the right call
&lt;/h2&gt;&lt;p&gt;Covering indexes aren&amp;rsquo;t a universal good. Three cases where chasing a covering index is the wrong move:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Low-selectivity filters.&lt;/strong&gt; If &lt;code&gt;customer_id = 42&lt;/code&gt; matches 80% of the table, the planner won&amp;rsquo;t use the index at all; a sequential scan is cheaper. Index-only scans matter when the filter is selective. On a low-selectivity predicate, covering changes nothing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Write-heavy tables.&lt;/strong&gt; Every index slows writes. A table taking 50,000 inserts per second with five secondary indexes already pays a real cost for every index entry. Adding a covering variant of an existing index to shave read latency from 15 ms to 3 ms is a bad trade if the table is write-dominated; the write penalty compounds on every row, and only the reads benefit.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Rapidly changing projections.&lt;/strong&gt; If the feature team is adding and removing columns from the list view every sprint, chasing the covering index is a losing game. Freeze the list-view columns as a contract, document them in the schema, and let the index match that contract, or don&amp;rsquo;t bother indexing for coverage at all.&lt;/p&gt;
&lt;h2 id="one-more-column-silently-uncovered"&gt;One more column, silently uncovered
&lt;/h2&gt;&lt;p&gt;The archetypal AI-generated version of this bug is a one-line change that adds a column to the SELECT list. A feature request says &amp;ldquo;show the order total on the list page&amp;rdquo;; the assistant reads the existing query, adds &lt;code&gt;total_cents&lt;/code&gt; to the projection, and returns the patch. The query still runs, the list page still renders, and the p99 quietly moves from 0.4 ms to 1200 ms because the index-only scan became a heap-fetch scan and nobody noticed until the dashboard did.&lt;/p&gt;
&lt;p&gt;Coverage checking itself isn&amp;rsquo;t hard reasoning. Given a query and an index definition, working out whether the SELECT list stays inside the index is a short syntactic check any capable model can do. The catalog exposes the ingredients: PostgreSQL&amp;rsquo;s &lt;code&gt;pg_index&lt;/code&gt; separates key columns from &lt;code&gt;INCLUDE&lt;/code&gt; ones via &lt;code&gt;indnkeyatts&lt;/code&gt; vs &lt;code&gt;indnatts&lt;/code&gt;, MySQL&amp;rsquo;s &lt;code&gt;information_schema.STATISTICS&lt;/code&gt; lists all columns per index. The signal is there. What fails in practice is subtler. The relevant index often isn&amp;rsquo;t in the prompt&amp;rsquo;s context window; schema-aware tools pull catalog metadata, but whether &lt;code&gt;idx_orders_cust_status_created&lt;/code&gt; lands in the retrieved context for &amp;ldquo;add total_cents to the list view&amp;rdquo; depends on retrieval heuristics, not the model&amp;rsquo;s capability. Even when the index definition is available, the default behavior for &amp;ldquo;modify this query&amp;rdquo; is to modify the query; re-verifying that the projection stays covered is a step the assistant rarely takes unsolicited. And only the planner&amp;rsquo;s actual choice is authoritative; static analysis gets most of the way, but nothing short of &lt;code&gt;EXPLAIN&lt;/code&gt; tells you which index the query will use under real statistics.&lt;/p&gt;
&lt;p&gt;The fix at the schema level is what makes the coverage relationship legible to the next reader, human or model: name indexes after the query they support (&lt;code&gt;idx_orders_list_view&lt;/code&gt; tells you what depends on it), document &lt;code&gt;INCLUDE&lt;/code&gt; columns in the index comment, and put a comment on the query itself pointing at the index. None of this is novel advice. It becomes load-bearing once an assistant is routinely modifying queries: the explicit link between query and covering index is the signal that tells the assistant (and the human reviewer) &amp;ldquo;this change has an index implication&amp;rdquo; rather than silently shipping the uncovered patch.&lt;/p&gt;
&lt;h2 id="the-bigger-picture"&gt;The bigger picture
&lt;/h2&gt;&lt;p&gt;The SELECT list is a performance contract in most code reviewers&amp;rsquo; blind spot. WHERE clauses get scrutinized because they&amp;rsquo;re obviously performance-relevant. JOINs get scrutinized because cardinality mistakes are visible. The SELECT list gets waved through because &amp;ldquo;it&amp;rsquo;s just what we display&amp;rdquo;, and then a one-column addition drops a query from 0.4 ms to 1243 ms with no code-review signal to catch it.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; is the only authority here. Reading execution plans isn&amp;rsquo;t glamorous, but it&amp;rsquo;s the difference between a query that works and a query that works at scale, and between a select-list change that&amp;rsquo;s free and one that silently broke the optimization the index existed to enable. On the queries that carry the most traffic, the execution plan belongs in code review alongside the query itself.&lt;/p&gt;</description></item><item><title>Non-SARGable Predicates: How a Function in WHERE Kills Your Index</title><link>https://explainanalyze.com/p/non-sargable-predicates-how-a-function-in-where-kills-your-index/</link><pubDate>Sat, 14 Jun 2025 00:00:00 +0000</pubDate><guid>https://explainanalyze.com/p/non-sargable-predicates-how-a-function-in-where-kills-your-index/</guid><description>&lt;img src="https://explainanalyze.com/" alt="Featured image of post Non-SARGable Predicates: How a Function in WHERE Kills Your Index" /&gt;&lt;div class="tldr-box"&gt;
 &lt;strong&gt;TL;DR&lt;/strong&gt;
 &lt;div&gt;A predicate is SARGable (Search ARGument able) if the database can use an index to evaluate it. Wrapping a column in a function makes the predicate non-SARGable: the engine has to compute the function on every row before it can filter, which means a full table scan no matter what indexes exist. The fix isn&amp;rsquo;t always to rewrite the predicate (sometimes the column&amp;rsquo;s type or collation is wrong and the code is masking it) but every non-SARGable predicate on a hot path is a performance bug waiting for the table to grow.&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Here are two queries that return the exact same rows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Version A
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;YEAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2025&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Version B
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;2025-01-01&amp;#39;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AND&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;2026-01-01&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;On a 10,000-row &lt;code&gt;events&lt;/code&gt; table, both run in under a millisecond and nobody notices the difference. On a 200-million-row &lt;code&gt;events&lt;/code&gt; table with an index on &lt;code&gt;created_at&lt;/code&gt;, version A does a sequential scan and takes 45 seconds; version B does an index range scan and takes 12 milliseconds. Neither query is wrong. They don&amp;rsquo;t even disagree about the answer. One just does the same work in a way the planner can&amp;rsquo;t optimize.&lt;/p&gt;
&lt;p&gt;The obvious fix is &amp;ldquo;rewrite every function-wrapped predicate as a range.&amp;rdquo; That works for the date-extraction case and a few others. For &lt;code&gt;WHERE LOWER(email) = 'alice@example.com'&lt;/code&gt;, the rewrite needs to know whether the column&amp;rsquo;s collation is case-insensitive, and if it isn&amp;rsquo;t, there&amp;rsquo;s no direct equivalent, only a functional index or a schema change. The fix depends on why the function is there, and &amp;ldquo;why&amp;rdquo; usually points back at something in the schema that&amp;rsquo;s pretending to be something it isn&amp;rsquo;t.&lt;/p&gt;
&lt;h2 id="what-sargable-means-in-practice"&gt;What SARGable means in practice
&lt;/h2&gt;&lt;p&gt;An index on &lt;code&gt;created_at&lt;/code&gt; is a sorted structure: the engine can jump to any date range in O(log n) time by walking the B-tree. For the planner to use that index on a predicate, the predicate has to be expressible as &amp;ldquo;the column is in this range&amp;rdquo;: a direct comparison between the column and a constant or parameter.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;created_at &amp;gt;= '2025-01-01'&lt;/code&gt; meets that contract. The planner translates it to &amp;ldquo;walk the index to the first entry ≥ 2025-01-01, read forward from there.&amp;rdquo; That&amp;rsquo;s a range scan.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;YEAR(created_at) = 2025&lt;/code&gt; doesn&amp;rsquo;t meet the contract. The value being compared isn&amp;rsquo;t &lt;code&gt;created_at&lt;/code&gt;; it&amp;rsquo;s the output of &lt;code&gt;YEAR()&lt;/code&gt; applied to &lt;code&gt;created_at&lt;/code&gt;. The index on &lt;code&gt;created_at&lt;/code&gt; doesn&amp;rsquo;t know the output of &lt;code&gt;YEAR()&lt;/code&gt; for any row without computing it. So the planner falls back to evaluating the function on every row (a sequential scan) and only then filtering.&lt;/p&gt;
&lt;p&gt;Common forms of the same mistake:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Non-SARGable: function on column → full scan
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;alice@example.com&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;2025-01-15&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CONCAT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39; &amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Alice Smith&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- SARGable equivalents
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;alice@example.com&amp;#39;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;-- if collation is case-insensitive
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;2025-01-15&amp;#39;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AND&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;2025-01-16&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;-- fix the type at the schema level
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Alice&amp;#39;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AND&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Smith&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Three of the four non-SARGable forms have clean rewrites. The first one (&lt;code&gt;LOWER(email)&lt;/code&gt;) depends on collation, which is where a lot of real-world cases live.&lt;/p&gt;
&lt;h2 id="the-collation-case"&gt;The collation case
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;WHERE LOWER(email) = 'alice@example.com'&lt;/code&gt; is almost always a tell that the &lt;code&gt;email&lt;/code&gt; column has a case-sensitive collation and the application is hiding it at query time. Two real fixes, one cosmetic fix:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix the column.&lt;/strong&gt; If the data should be matched case-insensitively, give the column a case-insensitive collation. In PostgreSQL that&amp;rsquo;s &lt;code&gt;CITEXT&lt;/code&gt; or a &lt;code&gt;COLLATE &amp;quot;und-x-icu&amp;quot;&lt;/code&gt; with the ICU provider; in MySQL it&amp;rsquo;s a &lt;code&gt;_ci&lt;/code&gt; collation (which is usually the default anyway). Once the column&amp;rsquo;s collation handles the case folding, &lt;code&gt;WHERE email = 'alice@example.com'&lt;/code&gt; is SARGable and fast. This is the right fix when case-insensitivity is a property of the data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Add a functional (expression) index.&lt;/strong&gt; If you can&amp;rsquo;t change the column&amp;rsquo;s collation (there&amp;rsquo;s a case-sensitive comparison elsewhere in the schema that depends on the current behavior) index the expression itself:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- PostgreSQL: functional index
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;INDEX&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;idx_users_email_lower&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Now WHERE LOWER(email) = &amp;#39;...&amp;#39; uses the index
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- MySQL 8.0+: expression index (requires the same constant-folding fix)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ADD&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;INDEX&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;idx_email_lower&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)));&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This works, with caveats. The index&amp;rsquo;s storage and write cost is real. The predicate has to match the indexed expression exactly: &lt;code&gt;LOWER(email)&lt;/code&gt; is indexed, but &lt;code&gt;UPPER(email)&lt;/code&gt; isn&amp;rsquo;t, and the planner won&amp;rsquo;t translate between them. Every non-SARGable expression you want fast needs its own index.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cosmetic fix: case-fold at write time.&lt;/strong&gt; Store the email as already-lowercased. &lt;code&gt;WHERE email = 'alice@example.com'&lt;/code&gt; is now SARGable directly, no expression index needed. This usually requires application changes (whoever&amp;rsquo;s writing has to remember to case-fold) which is why the functional index is more popular even though it&amp;rsquo;s heavier. &lt;a class="link" href="https://explainanalyze.com/p/where-business-logic-lives-database-vs.-application/" &gt;Where business logic lives&lt;/a&gt; covers the general shape of this decision; case-folding at the database with a generated column (&lt;code&gt;GENERATED ALWAYS AS (LOWER(email)) STORED&lt;/code&gt;) is often the cleanest answer when the application can&amp;rsquo;t be trusted to normalize consistently.&lt;/p&gt;
&lt;h2 id="implicit-type-conversions-are-the-subtler-version"&gt;Implicit type conversions are the subtler version
&lt;/h2&gt;&lt;p&gt;The function isn&amp;rsquo;t always in the query. Sometimes the planner is adding one:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- account_id is VARCHAR, literal is numeric
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;account_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12345&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;MySQL will silently cast every &lt;code&gt;account_id&lt;/code&gt; value to a number for comparison: a per-row function call that kills index usage just as effectively as an explicit &lt;code&gt;CAST()&lt;/code&gt;. PostgreSQL is stricter and usually errors, but can still do implicit conversions between compatible types that undermine indexes.&lt;/p&gt;
&lt;p&gt;The fix is matching types in both directions: the column type should be what the column is (a numeric ID should be &lt;code&gt;BIGINT&lt;/code&gt;, not &lt;code&gt;VARCHAR&lt;/code&gt;), and the query should write the literal in the column&amp;rsquo;s type (&lt;code&gt;WHERE account_id = '12345'&lt;/code&gt; if the column is genuinely a string). Either fix works; matching the column type to the data&amp;rsquo;s real shape is usually the durable answer.&lt;/p&gt;
&lt;p&gt;This is also where &lt;a class="link" href="https://explainanalyze.com/p/the-bare-id-primary-key-when-every-table-joins-to-every-other-table/" &gt;mixed PK strategies&lt;/a&gt; show up. Joining a BIGINT &lt;code&gt;id&lt;/code&gt; to a UUID &lt;code&gt;id&lt;/code&gt; doesn&amp;rsquo;t just return wrong results; on MySQL it coerces one side to a string, which is the same implicit-function problem dressed up as a join.&lt;/p&gt;
&lt;h2 id="when-non-sargable-is-acceptable"&gt;When non-SARGable is acceptable
&lt;/h2&gt;&lt;p&gt;Not every non-SARGable predicate is a bug. Three cases where it&amp;rsquo;s fine:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Small tables.&lt;/strong&gt; A 5,000-row lookup table with a function-wrapped predicate scans in microseconds. The planner isn&amp;rsquo;t going to use an index on that size anyway. &lt;code&gt;WHERE UPPER(code) = 'NY'&lt;/code&gt; on a 50-row &lt;code&gt;states&lt;/code&gt; table is not worth worrying about.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;One-off analytical queries.&lt;/strong&gt; A one-time data extract that scans a large table is going to scan it regardless. If the query will never run again, the function call isn&amp;rsquo;t the bottleneck (the table size is) and adding a functional index to optimize one query isn&amp;rsquo;t worth the write cost on every future insert.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When the function genuinely can&amp;rsquo;t be avoided.&lt;/strong&gt; Some predicates legitimately need to compute. &lt;code&gt;WHERE haversine_distance(lat, lng, user_lat, user_lng) &amp;lt; 10&lt;/code&gt; on a geospatial query can&amp;rsquo;t be rewritten as a simple range; you need a spatial index (PostGIS, MySQL spatial extensions) to make it SARGable in the geometric sense. The fix is a different kind of index, not a rewrite.&lt;/p&gt;
&lt;h2 id="why-natural-language-to-sql-tilts-non-sargable"&gt;Why natural-language-to-SQL tilts non-SARGable
&lt;/h2&gt;&lt;p&gt;Schema-reading assistants and text-to-SQL models produce this class of bug more often than hand-written queries do. A user asks &amp;ldquo;events in 2025&amp;rdquo;; the closest English-to-SQL mapping is &lt;code&gt;WHERE YEAR(created_at) = 2025&lt;/code&gt;, and that&amp;rsquo;s what the model writes. The correct form (a half-open range) requires knowing the calendar boundary of the year and producing two comparison operators, which is a less-direct translation of the question. &lt;code&gt;WHERE LOWER(email) = 'alice@example.com'&lt;/code&gt; is the natural translation of &amp;ldquo;find the user with this email, case-insensitive,&amp;rdquo; even when the column&amp;rsquo;s collation already handles case and the function wrap defeats the index it would otherwise use.&lt;/p&gt;
&lt;p&gt;The catalog-level fix is the same one the bigger-picture section below points at: model the column so the natural query is already SARGable. Pick a case-insensitive collation on &lt;code&gt;email&lt;/code&gt;, store prices as &lt;code&gt;NUMERIC&lt;/code&gt; so no cast is needed, partition or index date columns so the range-literal form performs. When the schema matches the shape of the question, the model&amp;rsquo;s default translation works. When it doesn&amp;rsquo;t, the model produces a query that runs clean and scans the table, and no plan inspection is built into the generation loop to catch it.&lt;/p&gt;
&lt;h2 id="the-bigger-picture"&gt;The bigger picture
&lt;/h2&gt;&lt;p&gt;Non-SARGable predicates are easy to write, and they come from somewhere: almost always a schema decision that&amp;rsquo;s being papered over at query time. &lt;code&gt;LOWER(email)&lt;/code&gt; hides a collation mismatch. &lt;code&gt;CAST(price AS INT)&lt;/code&gt; hides a type that should have been &lt;code&gt;NUMERIC&lt;/code&gt; from the start. &lt;code&gt;DATE(created_at)&lt;/code&gt; hides the fact that the query is answering a date-range question but written in a way that reads more naturally as an equality. Every one of these is a query-level workaround for a schema-level issue, and every one of them costs an index when the table grows large enough to care.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; is the diagnostic. If the plan shows a sequential scan on a predicate that should hit an index, the predicate is almost certainly non-SARGable; look at what&amp;rsquo;s wrapping the column. Fix the schema if you can, add a functional index if you can&amp;rsquo;t, and treat non-SARGable predicates on hot paths as latent performance bugs, not style issues.&lt;/p&gt;</description></item><item><title>Database Deadlocks, Part 2: Diagnosis, Retries, and Prevention</title><link>https://explainanalyze.com/p/database-deadlocks-part-2-diagnosis-retries-and-prevention/</link><pubDate>Sun, 02 Mar 2025 00:00:00 +0000</pubDate><guid>https://explainanalyze.com/p/database-deadlocks-part-2-diagnosis-retries-and-prevention/</guid><description>&lt;img src="https://explainanalyze.com/" alt="Featured image of post Database Deadlocks, Part 2: Diagnosis, Retries, and Prevention" /&gt;&lt;div class="tldr-box"&gt;
 &lt;strong&gt;TL;DR&lt;/strong&gt;
 &lt;div&gt;&lt;a class="link" href="https://explainanalyze.com/p/database-deadlocks-part-1-the-patterns/" target="_blank" rel="noopener"
 &gt;Part 1&lt;/a&gt; covered the patterns. This post is the operational half: reading the deadlock log to identify which pattern fired, designing retries that fail loudly instead of hiding the real bug, isolating hot rows before they become incidents, and the prevention primitives (&lt;code&gt;NOWAIT&lt;/code&gt;, &lt;code&gt;SKIP LOCKED&lt;/code&gt;, isolation-level changes, &lt;code&gt;lock_timeout&lt;/code&gt;) that remove entire categories from the workload.&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;The patterns in &lt;a class="link" href="https://explainanalyze.com/p/database-deadlocks-part-1-the-patterns/" &gt;Part 1&lt;/a&gt; answer &amp;ldquo;why did this deadlock happen?&amp;rdquo;. This post answers the next three questions every team ends up asking: &amp;ldquo;which pattern fired?&amp;rdquo;, &amp;ldquo;what should the application do when it does?&amp;rdquo;, and &amp;ldquo;how do we stop the ones that hurt most?&amp;rdquo;. Each has its own tooling, failure modes, and production idioms, and almost all of them are learned after the first incident, not before.&lt;/p&gt;
&lt;p&gt;The trap worth naming up front: tuning retry logic is easy, reading a deadlock log is harder, and the second is what tells you whether the first matters. A team that only tunes retries is optimizing the symptom without ever seeing the cause. This post is structured in the opposite order: log-reading first, then retries, then the prevention patterns that remove entire categories from the workload.&lt;/p&gt;
&lt;h2 id="reading-the-mysql-deadlock-log"&gt;Reading the MySQL deadlock log
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;SHOW ENGINE INNODB STATUS&lt;/code&gt; dumps the most recent deadlock in the &lt;code&gt;LATEST DETECTED DEADLOCK&lt;/code&gt; section. The catch: only the most recent. On a busy system, deadlocks overwrite each other faster than someone can log in and copy the output. Before anything else, turn on &lt;a class="link" href="https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_print_all_deadlocks" target="_blank" rel="noopener"
 &gt;&lt;code&gt;innodb_print_all_deadlocks&lt;/code&gt;&lt;/a&gt;&lt;code&gt; = ON&lt;/code&gt; in every production deployment. It writes every deadlock to the error log instead of a single overwriting slot. The volume is negligible, the diagnostic value is high, and there is no downside.&lt;/p&gt;
&lt;p&gt;A representative entry looks like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;span class="lnt"&gt;12
&lt;/span&gt;&lt;span class="lnt"&gt;13
&lt;/span&gt;&lt;span class="lnt"&gt;14
&lt;/span&gt;&lt;span class="lnt"&gt;15
&lt;/span&gt;&lt;span class="lnt"&gt;16
&lt;/span&gt;&lt;span class="lnt"&gt;17
&lt;/span&gt;&lt;span class="lnt"&gt;18
&lt;/span&gt;&lt;span class="lnt"&gt;19
&lt;/span&gt;&lt;span class="lnt"&gt;20
&lt;/span&gt;&lt;span class="lnt"&gt;21
&lt;/span&gt;&lt;span class="lnt"&gt;22
&lt;/span&gt;&lt;span class="lnt"&gt;23
&lt;/span&gt;&lt;span class="lnt"&gt;24
&lt;/span&gt;&lt;span class="lnt"&gt;25
&lt;/span&gt;&lt;span class="lnt"&gt;26
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-mysql" data-lang="mysql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="o"&gt;***&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TRANSACTION&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;TRANSACTION&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4823941&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ACTIVE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sec&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;starting&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;index&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;read&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;mysql&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kp"&gt;tables&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;use&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;locked&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;LOCK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;WAIT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;lock&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;heap&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1136&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;lock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;MySQL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;892&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;OS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;x7f&lt;/span&gt;&lt;span class="p"&gt;...,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18293&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;SET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;shipped&amp;#39;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1001&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="o"&gt;***&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;WAITING&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FOR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;THIS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;LOCK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;TO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;BE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GRANTED&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;RECORD&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LOCKS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;48&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;no&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;112&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;144&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;index&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;PRIMARY&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;table&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;`&lt;/span&gt;&lt;span class="n"&gt;shop&lt;/span&gt;&lt;span class="o"&gt;`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;`&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="o"&gt;`&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;trx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4823941&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;lock_mode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;locks&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rec&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;but&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;not&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gap&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;waiting&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="o"&gt;***&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TRANSACTION&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;TRANSACTION&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4823942&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ACTIVE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sec&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;starting&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;index&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;read&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;SET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;paid&amp;#39;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1002&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="o"&gt;***&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;HOLDS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;THE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;LOCK&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;RECORD&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LOCKS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;48&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;no&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;112&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;144&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;index&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;PRIMARY&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;table&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;`&lt;/span&gt;&lt;span class="n"&gt;shop&lt;/span&gt;&lt;span class="o"&gt;`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;`&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="o"&gt;`&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;trx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4823942&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;lock_mode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;locks&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rec&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;but&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;not&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gap&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="o"&gt;***&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;WAITING&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FOR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;THIS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;LOCK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;TO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;BE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GRANTED&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;RECORD&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LOCKS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;48&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;no&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;112&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;144&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;index&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;PRIMARY&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;table&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;`&lt;/span&gt;&lt;span class="n"&gt;shop&lt;/span&gt;&lt;span class="o"&gt;`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;`&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="o"&gt;`&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;trx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4823942&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;lock_mode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;locks&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rec&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;but&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;not&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gap&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;waiting&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="o"&gt;***&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;WE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ROLL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;BACK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;TRANSACTION&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The parts that matter:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;lock_mode&lt;/code&gt; vs. &lt;code&gt;lock_type&lt;/code&gt;.&lt;/strong&gt; &lt;code&gt;X&lt;/code&gt; is exclusive, &lt;code&gt;S&lt;/code&gt; is shared. &lt;code&gt;locks rec but not gap&lt;/code&gt; is a pure record lock; &lt;code&gt;locks gap before rec&lt;/code&gt; is a gap lock; the unadorned &lt;code&gt;X&lt;/code&gt; under &lt;code&gt;REPEATABLE READ&lt;/code&gt; is usually next-key (record + gap). Matching &lt;code&gt;lock_mode S locks rec but not gap&lt;/code&gt; against &lt;a class="link" href="https://explainanalyze.com/p/database-deadlocks-part-1-the-patterns/#unique-index-deadlocks-are-a-category-of-their-own" &gt;Part 1&amp;rsquo;s unique-index section&lt;/a&gt; tells you immediately that this is a duplicate-key-on-insert deadlock.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;index&lt;/code&gt; name.&lt;/strong&gt; &lt;code&gt;index PRIMARY&lt;/code&gt; vs. &lt;code&gt;index idx_customer&lt;/code&gt; reveals whether the cycle formed on the clustered index or a secondary one. Two transactions approaching the same rows from different indexes is the &lt;a class="link" href="https://explainanalyze.com/p/database-deadlocks-part-1-the-patterns/#index-scans-lock-more-rows-than-queries-return" &gt;&amp;ldquo;secondary index locks on InnoDB&amp;rdquo; pattern from Part 1&lt;/a&gt;; the fix is usually consolidating access paths.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The query text.&lt;/strong&gt; This is the last statement the transaction executed before the deadlock, not necessarily the one that caused it. A transaction holding locks from three earlier statements can deadlock on the fourth, and the log only shows the fourth. Cross-reference with the application&amp;rsquo;s structured logs to reconstruct the full transaction.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;trx id&lt;/code&gt;&lt;/strong&gt; is monotonically increasing and stable for the life of the transaction. Searching the general log or slow-query log for that &lt;code&gt;trx id&lt;/code&gt; reconstructs the full statement sequence, but only if general-query logging is on for the window in question, which it usually isn&amp;rsquo;t.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;performance_schema.data_locks&lt;/code&gt; and &lt;code&gt;data_lock_waits&lt;/code&gt; give a real-time view of current locks and waits. Useful for catching a deadlock-adjacent pathology (long wait chains, hot rows) before the cycle forms:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;span class="lnt"&gt;8
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lock_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lock_mode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;object_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;REQUESTING_ENGINE_TRANSACTION_ID&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;waiting_trx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BLOCKING_ENGINE_TRANSACTION_ID&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;blocking_trx&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;performance_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data_lock_waits&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;performance_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data_locks&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bl&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BLOCKING_ENGINE_LOCK_ID&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ENGINE_LOCK_ID&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OBJECT_SCHEMA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;shop&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id="reading-the-postgresql-deadlock-log"&gt;Reading the PostgreSQL deadlock log
&lt;/h2&gt;&lt;p&gt;PostgreSQL&amp;rsquo;s diagnostic story is narrower by design. Deadlocks are logged automatically when the cycle is detected. &lt;code&gt;log_lock_waits = on&lt;/code&gt; logs any wait exceeding &lt;code&gt;deadlock_timeout&lt;/code&gt; (default 1s), which catches the wait-chain escalation before the detector fires. There&amp;rsquo;s no equivalent to &lt;code&gt;SHOW ENGINE INNODB STATUS&lt;/code&gt;; everything lives in &lt;code&gt;postgresql.log&lt;/code&gt; or the extensions you&amp;rsquo;ve installed.&lt;/p&gt;
&lt;p&gt;A representative deadlock entry:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;span class="lnt"&gt;8
&lt;/span&gt;&lt;span class="lnt"&gt;9
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-zed" data-lang="zed"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;ERROR&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;deadlock&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;detected&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;DETAIL&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Process&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;14234&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;waits&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ShareLock&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;transaction&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;89234&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;blocked&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;14235&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Process&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;14235&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;waits&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ShareLock&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;transaction&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;89233&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;blocked&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;14234&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Process&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;14234&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;UPDATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;accounts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;2&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Process&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;14235&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;UPDATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;accounts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;HINT&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;See&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;details&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;CONTEXT&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;updating&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tuple&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="err"&gt;18&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;relation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;#34;&lt;/span&gt;&lt;span class="n"&gt;accounts&lt;/span&gt;&lt;span class="err"&gt;&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The &lt;code&gt;ShareLock on transaction X&lt;/code&gt; wording is PostgreSQL-specific. One transaction is waiting to see the commit status of another (a row-lock wait manifests as waiting on the holder&amp;rsquo;s transaction ID). The tuple identifier &lt;code&gt;(0,18)&lt;/code&gt; points to the exact physical row (page 0, tuple 18 in the heap), which is useful for reproducing the scenario but changes as rows are updated (MVCC creates new versions at new &lt;code&gt;(page, tuple)&lt;/code&gt; locations).&lt;/p&gt;
&lt;p&gt;For real-time inspection, &lt;code&gt;pg_locks&lt;/code&gt; joined against &lt;code&gt;pg_stat_activity&lt;/code&gt; shows live lock state:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wait_event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wait_event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;locktype&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relation&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;regclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pg_blocking_pids&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;blocked_by&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;LEFT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pg_stat_activity&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;LEFT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pg_locks&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AND&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;granted&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;state&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;idle&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xact_start&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;code&gt;pg_blocking_pids(pid)&lt;/code&gt; returns the array of PIDs blocking a given transaction. Walking it recursively reconstructs the live wait-for graph: the same data the deadlock detector uses, just before a cycle forms. For hot systems, &lt;code&gt;pg_stat_statements&lt;/code&gt; combined with &lt;code&gt;pg_stat_activity&lt;/code&gt; snapshots at regular intervals builds a picture of which statements accumulate the most wait time, which is almost always the right first place to look.&lt;/p&gt;
&lt;div class="note-box"&gt;
 &lt;strong&gt;Row locks are invisible in pg_locks&lt;/strong&gt;
 &lt;div&gt;&lt;p&gt;PostgreSQL&amp;rsquo;s row-level locks (the result of &lt;code&gt;SELECT ... FOR UPDATE&lt;/code&gt;, FK checks, and plain &lt;code&gt;UPDATE&lt;/code&gt;/&lt;code&gt;DELETE&lt;/code&gt;) are stored on the tuple itself, in the &lt;code&gt;xmax&lt;/code&gt; system column, not in the shared lock table. They don&amp;rsquo;t show up in &lt;code&gt;pg_locks&lt;/code&gt;. The only way to see them is through the &lt;code&gt;pgrowlocks&lt;/code&gt; extension, which scans the heap directly:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;EXTENSION&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pgrowlocks&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pgrowlocks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;accounts&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This is the single biggest difference between PG and InnoDB lock introspection, and the reason PG operators often feel blind to row-level contention until a cycle actually forms.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;h2 id="postgresql-serializable-serialization-failures-are-not-deadlocks"&gt;PostgreSQL SERIALIZABLE: serialization failures are not deadlocks
&lt;/h2&gt;&lt;p&gt;Under &lt;code&gt;SERIALIZABLE&lt;/code&gt; isolation, PostgreSQL uses Serializable Snapshot Isolation (SSI), an optimistic mechanism based on SIREAD predicate locks that track read-write dependencies between transactions. SSI cannot deadlock by design; it never blocks on lock acquisition. What it does is abort one transaction with a serialization failure when it detects a dangerous read-write dependency cycle that would violate serializability.&lt;/p&gt;
&lt;p&gt;The two failure codes look similar but have fundamentally different semantics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;40001 serialization_failure&lt;/code&gt;.&lt;/strong&gt; SSI detected a dependency cycle and aborted the transaction before it could commit a non-serializable result. The transaction did nothing wrong; the combination of its operations with a concurrent transaction would have produced an inconsistency. Retrying is always safe and usually succeeds (the concurrent transaction will have committed or aborted, so the second attempt doesn&amp;rsquo;t see the same conflict).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;40P01 deadlock_detected&lt;/code&gt;.&lt;/strong&gt; A cycle in the wait-for graph was broken by killing a victim. Retrying may or may not succeed depending on what caused the cycle. If the cycle was deterministic (two code paths with inconsistent ordering), it will keep recurring.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The practical consequence for retry architecture: an application running under &lt;code&gt;SERIALIZABLE&lt;/code&gt; must handle 40001. It&amp;rsquo;s not a deadlock, it&amp;rsquo;s the normal failure mode of SSI, and retries are the only recovery path. An application running under &lt;code&gt;READ COMMITTED&lt;/code&gt; never sees 40001. An application that handles 40001 identically to 40P01 is correct but coarse. The right granularity is: always retry 40001 (the workload&amp;rsquo;s own correctness guarantee assumes this); retry 40P01 with caution and escalate on repeat.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;span class="lnt"&gt;12
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retry_on_conflict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_attempts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_attempts&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;psycopg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SerializationFailure&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# 40001: always retry. SSI guarantees make this safe.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;backoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;psycopg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DeadlockDetected&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# 40P01: retry with caution; log for root-cause analysis.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;log_deadlock_for_analysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;backoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="n"&gt;TransactionRetryExhausted&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id="retry-architecture-that-doesnt-hide-the-bug"&gt;Retry architecture that doesn&amp;rsquo;t hide the bug
&lt;/h2&gt;&lt;p&gt;Every database driver documentation says &amp;ldquo;deadlocks happen, retry the transaction.&amp;rdquo; That&amp;rsquo;s true. It&amp;rsquo;s also incomplete. The dangers are subtle enough that a naive retry loop becomes part of the problem:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Retries without backoff make the cycle worse.&lt;/strong&gt; The condition that caused the deadlock (contention on a hot key set) is still in effect when the retry runs. A tight retry loop turns one deadlock into a thundering herd: all victims retry simultaneously, all hit the same contention, all deadlock again. Use exponential backoff with full jitter, capped at a few hundred milliseconds.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Retries mask lock-ordering bugs.&lt;/strong&gt; If an application deadlocks 10x/minute but retries successfully, the operator sees no failures, but the underlying transactions are doing up to 20x the work (original + retry). The deadlock rate itself is a metric worth tracking, not just the post-retry error rate. When the rate grows, the fix is diagnosing the pattern, not tuning the retry limit.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Retries aren&amp;rsquo;t always safe.&lt;/strong&gt; A transaction that sent an external notification, wrote to a message queue, or called a non-idempotent HTTP endpoint before the deadlock can&amp;rsquo;t be blindly retried; the external side effect already happened. Retries belong on database-only transactions, or on transactions where the external calls are idempotent and tolerant of duplicate execution. The boundary is architectural, not a library setting.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Retries need a budget.&lt;/strong&gt; If a transaction can&amp;rsquo;t complete after ~3 attempts, the problem is no longer transient contention. It&amp;rsquo;s either a systemic hot spot or a bug that retries will never resolve. Escalate (alert, fail the request, enqueue for manual review), don&amp;rsquo;t loop forever.&lt;/p&gt;
&lt;p&gt;The retry pattern that works in production:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;span class="lnt"&gt;8
&lt;/span&gt;&lt;span class="lnt"&gt;9
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transaction&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;do_work&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DeadlockError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SerializationFailure&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;db.retry&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;error&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="n"&gt;RetryBudgetExhausted&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Three attempts, exponential backoff up to ~400ms, metrics emitted on every retry, hard failure past the budget. The metric matters as much as the retry - without it, the team never learns which transactions are retrying frequently and why.&lt;/p&gt;
&lt;div class="warning-box"&gt;
 &lt;strong&gt;Idempotency is a transaction-shape property&lt;/strong&gt;
 &lt;div&gt;A transaction is safe to retry iff re-running it produces the same observable state. That includes downstream side effects. A transaction that writes to a table AND sends a webhook is not safe to retry even if both operations are internally correct: the second attempt sends a duplicate webhook. The fix is the outbox pattern: write the webhook-send intent to a table in the same transaction, then have a separate worker process the outbox with its own idempotency guarantees. This is a non-negotiable part of building deadlock-retry-safe systems at scale.&lt;/div&gt;
&lt;/div&gt;

&lt;h2 id="what-schema-reading-assistants-write-and-what-they-skip"&gt;What schema-reading assistants write, and what they skip
&lt;/h2&gt;&lt;p&gt;AI-generated database code rarely ships with the retry scaffold above, and the reason is structural: nothing in the query text advertises that it can deadlock. An assistant asked to &amp;ldquo;bulk-upsert inventory&amp;rdquo; produces the &lt;code&gt;INSERT ... ON DUPLICATE KEY UPDATE&lt;/code&gt; and stops. The query looks atomic from the outside, and the catalog doesn&amp;rsquo;t expose the shared next-key locks, gap-lock retention on FK checks, or unique-index lock escalation that turn it into a deadlock candidate under load. Retry logic, backoff, and idempotency boundaries are all decisions that live outside the query, in places the schema-reading prompt typically doesn&amp;rsquo;t reach.&lt;/p&gt;
&lt;p&gt;The architectural choices are worse. Outbox-pattern correctness (write to an outbox table in the same transaction, process it separately) depends on understanding which external calls must happen exactly once, which is domain knowledge no catalog carries. A model asked to &amp;ldquo;send a notification after the update&amp;rdquo; reliably produces the direct call: clean, passing tests, and unsafe to retry. Treat AI-generated DB code as the ORM layer and nothing more. The queries are often fine in isolation; the retry loop, the idempotency boundary, the outbox separation, and the lock-ordering discipline are the pieces a human review still has to add. The article above is the review checklist.&lt;/p&gt;
&lt;h2 id="consistent-lock-ordering-the-highest-leverage-fix"&gt;Consistent lock ordering: the highest-leverage fix
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://explainanalyze.com/p/database-deadlocks-part-1-the-patterns/#the-canonical-lock-ordering-deadlock" &gt;Part 1&amp;rsquo;s lock-ordering deadlock&lt;/a&gt; (two transactions updating the same set of rows in opposite orders) is the single most common production pattern and the one with the highest-leverage fix. If every code path that writes to a set of tables acquires locks in the same order, the wait-for graph literally cannot form a cycle on those rows. The engine still takes the locks, still holds them for the duration of the transaction, but the second transaction waits cleanly for the first instead of grabbing a lock the first will need.&lt;/p&gt;
&lt;p&gt;The rule is: sort the rows by a stable key (usually the primary key) in application code, before any SQL is issued. Lock acquisition order in both engines is determined by the order the engine processes rows, which for most write patterns is the order the application submitted them. Sort once up front, and N workers all doing the same thing can&amp;rsquo;t cycle because they all approach the row set from the same end.&lt;/p&gt;
&lt;p&gt;The three batch shapes that matter in practice, and where the ordering actually happens:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Loop of per-row UPDATEs.&lt;/strong&gt; The classic batch worker:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Sort in the application; the iteration order IS the lock acquisition order.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;UPDATE accounts SET balance = balance + &lt;/span&gt;&lt;span class="si"&gt;%s&lt;/span&gt;&lt;span class="s2"&gt; WHERE id = &lt;/span&gt;&lt;span class="si"&gt;%s&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Each UPDATE locks its target row at execution time; the loop order determines acquisition order. No SQL-level &lt;code&gt;ORDER BY&lt;/code&gt; involved; the fix lives in the &lt;code&gt;.sort()&lt;/code&gt; call.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Bulk UPSERT.&lt;/strong&gt; &lt;code&gt;INSERT ... ON DUPLICATE KEY UPDATE&lt;/code&gt; (MySQL) or &lt;code&gt;INSERT ... ON CONFLICT&lt;/code&gt; (PostgreSQL):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;span class="lnt"&gt;8
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Sort rows by the unique key BEFORE building the batch.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;execute_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;INSERT INTO accounts (id, balance) VALUES &lt;/span&gt;&lt;span class="si"&gt;%s&lt;/span&gt;&lt;span class="s2"&gt; &amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;ON CONFLICT (id) DO UPDATE SET balance = accounts.balance + EXCLUDED.balance&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The engine processes the VALUES list in order and acquires locks as it goes. Two concurrent batches sorted by the same key approach the key space from the same end; without the sort, one batch might process &lt;code&gt;(1, 2, 3)&lt;/code&gt; while another processes &lt;code&gt;(3, 2, 1)&lt;/code&gt; and they deadlock on the middle row. This is the exact shape of the bulk-UPSERT deadlocks called out in &lt;a class="link" href="https://explainanalyze.com/p/database-deadlocks-part-1-the-patterns/#unique-index-deadlocks-are-a-category-of-their-own" &gt;Part 1&amp;rsquo;s unique-index section&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Small multi-row transactions.&lt;/strong&gt; The canonical bank transfer:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Always update the lower id first, regardless of transfer direction.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;accounts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;accounts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;For 2–3 rows with per-row different values, the application computes &lt;code&gt;sorted((X, Y))&lt;/code&gt; and issues UPDATEs in that order. Same principle as the batch case, just smaller.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Where &lt;code&gt;SELECT ... FOR UPDATE ORDER BY&lt;/code&gt; actually earns its keep.&lt;/strong&gt; Most batches don&amp;rsquo;t need it; they control lock order through the submission order. The one shape where it&amp;rsquo;s the right answer is a single UPDATE statement over a derived table where the engine decides scan order and you can&amp;rsquo;t control it from outside:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;accounts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;VALUES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Here, sorting the VALUES list in application code doesn&amp;rsquo;t reliably control lock order; the planner picks the scan. A &lt;code&gt;SELECT id FROM accounts WHERE id IN (...) ORDER BY id FOR UPDATE&lt;/code&gt; up front pre-acquires locks in deterministic order before the UPDATE runs. Or refactor into shape 1 or 2.&lt;/p&gt;
&lt;div class="note-box"&gt;
 &lt;strong&gt;ORDER BY controls result order, not scan order&lt;/strong&gt;
 &lt;div&gt;&lt;code&gt;ORDER BY&lt;/code&gt; on a &lt;code&gt;SELECT ... FOR UPDATE&lt;/code&gt; controls the result ordering, but lock acquisition happens during the scan. With a primary-key or unique-index predicate (&lt;code&gt;WHERE id IN (...)&lt;/code&gt; on a PK), the planner does ordered index lookups and locks land in ORDER BY order in practice. For non-indexed predicates or range scans on non-unique columns, the planner may scan in a different order and sort results afterward; locks get acquired in scan order. Verify with &lt;code&gt;EXPLAIN&lt;/code&gt; before relying on this pattern against non-PK predicates. Also: MySQL&amp;rsquo;s &lt;code&gt;UPDATE ... ORDER BY&lt;/code&gt; syntax applies one SET clause to all matching rows; it doesn&amp;rsquo;t help when rows need different values.&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;This sounds trivial. It almost never is in practice; the ordering has to hold across every code path that writes to the same tables: the main request handler, backfill scripts, admin scripts, scheduled jobs, ORM bulk-save methods, and whatever migration scripts run during releases. One path that writes in a different order is enough to reopen the cycle. The durable fix is encoding the order in the access layer so individual query sites can&amp;rsquo;t diverge from it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A repository function that always sorts by PK before writing.&lt;/li&gt;
&lt;li&gt;A stored procedure or database function that owns the multi-row write.&lt;/li&gt;
&lt;li&gt;A service method with the ordering baked in, and a lint rule or review check forbidding direct table writes from elsewhere.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Two places this invariant breaks without anyone noticing:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ORM bulk-save methods.&lt;/strong&gt; ORMs hide whether they process in input order or reorder internally. Django&amp;rsquo;s &lt;code&gt;Model.objects.bulk_update&lt;/code&gt;, SQLAlchemy&amp;rsquo;s &lt;code&gt;bulk_update_mappings&lt;/code&gt;, ActiveRecord&amp;rsquo;s &lt;code&gt;upsert_all&lt;/code&gt;, Hibernate&amp;rsquo;s batch inserts; some process in input order, some sort by PK internally, some chunk before doing either. If you can&amp;rsquo;t tell from docs, test it: two concurrent bulk-saves with opposing-ordered input lists will either deadlock (proving input order matters) or not (proving the ORM sorts internally). Either way, sorting the input collection before handing it to the ORM is cheap insurance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Batches sourced from a &lt;code&gt;SELECT&lt;/code&gt; in the same transaction.&lt;/strong&gt; A common pattern: &amp;ldquo;grab N pending rows, process them.&amp;rdquo; If the feeder SELECT doesn&amp;rsquo;t have an &lt;code&gt;ORDER BY&lt;/code&gt;, rows come back in scan order, non-deterministic across workers, which reopens the cycle. The fix is an explicit &lt;code&gt;ORDER BY&lt;/code&gt; on the feeder query, not in the subsequent loop:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Bad: scan order feeds the loop.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;SELECT id, delta FROM pending WHERE status = &amp;#39;ready&amp;#39; LIMIT 100&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# Whatever the scan produced; varies across workers.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="o"&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Good: deterministic order, same across every worker.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;SELECT id, delta FROM pending WHERE status = &amp;#39;ready&amp;#39; ORDER BY id LIMIT 100&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="o"&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The lint-rule angle matters more than it sounds. Deadlock-ordering violations are almost impossible to catch in code review - two PRs that each look correct in isolation can introduce inconsistent ordering when combined. The check that actually works is structural: no direct writes to tables X, Y from anywhere except the repository. Once that invariant is enforced, the ordering invariant follows.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Multi-table transactions need the same rule applied to table order.&lt;/strong&gt; A transaction that updates &lt;code&gt;users&lt;/code&gt; then &lt;code&gt;orders&lt;/code&gt; in one code path, and &lt;code&gt;orders&lt;/code&gt; then &lt;code&gt;users&lt;/code&gt; in another, can deadlock through the FK chain even with per-table row ordering. The rule generalizes: sort rows within a table, and sort tables within a transaction, by a stable convention the whole codebase agrees on (alphabetical is fine; just pick one).&lt;/p&gt;
&lt;h2 id="isolation-level-trade-offs"&gt;Isolation-level trade-offs
&lt;/h2&gt;&lt;p&gt;The isolation level a workload runs under determines which deadlock categories even apply. Most MySQL deadlock incidents stem from &lt;code&gt;REPEATABLE READ&lt;/code&gt;&amp;rsquo;s gap locks, a category that doesn&amp;rsquo;t exist on PostgreSQL or on MySQL at &lt;code&gt;READ COMMITTED&lt;/code&gt;. Changing the isolation level is the single highest-leverage tuning lever, and also the one with the most potential to quietly break application correctness.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dropping MySQL from &lt;code&gt;REPEATABLE READ&lt;/code&gt; to &lt;code&gt;READ COMMITTED&lt;/code&gt;.&lt;/strong&gt; Under &lt;code&gt;READ COMMITTED&lt;/code&gt;, InnoDB still takes row locks but skips most gap locks (they exist only for unique-key and FK enforcement on inserts). Most OLTP workloads don&amp;rsquo;t need &lt;code&gt;REPEATABLE READ&lt;/code&gt;&amp;rsquo;s range-consistency guarantee. Most application code was designed around &lt;code&gt;READ COMMITTED&lt;/code&gt; semantics anyway, because that&amp;rsquo;s what PostgreSQL and SQL Server default to. Teams migrating to &lt;code&gt;READ COMMITTED&lt;/code&gt; typically see deadlock rates drop by an order of magnitude with no functional change.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Avoiding range locks on write paths.&lt;/strong&gt; Independent of isolation level, replacing &lt;code&gt;SELECT ... FOR UPDATE&lt;/code&gt; scans over ranges with point lookups by primary key removes the gap-lock surface entirely on the statements that do it. If a write path doesn&amp;rsquo;t need to lock &amp;ldquo;all orders for customer 5,&amp;rdquo; locking just the specific row it&amp;rsquo;s about to update is both faster and safer.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;FK shared locks are shorter-lived under &lt;code&gt;READ COMMITTED&lt;/code&gt;.&lt;/strong&gt; The &lt;a class="link" href="https://explainanalyze.com/p/database-deadlocks-part-1-the-patterns/#foreign-keys-take-shared-locks-you-didnt-ask-for" &gt;foreign-key shared-lock pattern from Part 1&lt;/a&gt; (high-write child tables concentrating shared locks on hot parent rows) has a narrower window under &lt;code&gt;READ COMMITTED&lt;/code&gt; simply because the lock lifespan is tied to the statement rather than the transaction. The cycle potential is still there, but the wait window is smaller.&lt;/p&gt;
&lt;div class="warning-box"&gt;
 &lt;strong&gt;Isolation change is a behavior change, not a tuning knob&lt;/strong&gt;
 &lt;div&gt;&lt;code&gt;READ COMMITTED&lt;/code&gt; eliminates most gap locks but also changes the visibility semantics of long-running transactions. Any code that relied on re-reading a row and getting the same result (transfer logic, inventory deductions, financial calculations) has to be re-examined. The safe migration is application-by-application, not database-wide. Run it in staging under production-like load and watch for subtle correctness regressions: &amp;ldquo;phantom&amp;rdquo; rows appearing inside a transaction that used to see a stable snapshot, inventory counts that shift mid-transaction, calculations that no longer match because an underlying row changed between reads.&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Session-scoped change as a migration path.&lt;/strong&gt; Both engines let you set isolation level per session or per transaction (&lt;code&gt;SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED&lt;/code&gt; in MySQL, &lt;code&gt;SET TRANSACTION ISOLATION LEVEL READ COMMITTED&lt;/code&gt; per transaction in PostgreSQL). The usual migration pattern is to start with the most contended code paths, move them to session-scoped &lt;code&gt;READ COMMITTED&lt;/code&gt;, monitor for regressions, then expand the scope. A global flip from &lt;code&gt;REPEATABLE READ&lt;/code&gt; to &lt;code&gt;READ COMMITTED&lt;/code&gt; on a large, stable MySQL deployment is rarely the right first step.&lt;/p&gt;
&lt;h2 id="hot-row-isolation-removing-the-pattern-instead-of-retrying-it"&gt;Hot-row isolation: removing the pattern instead of retrying it
&lt;/h2&gt;&lt;p&gt;When the top N deadlocks on a production system concentrate on a small set of rows (a counter, a config row, an &lt;code&gt;AUTO_INCREMENT&lt;/code&gt; source of truth), retries don&amp;rsquo;t converge. Every retry hits the same row, takes the same lock, cycles with the same peers. The fix is removing the row from the hot path, not tuning the retry layer.&lt;/p&gt;
&lt;p&gt;Three patterns that work:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Counter tables with sharding&lt;/strong&gt;, for extreme write-hot counters only. Reach for this only when the counter is taking thousands of writes per second against a single row and the simpler options below aren&amp;rsquo;t viable. For anything less, the queue pattern below or an external store (Redis atomic &lt;code&gt;INCR&lt;/code&gt;, a time-series DB) is almost always the better answer: less complexity, no schema overhead, no sum-across-rows read cost. Sharded counters are the specialized escalation, not the default.&lt;/p&gt;
&lt;p&gt;When it does fit the workload: N physical shards per logical counter, keyed on a compact integer &lt;code&gt;(counter_id, shard_id)&lt;/code&gt; composite. Application code picks the shard. Keeping the random choice out of SQL makes it portable across engines and testable independently:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;span class="lnt"&gt;12
&lt;/span&gt;&lt;span class="lnt"&gt;13
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;counter_shards&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;counter_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;-- FK to a counters metadata table if you need names
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shard_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;SMALLINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;-- 0..N-1, fixed per-counter
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;DEFAULT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;PRIMARY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counter_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shard_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Seed the shards once per counter (e.g., when the counter is created):
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;INTO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;counter_shards&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counter_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shard_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;VALUES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;span class="lnt"&gt;12
&lt;/span&gt;&lt;span class="lnt"&gt;13
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Increment: application picks the shard. Portable, cheap, no SQL-side RAND().&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;shard&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;randrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;UPDATE counter_shards SET value = value + 1 &amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;WHERE counter_id = &lt;/span&gt;&lt;span class="si"&gt;%s&lt;/span&gt;&lt;span class="s2"&gt; AND shard_id = &lt;/span&gt;&lt;span class="si"&gt;%s&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shard&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Read: sum across shards.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;SELECT COALESCE(SUM(value), 0) FROM counter_shards WHERE counter_id = &lt;/span&gt;&lt;span class="si"&gt;%s&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;16 shards turn one hot row into 16 warm rows. The contention surface scales with shard count. The read cost is one &lt;code&gt;SUM&lt;/code&gt; across N rows instead of a single-row &lt;code&gt;SELECT&lt;/code&gt;, usually acceptable for counter use cases; if not, cache the aggregate.&lt;/p&gt;
&lt;p&gt;A common refinement is deriving the shard deterministically from the connection or worker ID (e.g., &lt;code&gt;connection_id % 16&lt;/code&gt;) so each worker consistently hits the same shard. That improves InnoDB buffer-pool locality and reduces cross-shard interference, at the cost of slightly less even distribution if worker counts aren&amp;rsquo;t balanced.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Advisory locks for app-level serialization.&lt;/strong&gt; Both MySQL and PostgreSQL support advisory locks: named locks that exist outside the table model and don&amp;rsquo;t take row locks. For operations that need to be serialized at the application level (leader election, rate limiting, config migration), advisory locks are dramatically cheaper than row locks and can&amp;rsquo;t participate in a table-lock cycle:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- PostgreSQL: advisory lock keyed by a bigint.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pg_advisory_xact_lock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hashtext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;refresh_cache:customer_42&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Lock released at transaction end. Only one worker per key runs at a time.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- MySQL equivalent:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GET_LOCK&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;refresh_cache:customer_42&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Returns 1 if acquired, 0 on timeout. Must explicitly RELEASE_LOCK.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The caveats: advisory locks are application-layer discipline; they don&amp;rsquo;t enforce anything the database checks. Use them where the application chooses to serialize, not where correctness requires it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Queue patterns instead of direct updates.&lt;/strong&gt; For counter-like workloads, write intents to an append-only table and aggregate periodically:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;span class="lnt"&gt;8
&lt;/span&gt;&lt;span class="lnt"&gt;9
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;INTO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;counter_events&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counter_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;VALUES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- No contention: every insert creates a new row.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Periodic aggregation job:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;INTO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;counter_totals&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counter_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;counter_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;counter_events&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;processed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FALSE&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;GROUP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;counter_key&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CONFLICT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counter_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;DO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;counter_totals&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;EXCLUDED&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Trades real-time accuracy for throughput. The right trade-off for page-view counters, metric accumulation, any workload where eventual consistency is acceptable.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hot parent rows behind FK chains.&lt;/strong&gt; &lt;a class="link" href="https://explainanalyze.com/p/database-deadlocks-part-1-the-patterns/#foreign-keys-take-shared-locks-you-didnt-ask-for" &gt;Part 1&lt;/a&gt; described how a high-write child table concentrates shared locks on a hot parent row, and how any exclusive-lock request on that parent (a name update, a soft-delete, a trigger-driven counter update) becomes a contention point. Two levers that work:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Move high-frequency parent-row updates to a side table.&lt;/strong&gt; The &lt;code&gt;last_activity_at&lt;/code&gt; timestamp, the cached counter, the &lt;code&gt;updated_at&lt;/code&gt; that a trigger bumps on every child insert; none of these need to live on the parent table. Moving them to &lt;code&gt;customer_activity(customer_id, last_seen_at)&lt;/code&gt; or &lt;code&gt;customer_counters(customer_id, order_count)&lt;/code&gt; eliminates the exclusive-lock contention entirely. The parent row stops changing on hot paths, the shared locks from FK checks coexist fine, and the cycle potential disappears.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Narrow the FK scope where integrity can tolerate it.&lt;/strong&gt; Not every child table needs an enforced FK to every parent. Logs, events, and audit tables are often the biggest offenders, and often have the least need for strict integrity (an orphaned log row is rarely a correctness problem). Dropping the FK removes the shared-lock dependency entirely. This trades integrity for throughput, a decision that belongs with the team owning the data, not a default.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Under &lt;code&gt;READ COMMITTED&lt;/code&gt;, both levers matter less because the FK shared locks release at statement end rather than transaction end. A workload that runs on &lt;code&gt;REPEATABLE READ&lt;/code&gt; and can&amp;rsquo;t change isolation level (because of application semantics) gets the most benefit from these two fixes.&lt;/p&gt;
&lt;h2 id="long-running-transactions-are-a-deadlock-amplifier"&gt;Long-running transactions are a deadlock amplifier
&lt;/h2&gt;&lt;p&gt;The longer a transaction holds locks, the wider the window for a cycle. Two patterns recur in production:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Application-layer long transactions.&lt;/strong&gt; A transaction that opens at request start, makes several queries, calls an external API, then commits. The external call is where the transaction actually spends its time: seconds of open transaction holding row locks the whole time. Every concurrent transaction that touches those rows waits. Deadlock probability scales with transaction duration. The fix is the inverse of the outbox pattern: do the external call outside the transaction, passing in any needed IDs.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Idle-in-transaction sessions.&lt;/strong&gt; A session that runs &lt;code&gt;BEGIN&lt;/code&gt;, some writes, then stalls; idle but not committed. In PostgreSQL, this blocks vacuum on touched tables, bloats MVCC, and holds locks indefinitely. &lt;code&gt;pg_stat_activity&lt;/code&gt; shows &lt;code&gt;state = 'idle in transaction'&lt;/code&gt;. MySQL&amp;rsquo;s equivalent is a thread with an open transaction and no current query.&lt;/p&gt;
&lt;p&gt;PostgreSQL has a first-class timeout for this; MySQL does not:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- PostgreSQL: kill idle-in-transaction sessions after 5 minutes.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- (Units required; bare integer would be interpreted as milliseconds.)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;idle_in_transaction_session_timeout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;5min&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;MySQL has no direct equivalent. &lt;a class="link" href="https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_wait_timeout" target="_blank" rel="noopener"
 &gt;&lt;code&gt;wait_timeout&lt;/code&gt;&lt;/a&gt; and &lt;a class="link" href="https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_interactive_timeout" target="_blank" rel="noopener"
 &gt;&lt;code&gt;interactive_timeout&lt;/code&gt;&lt;/a&gt; govern idle connections, not sessions idle inside an open transaction. A connection that did &lt;code&gt;BEGIN&lt;/code&gt; then stopped sending queries will hold its locks until the connection drops or the client commits. The production workaround is either a watchdog script (e.g., Percona&amp;rsquo;s &lt;code&gt;pt-kill&lt;/code&gt;) that polls &lt;code&gt;information_schema.innodb_trx&lt;/code&gt; and terminates transactions exceeding a duration threshold, or a connection pool with per-connection transaction lifetimes. Connection pools that acquire a connection, start a transaction, then return the connection to the pool without committing (rare but real) will produce sessions that live indefinitely otherwise.&lt;/p&gt;
&lt;p&gt;Finding long-running transactions:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- PostgreSQL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;usename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;xact_start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;xact_start&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pg_stat_activity&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;xact_start&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;IS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AND&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;xact_start&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;INTERVAL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;30 seconds&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- MySQL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;trx_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;trx_started&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;trx_mysql_thread_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;trx_rows_locked&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;trx_query&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;information_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;innodb_trx&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TIMESTAMPDIFF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SECOND&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;trx_started&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Alerting on any transaction exceeding 30s in an OLTP workload catches most of the long-transaction-induced deadlocks before they produce incidents.&lt;/p&gt;
&lt;h2 id="triggers-and-cascades-are-invisible-lock-sources"&gt;Triggers and cascades are invisible lock sources
&lt;/h2&gt;&lt;p&gt;A trigger that updates a second table on every write to the first adds an edge to the wait-for graph that isn&amp;rsquo;t visible in the original query. &lt;code&gt;ON DELETE CASCADE&lt;/code&gt; foreign keys behave similarly - one delete can take locks on every child row in the cascade, and if the cascade order differs between two concurrent deletes, they can deadlock through tables neither statement directly referenced.&lt;/p&gt;
&lt;p&gt;This is the origin of the &amp;ldquo;why is my &lt;code&gt;DELETE FROM users&lt;/code&gt; deadlocking against an &lt;code&gt;INSERT INTO events&lt;/code&gt;?&amp;rdquo; question. The &lt;code&gt;DELETE&lt;/code&gt; triggered a cascade to &lt;code&gt;user_preferences&lt;/code&gt;, which had a trigger that updated a counter in &lt;code&gt;tenants&lt;/code&gt;, which was locked by the &lt;code&gt;INSERT&lt;/code&gt;. Four tables in the cycle, two in the application&amp;rsquo;s explicit query, zero mention of the other two in any log entry until someone reads the DDL.&lt;/p&gt;
&lt;p&gt;The operational pattern: when a deadlock log mentions a table the application&amp;rsquo;s code doesn&amp;rsquo;t explicitly reference, check (1) FK cascades on the tables that are in the query, (2) triggers on those tables, (3) generated columns that fire on update. All three are non-obvious lock sources, all three are fixable, but only after they&amp;rsquo;re identified.&lt;/p&gt;
&lt;h2 id="innodb_autoinc_lock_mode-and-the-auto-inc-lock"&gt;&lt;code&gt;innodb_autoinc_lock_mode&lt;/code&gt; and the AUTO-INC lock
&lt;/h2&gt;&lt;p&gt;MySQL InnoDB&amp;rsquo;s &lt;code&gt;AUTO_INCREMENT&lt;/code&gt; column has its own lock, historically a source of contention and occasional deadlock. The &lt;a class="link" href="https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_autoinc_lock_mode" target="_blank" rel="noopener"
 &gt;&lt;code&gt;innodb_autoinc_lock_mode&lt;/code&gt;&lt;/a&gt; parameter controls the behavior:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Mode 0 (traditional).&lt;/strong&gt; Table-level AUTO-INC lock held for the duration of the statement. Serialized across inserts. Safe for statement-based replication, terrible for concurrency.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mode 1 (consecutive).&lt;/strong&gt; A lighter lock for simple inserts (single-row or known-row-count), and the traditional table lock for bulk inserts (&lt;code&gt;INSERT ... SELECT&lt;/code&gt;). Was the default in 5.7.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mode 2 (interleaved).&lt;/strong&gt; No AUTO-INC table lock; IDs are assigned per-row as needed, possibly interleaved across concurrent statements. Default in MySQL 8.0. Fastest, and correct for row-based replication (which is also the 8.0 default).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The mode-2 default in 8.0 eliminated a substantial source of historical deadlocks and contention. Bulk inserts that used to serialize on the AUTO-INC lock now proceed in parallel. If you&amp;rsquo;re migrating from 5.7 to 8.0, this is a free win. If you&amp;rsquo;re still on &lt;a class="link" href="https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#sysvar_binlog_format" target="_blank" rel="noopener"
 &gt;&lt;code&gt;binlog_format&lt;/code&gt;&lt;/a&gt;&lt;code&gt; = STATEMENT&lt;/code&gt; (uncommon but not unheard of in legacy deployments), you cannot safely run mode 2; the replica may generate different IDs than the source, corrupting the data. Switch to &lt;code&gt;binlog_format = ROW&lt;/code&gt; first, then adopt mode 2.&lt;/p&gt;
&lt;h2 id="ddl-migration-windows-and-lock_timeout"&gt;DDL migration windows and &lt;code&gt;lock_timeout&lt;/code&gt;
&lt;/h2&gt;&lt;p&gt;Online schema change isn&amp;rsquo;t deadlock-prone in the classical sense, but it interacts with deadlocks in a specific operational way: DDL takes heavy locks that queue behind ongoing DML, and while the DDL waits, every subsequent query on that table queues behind the DDL. In PostgreSQL, a DDL taking &lt;code&gt;ACCESS EXCLUSIVE&lt;/code&gt; that waits for an existing long-running &lt;code&gt;SELECT&lt;/code&gt; will cause every new &lt;code&gt;SELECT&lt;/code&gt; to wait behind the DDL. The system grinds to a halt, and application logs fill with timeout errors that look like deadlocks but aren&amp;rsquo;t. It&amp;rsquo;s a queue, not a cycle.&lt;/p&gt;
&lt;p&gt;The standard prevention idiom for PostgreSQL migrations:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;lock_timeout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;2s&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ADD&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COLUMN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;notes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If the &lt;code&gt;ALTER&lt;/code&gt; can&amp;rsquo;t acquire its lock in 2 seconds, it fails instead of queueing. The migration tool catches the error and retries with backoff. This prevents the queue-behind-DDL outage entirely - the cost is that some migrations need multiple attempts to land, which is almost always the right trade-off.&lt;/p&gt;
&lt;p&gt;MySQL&amp;rsquo;s equivalent tooling is &lt;code&gt;pt-online-schema-change&lt;/code&gt; (Percona) and &lt;code&gt;gh-ost&lt;/code&gt; (GitHub). Both create a copy of the table, stream writes to both via trigger or binlog, and swap at the end. They run concurrent DML against the original and the copy, so they inflate deadlock rates during the migration window: not because the tool is buggy, but because there are now more transactions touching the same rows. The operational practice: run migrations at low-traffic windows, watch the deadlock counter during the run (not just replica lag), and have a rollback path ready.&lt;/p&gt;
&lt;div class="warning-box"&gt;
 &lt;strong&gt;DDL inside transactions is engine-dependent&lt;/strong&gt;
 &lt;div&gt;PostgreSQL supports transactional DDL: &lt;code&gt;BEGIN; ALTER TABLE ...; COMMIT;&lt;/code&gt; is atomic. MySQL does not; every DDL statement implicitly commits the current transaction. A migration script that assumes it can roll back mid-migration works on PostgreSQL and silently half-applies on MySQL. Know which engine you&amp;rsquo;re writing migrations for.&lt;/div&gt;
&lt;/div&gt;

&lt;h2 id="nowait-and-skip-locked-as-prevention-primitives"&gt;&lt;code&gt;NOWAIT&lt;/code&gt; and &lt;code&gt;SKIP LOCKED&lt;/code&gt; as prevention primitives
&lt;/h2&gt;&lt;p&gt;Both engines support two SQL-level concurrency primitives that remove the need for application-layer deadlock handling in specific patterns:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;SELECT ... FOR UPDATE NOWAIT&lt;/code&gt;.&lt;/strong&gt; If the row is locked by another transaction, fail immediately with an error instead of waiting. Useful for user-facing paths where &amp;ldquo;I can&amp;rsquo;t get this resource right now&amp;rdquo; is a better UX than &amp;ldquo;wait 500ms and maybe deadlock anyway.&amp;rdquo; Also useful for detecting lock contention synthetically in tests.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;SELECT ... FOR UPDATE SKIP LOCKED&lt;/code&gt;.&lt;/strong&gt; If rows are locked by another transaction, skip them and return only rows the current transaction can lock. Transforms a contended queue-processor pattern into a lock-free one: N workers each grab a different set of rows, zero contention, zero deadlocks.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Queue processor: deadlock-free, contention-free.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;pending&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;LIMIT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FOR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SKIP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LOCKED&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Fast-fail acquisition: don&amp;#39;t wait, fail now.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;leader_election&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;resource_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;cache-refresh&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FOR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;NOWAIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;code&gt;SKIP LOCKED&lt;/code&gt; arrived in PostgreSQL 9.5 and MySQL 8.0. Before those versions, queue-processor patterns required either advisory locks or application-level coordination (Redis, Zookeeper). Post-&lt;code&gt;SKIP LOCKED&lt;/code&gt;, they can live entirely in the database with a single primitive. For any workload where workers pull from a shared queue, this is the pattern - not retry loops on &lt;code&gt;FOR UPDATE&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id="monitoring-that-actually-catches-regressions"&gt;Monitoring that actually catches regressions
&lt;/h2&gt;&lt;p&gt;The single most useful metric is deadlock rate over time. Not error rate, not retry rate; the raw count of deadlocks per minute or per thousand transactions. A workload with 0.1 deadlocks per thousand transactions is healthy; 10 per thousand is a paging threshold; 100 per thousand means retries aren&amp;rsquo;t converging and something is structurally wrong.&lt;/p&gt;
&lt;p&gt;For MySQL: there&amp;rsquo;s no &lt;code&gt;Innodb_deadlocks&lt;/code&gt; status variable. The correct source is &lt;code&gt;performance_schema.events_errors_summary_global_by_error&lt;/code&gt;, which is enabled by default:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Cumulative deadlock count (compare over time windows).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SUM_ERROR_RAISED&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;deadlock_count&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;performance_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events_errors_summary_global_by_error&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ERROR_NAME&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;ER_LOCK_DEADLOCK&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Lock-wait activity (useful for adjacent contention, NOT a deadlock counter):
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SHOW&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;GLOBAL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;STATUS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;LIKE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Innodb_row_lock_waits&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Plus the error log, searchable for &amp;#34;LATEST DETECTED DEADLOCK&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- once innodb_print_all_deadlocks=ON.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;code&gt;Innodb_row_lock_waits&lt;/code&gt; is commonly misread as a deadlock counter - &lt;a class="link" href="https://dev.mysql.com/doc/refman/8.0/en/server-status-variables.html#statvar_Innodb_row_lock_waits" target="_blank" rel="noopener"
 &gt;the manual&lt;/a&gt; defines it as &amp;ldquo;the number of times operations on InnoDB tables had to wait for a row lock,&amp;rdquo; which is contention in general. Pair it with the &lt;code&gt;events_errors_summary&lt;/code&gt; query, not in place of it.&lt;/p&gt;
&lt;p&gt;For PostgreSQL:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- pg_stat_database exposes per-database deadlock counter.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;datname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;deadlocks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;xact_commit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;xact_rollback&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pg_stat_database&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;datname&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;current_database&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Scrape both into Prometheus (mysqld_exporter and postgres_exporter both expose these), compute the rate, alert on sustained rises. Pair the deadlock rate with a &lt;em&gt;retry rate&lt;/em&gt; from the application layer - a spike in one without the other means either the retry logic is broken or the workload shape changed. A spike in both means a real regression.&lt;/p&gt;
&lt;p&gt;Beyond the rate itself, the top-K pairs of statements involved in deadlocks (extracted from &lt;code&gt;innodb_print_all_deadlocks&lt;/code&gt; logs or PG&amp;rsquo;s deadlock log entries) identify exactly which code paths are fighting. This list rarely changes - the same two or three patterns account for most deadlocks in any given system. Fix those and the rate drops by an order of magnitude.&lt;/p&gt;
&lt;h2 id="the-mental-model-for-part-2"&gt;The mental model for Part 2
&lt;/h2&gt;&lt;p&gt;Part 1&amp;rsquo;s patterns answer why deadlocks happen. Part 2&amp;rsquo;s operations answer what to do about them, and the useful reframe is that the answer is almost never &amp;ldquo;tune the retry logic.&amp;rdquo; Retries are the recovery mechanism that keeps the application running while the actual fix lands. The actual fix is almost always one of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Identify the pattern from the log. This is step zero; skipping it means you&amp;rsquo;re tuning blind.&lt;/li&gt;
&lt;li&gt;Enforce consistent lock ordering at the access-layer level. Highest-leverage fix for the lock-ordering pattern; deterministically eliminates the cycle rather than shrinking its window.&lt;/li&gt;
&lt;li&gt;Change the code path to use &lt;code&gt;SKIP LOCKED&lt;/code&gt; / &lt;code&gt;NOWAIT&lt;/code&gt; where the pattern matches (queue processors, resource acquisition).&lt;/li&gt;
&lt;li&gt;Isolate hot rows (counter tables, shards, advisory locks, queue patterns; move high-frequency parent-row updates to side tables).&lt;/li&gt;
&lt;li&gt;Shorten transactions. Move external calls out, enforce idle-transaction timeouts.&lt;/li&gt;
&lt;li&gt;Drop isolation level where the workload allows it; session-scoped first, global only after regression testing. Eliminates the gap-lock category on MySQL.&lt;/li&gt;
&lt;li&gt;Remove cascades/triggers from the hot path when they&amp;rsquo;re the hidden lock source.&lt;/li&gt;
&lt;li&gt;Handle &lt;code&gt;SERIALIZABLE&lt;/code&gt;&amp;rsquo;s 40001 as a normal event if you&amp;rsquo;re on &lt;code&gt;SERIALIZABLE&lt;/code&gt;, and don&amp;rsquo;t confuse it with 40P01.&lt;/li&gt;
&lt;li&gt;Plan DDL windows with &lt;code&gt;lock_timeout&lt;/code&gt; and watch the deadlock counter through the migration.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Retries let the application survive while the fix is in flight. Monitoring tells you which fix to prioritize. Each of the above removes a category from the workload entirely. The goal is a system where the few remaining deadlocks are rare enough that the retry layer handles them invisibly and the team&amp;rsquo;s attention can go elsewhere. Not zero (that&amp;rsquo;s a theoretical fiction at realistic concurrency), but managed.&lt;/p&gt;</description></item><item><title>Database Deadlocks, Part 1: The Patterns</title><link>https://explainanalyze.com/p/database-deadlocks-part-1-the-patterns/</link><pubDate>Thu, 13 Feb 2025 00:00:00 +0000</pubDate><guid>https://explainanalyze.com/p/database-deadlocks-part-1-the-patterns/</guid><description>&lt;img src="https://explainanalyze.com/" alt="Featured image of post Database Deadlocks, Part 1: The Patterns" /&gt;&lt;div class="tldr-box"&gt;
 &lt;strong&gt;TL;DR&lt;/strong&gt;
 &lt;div&gt;A deadlock is two transactions each holding a lock the other needs, caught in a cycle the engine breaks by killing one. The patterns are finite and repeatable: inconsistent lock ordering across workers, InnoDB gap locks under &lt;code&gt;REPEATABLE READ&lt;/code&gt;, foreign-key shared locks on hot parent rows, unique-index conflicts, index-scan lock amplification, and parallelism patterns that only surface on replicas or under worker-pool load. This post is the patterns. &lt;a class="link" href="https://explainanalyze.com/p/database-deadlocks-part-2-diagnosis-retries-and-prevention/" target="_blank" rel="noopener"
 &gt;Part 2&lt;/a&gt; covers diagnosis, retry architecture, and prevention.&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Deadlocks occupy a strange place in production operations. They&amp;rsquo;re rare enough that most engineers haven&amp;rsquo;t thought hard about them, frequent enough in high-concurrency workloads to show up as paging incidents, and subtle enough that the first instinct (&amp;ldquo;just retry&amp;rdquo;) is right often enough to keep the root cause hidden. The transaction that got killed was syntactically perfect. The one that survived was too. The bug wasn&amp;rsquo;t in either statement; it was in the order the two transactions touched rows.&lt;/p&gt;
&lt;p&gt;That makes deadlocks harder to reason about than most database failures. The query text in the error log isn&amp;rsquo;t wrong. The lock it was waiting for isn&amp;rsquo;t held by a misbehaving process. The system is doing exactly what concurrency control says it should. The failure mode is the interaction between transactions, and those interactions are almost never visible from any single query.&lt;/p&gt;
&lt;p&gt;This post covers the patterns: the shapes deadlocks take and why each one exists. The &lt;a class="link" href="https://explainanalyze.com/p/database-deadlocks-part-2-diagnosis-retries-and-prevention/" &gt;companion post&lt;/a&gt; covers reading the deadlock log end-to-end, retry architecture, hot-row isolation, SERIALIZABLE&amp;rsquo;s serialization failures, DDL migration windows, and &lt;code&gt;NOWAIT&lt;/code&gt; / &lt;code&gt;SKIP LOCKED&lt;/code&gt; as prevention primitives.&lt;/p&gt;
&lt;h2 id="what-a-deadlock-actually-is"&gt;What a deadlock actually is
&lt;/h2&gt;&lt;p&gt;A deadlock is a cycle in the wait-for graph. Transaction A holds lock L1 and needs L2; transaction B holds L2 and needs L1. Neither can proceed. The only way out is to kill one: pick a victim, roll it back, release its locks, and let the other complete. Every modern relational database does this automatically, usually within hundreds of milliseconds.&lt;/p&gt;
&lt;p&gt;The three preconditions are always the same:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Two or more transactions hold locks.&lt;/li&gt;
&lt;li&gt;Each needs a lock the other holds.&lt;/li&gt;
&lt;li&gt;The locks can&amp;rsquo;t be acquired atomically (there&amp;rsquo;s no single &amp;ldquo;grab both or grab nothing&amp;rdquo; operation).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Remove any one and the deadlock can&amp;rsquo;t form. In practice, that&amp;rsquo;s the shape of every prevention strategy: reduce the number of locks held concurrently, reduce the duration they&amp;rsquo;re held, or make the acquisition order consistent across all code paths.&lt;/p&gt;
&lt;div class="note-box"&gt;
 &lt;strong&gt;Deadlock vs. lock wait timeout&lt;/strong&gt;
 &lt;div&gt;A deadlock is a cycle. A lock wait timeout is a long queue: transaction A is waiting for transaction B, which is waiting for something reasonable, which is taking too long. No cycle, no victim selection, just a timer expiring. Both produce errors that look similar in application logs, but they&amp;rsquo;re entirely different failure modes with different fixes. &lt;a class="link" href="https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_lock_wait_timeout" target="_blank" rel="noopener"
 &gt;&lt;code&gt;innodb_lock_wait_timeout&lt;/code&gt;&lt;/a&gt; (MySQL) and &lt;a class="link" href="https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-LOCK-TIMEOUT" target="_blank" rel="noopener"
 &gt;&lt;code&gt;lock_timeout&lt;/code&gt;&lt;/a&gt; (PostgreSQL) govern the second. The deadlock detector is a separate mechanism that fires independently of those timers.&lt;/div&gt;
&lt;/div&gt;

&lt;h2 id="the-canonical-lock-ordering-deadlock"&gt;The canonical lock-ordering deadlock
&lt;/h2&gt;&lt;p&gt;The single most common production deadlock is two transactions updating the same two rows in opposite orders:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Transaction A
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;accounts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;accounts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Transaction B (concurrent)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;accounts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;accounts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If A and B interleave such that A takes the row-level lock on row 1, B takes the row-level lock on row 2, and each then tries to grab the other row, the engine has a cycle. One gets killed.&lt;/p&gt;
&lt;pre class="mermaid" style="visibility:hidden"&gt;sequenceDiagram
 participant A as Transaction A
 participant R1 as Row id=1
 participant R2 as Row id=2
 participant B as Transaction B

 Note over A,B: t=0, both transactions begin
 A-&gt;&gt;R1: UPDATE (acquire X-lock)
 R1--&gt;&gt;A: granted
 B-&gt;&gt;R2: UPDATE (acquire X-lock)
 R2--&gt;&gt;B: granted

 Note over A,B: t=1, each reaches for the other's row
 A-&gt;&gt;R2: UPDATE (request X-lock)
 R2--xA: BLOCKED (held by B)
 B-&gt;&gt;R1: UPDATE (request X-lock)
 R1--xB: BLOCKED (held by A)

 Note over A,B: Wait-for graph has a cycle: A → B → A
 Note over A,B: Detector fires; victim: Transaction B
 B-&gt;&gt;B: ROLLBACK, release R2 lock
 R2--&gt;&gt;A: now granted
 A-&gt;&gt;A: COMMIT&lt;/pre&gt;&lt;p&gt;The key property is that neither transaction is wrong in isolation. Each acquires locks in an order that&amp;rsquo;s locally correct. The cycle forms in the global ordering across concurrent sessions, which no single query can see. That&amp;rsquo;s the defining shape of the pattern: correct code, in both cases, interacting at the transaction boundary. The fix is in how all code paths agree on an ordering, covered in &lt;a class="link" href="https://explainanalyze.com/p/database-deadlocks-part-2-diagnosis-retries-and-prevention/#consistent-lock-ordering-the-highest-leverage-fix" &gt;Part 2: Consistent lock ordering&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="innodb-gap-locks-turn-inserts-into-deadlock-sources"&gt;InnoDB gap locks turn inserts into deadlock sources
&lt;/h2&gt;&lt;p&gt;MySQL&amp;rsquo;s default isolation level is &lt;code&gt;REPEATABLE READ&lt;/code&gt;, and under that isolation level, InnoDB takes next-key locks: a row lock plus a gap lock on the range before it. The gap lock prevents other transactions from inserting into that range, which is how &lt;code&gt;REPEATABLE READ&lt;/code&gt; keeps range queries consistent across re-execution.&lt;/p&gt;
&lt;p&gt;The consequence: a &lt;code&gt;SELECT ... FOR UPDATE&lt;/code&gt; or an &lt;code&gt;UPDATE&lt;/code&gt; with a range predicate locks not just the matching rows, but the gaps between them. Two concurrent transactions that both try to insert into the same gap can deadlock without sharing a single row:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;span class="lnt"&gt;12
&lt;/span&gt;&lt;span class="lnt"&gt;13
&lt;/span&gt;&lt;span class="lnt"&gt;14
&lt;/span&gt;&lt;span class="lnt"&gt;15
&lt;/span&gt;&lt;span class="lnt"&gt;16
&lt;/span&gt;&lt;span class="lnt"&gt;17
&lt;/span&gt;&lt;span class="lnt"&gt;18
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Table: orders(id BIGINT PK, customer_id BIGINT, amount_cents BIGINT)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Index on customer_id
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Existing rows: customer_id = 5 has orders with ids 100, 200, 300
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Transaction A
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FOR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Takes next-key locks covering ids 100, 200, 300 AND the gaps between them,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- AND the gap after 300 extending to the next customer_id.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Transaction B (concurrent)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;INTO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;amount_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;VALUES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;250&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Blocks: tries to insert into a gap A has locked.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Transaction A
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;INTO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;amount_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;VALUES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Deadlock if B has also started acquiring locks that A now needs.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Two transactions inserting into what looks like &amp;ldquo;different rows&amp;rdquo; can still cycle through gap locks. The failure mode is especially insidious because the &lt;code&gt;EXPLAIN&lt;/code&gt; plan doesn&amp;rsquo;t mention gaps - only rows - and the lock information in &lt;code&gt;SHOW ENGINE INNODB STATUS&lt;/code&gt; requires reading the next-key notation carefully.&lt;/p&gt;
&lt;p&gt;The category is peculiar to InnoDB under &lt;code&gt;REPEATABLE READ&lt;/code&gt;. PostgreSQL prevents phantom reads through MVCC snapshot isolation starting at its own &lt;code&gt;REPEATABLE READ&lt;/code&gt; level (stricter than the SQL standard requires) without any range-locking mechanism, so the whole class of gap-lock deadlocks doesn&amp;rsquo;t exist on PG, at any isolation level. Under &lt;code&gt;READ COMMITTED&lt;/code&gt; on MySQL, gap locks are disabled for most searches and index scans but retained for foreign-key and duplicate-key checking, which is the first lever most teams reach for once this pattern dominates their deadlocks, though it doesn&amp;rsquo;t eliminate the gap-lock category entirely. The isolation-level trade-off and the &amp;ldquo;avoid range locks on write paths&amp;rdquo; refactor both live in &lt;a class="link" href="https://explainanalyze.com/p/database-deadlocks-part-2-diagnosis-retries-and-prevention/#isolation-level-trade-offs" &gt;Part 2: Isolation-level trade-offs&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="unique-index-deadlocks-are-a-category-of-their-own"&gt;Unique-index deadlocks are a category of their own
&lt;/h2&gt;&lt;p&gt;The detailed patterns are covered in &lt;a class="link" href="https://explainanalyze.com/p/uniqueness-and-selectivity-the-two-numbers-that-drive-query-plans/#unique-indexes-also-concentrate-deadlock-pressure" &gt;Uniqueness and Selectivity&lt;/a&gt;, but the shape worth naming here: when InnoDB detects a duplicate-key error on an &lt;code&gt;INSERT&lt;/code&gt;, it acquires a shared lock on the conflicting index record before raising the error. Under &lt;code&gt;REPEATABLE READ&lt;/code&gt; that shared lock is next-key (record + gap). Under &lt;code&gt;READ COMMITTED&lt;/code&gt;, gap locks are mostly disabled, but duplicate-key checking is one of the documented exceptions where gap locking still occurs, so dropping isolation alone doesn&amp;rsquo;t eliminate the category. Two concurrent transactions inserting toward the same unique key end up holding shared locks and waiting for each other to release: a deadlock caused entirely by the uniqueness check, not by the rows the application thought it was writing.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;INSERT ... ON DUPLICATE KEY UPDATE&lt;/code&gt; behaves differently: on conflict it takes an exclusive lock instead of a shared one, because the statement is about to modify the row. This matters for reasoning about cycles. Two concurrent &lt;code&gt;ODKU&lt;/code&gt; statements contend on exclusive locks (mutually exclusive, one always waits), whereas two concurrent plain &lt;code&gt;INSERT&lt;/code&gt;s can both hold shared locks at once and then deadlock when either tries to upgrade. Blog posts and older documentation sometimes conflate the two; the locking rules are documented in the &lt;a class="link" href="https://dev.mysql.com/doc/refman/8.0/en/innodb-locks-set.html" target="_blank" rel="noopener"
 &gt;MySQL reference: Locks Set by Different SQL Statements in InnoDB&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The equivalent in PostgreSQL is less severe (the duplicate-key check doesn&amp;rsquo;t hold long-lived shared locks the same way) but &lt;code&gt;INSERT ... ON CONFLICT&lt;/code&gt; with multiple unique indexes can still produce deadlocks when batches touch overlapping keys in different orders. The shape is the same across engines: the uniqueness check itself is what forces the extra locking, and the cycle forms when two sessions approach the same key from different batches.&lt;/p&gt;
&lt;h2 id="foreign-keys-take-shared-locks-you-didnt-ask-for"&gt;Foreign keys take shared locks you didn&amp;rsquo;t ask for
&lt;/h2&gt;&lt;p&gt;Both MySQL and PostgreSQL acquire shared locks on the referenced row when you insert or update a row with a foreign key. The purpose is to prevent the referenced row from being deleted mid-transaction; you can&amp;rsquo;t have an &lt;code&gt;orders.customer_id&lt;/code&gt; pointing to a &lt;code&gt;customers.id&lt;/code&gt; that&amp;rsquo;s being concurrently deleted.&lt;/p&gt;
&lt;p&gt;The side effect is that a high-write child table concentrates shared locks on hot parent rows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- customers has id = 42 (a frequently-used customer)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Many concurrent transactions inserting orders for customer 42:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;INTO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;amount_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;VALUES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Takes a shared lock on customers(id=42)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Shared locks don&amp;rsquo;t block each other, so concurrent inserts coexist fine. What breaks is the interaction with any transaction that wants an exclusive lock on the parent row: an update to the customer&amp;rsquo;s name, a soft-delete, a trigger that updates a cached counter. Suddenly, dozens of shared-lock holders are blocking one exclusive-lock request, and if any of them start trying to acquire other locks (say, through a trigger cascade), a cycle can form.&lt;/p&gt;
&lt;p&gt;The symptoms: deadlocks that mention tables far removed from the one the application thought it was touching. &amp;ldquo;Why is my &lt;code&gt;UPDATE customers&lt;/code&gt; deadlocking against an &lt;code&gt;INSERT INTO order_items&lt;/code&gt;?&amp;rdquo; Because the &lt;code&gt;order_items&lt;/code&gt; insert took a shared lock on &lt;code&gt;customers&lt;/code&gt; through the FK chain, and the &lt;code&gt;UPDATE&lt;/code&gt; wanted exclusive on the same row.&lt;/p&gt;
&lt;p&gt;This is one of the hardest patterns to diagnose on sight, because the offending query never references the contended table explicitly. Mitigations (narrowing FK scope, moving hot parent-row updates to side tables, isolation-level trade-offs) are covered in &lt;a class="link" href="https://explainanalyze.com/p/database-deadlocks-part-2-diagnosis-retries-and-prevention/#hot-row-isolation-removing-the-pattern-instead-of-retrying-it" &gt;Part 2: Hot-row isolation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="index-scans-lock-more-rows-than-queries-return"&gt;Index scans lock more rows than queries return
&lt;/h2&gt;&lt;p&gt;Under InnoDB&amp;rsquo;s default &lt;code&gt;REPEATABLE READ&lt;/code&gt;, an &lt;code&gt;UPDATE&lt;/code&gt; with a &lt;code&gt;WHERE&lt;/code&gt; clause on a non-indexed column acquires a record lock on every row it scans, not just the ones that match. The engine has to examine each row to check the predicate, and it takes a lock to guarantee the check is stable for the duration of the transaction.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Without an index on status, under REPEATABLE READ:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;high&amp;#39;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;pending&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Locks every row in orders during the scan.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If the table has a million rows and only a thousand match, all million get locked for the duration of the update. Any concurrent transaction touching any of those rows has to wait, which inflates the wait-for graph and makes deadlocks more likely.&lt;/p&gt;
&lt;p&gt;Under &lt;code&gt;READ COMMITTED&lt;/code&gt;, InnoDB narrows this substantially: &lt;a class="link" href="https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html" target="_blank" rel="noopener"
 &gt;per the docs&lt;/a&gt;, it releases locks on non-matching rows after the &lt;code&gt;WHERE&lt;/code&gt; evaluation and uses semi-consistent reads, returning the latest committed version of an already-locked row so the engine can check whether it matches the &lt;code&gt;WHERE&lt;/code&gt; before deciding to wait. The net effect is much lower lock footprint and deadlock risk on the same query. PostgreSQL behaves similarly by default: only rows actually updated retain their locks. This is one of the few cases where the same underlying issue (an unindexed predicate) shows up as both a latency problem and a concurrency problem, and where the concurrency angle is specifically a &lt;code&gt;REPEATABLE READ&lt;/code&gt;-on-InnoDB amplifier.&lt;/p&gt;
&lt;div class="note-box"&gt;
 &lt;strong&gt;Secondary index locks on InnoDB&lt;/strong&gt;
 &lt;div&gt;InnoDB takes locks on both the clustered index (primary key) and any secondary indexes touched by the query. A &lt;code&gt;WHERE status = 'pending'&lt;/code&gt; using a &lt;code&gt;status&lt;/code&gt; index locks the relevant index entries and the corresponding PK entries. Transactions that approach the same rows from different indexes (one via &lt;code&gt;status&lt;/code&gt;, another via &lt;code&gt;customer_id&lt;/code&gt;) can deadlock on the PK-side lock even though their index-side locks don&amp;rsquo;t overlap. This is the most common &amp;ldquo;why are these two queries deadlocking, they don&amp;rsquo;t even reference the same columns?&amp;rdquo; failure mode.&lt;/div&gt;
&lt;/div&gt;

&lt;h2 id="parallelism-induced-deadlocks"&gt;Parallelism-induced deadlocks
&lt;/h2&gt;&lt;p&gt;The lock-ordering patterns above assume two separate transactions from two separate sessions. Parallelism adds a few variants that don&amp;rsquo;t fit that frame; the cycle can form inside a single logical unit of work, or show up only on a replica that never issued the original statements.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Worker pools racing on a shared queue.&lt;/strong&gt; The archetypal production pattern: N application workers pulling jobs from the same table (&lt;code&gt;jobs&lt;/code&gt;, &lt;code&gt;outbox&lt;/code&gt;, &lt;code&gt;email_queue&lt;/code&gt;) and locking rows for processing. If every worker does &lt;code&gt;SELECT ... FOR UPDATE&lt;/code&gt; on &amp;ldquo;the next available batch&amp;rdquo; without a deterministic ordering, two workers can grab overlapping row sets in opposite orders and cycle. This is the same lock-ordering cycle from earlier, distributed across workers that all look identical from a code-review perspective.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Intra-query parallel workers.&lt;/strong&gt; PostgreSQL has a full parallel query executor (parallel sequential scans, bitmap heap scans, index and index-only scans (B-tree), parallel aggregates, parallel joins) that spawns worker processes to cooperate on a single query. MySQL has a much narrower feature: &lt;a class="link" href="https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_parallel_read_threads" target="_blank" rel="noopener"
 &gt;&lt;code&gt;innodb_parallel_read_threads&lt;/code&gt;&lt;/a&gt; (added in 8.0.14) enables parallel scanning of the clustered index, used initially by &lt;code&gt;CHECK TABLE&lt;/code&gt; and extended to unconditional &lt;code&gt;SELECT COUNT(*)&lt;/code&gt; in 8.0.17. It is not general parallel query; MySQL does not parallelize arbitrary &lt;code&gt;SELECT&lt;/code&gt;s, joins, or aggregates. In both engines, workers coordinating on a single query don&amp;rsquo;t deadlock among themselves in normal operation; the engine manages the shared lock state. What can happen is a parallel worker holds a lock an unrelated transaction needs, and the parallel query itself takes longer than a serial one would, widening the wait window. Usually not a direct deadlock source, but it changes the timing of existing ones.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Parallel replication on replicas.&lt;/strong&gt; MySQL&amp;rsquo;s multi-threaded replica applies committed transactions in parallel. Transactions that committed serially on the source (no possibility of deadlock there) can deadlock on the replica because the applier threads are racing on rows the source never had concurrent writers on. The replica&amp;rsquo;s deadlock detector resolves them the same way it would a live deadlock, but they show up in the replica&amp;rsquo;s error log with no corresponding entry on the source. Since MySQL 8.0.27, &lt;a class="link" href="https://dev.mysql.com/doc/refman/8.0/en/replication-options-replica.html#sysvar_replica_parallel_type" target="_blank" rel="noopener"
 &gt;&lt;code&gt;replica_parallel_type&lt;/code&gt;&lt;/a&gt;&lt;code&gt;=LOGICAL_CLOCK&lt;/code&gt; and &lt;a class="link" href="https://dev.mysql.com/doc/refman/8.0/en/replication-options-replica.html#sysvar_replica_parallel_workers" target="_blank" rel="noopener"
 &gt;&lt;code&gt;replica_parallel_workers&lt;/code&gt;&lt;/a&gt;&lt;code&gt;=4&lt;/code&gt; are the defaults, and &lt;code&gt;replica_parallel_type&lt;/code&gt; was deprecated in 8.0.29; &lt;code&gt;LOGICAL_CLOCK&lt;/code&gt; is effectively the only supported mode going forward. The &lt;code&gt;slave_*&lt;/code&gt; → &lt;code&gt;replica_*&lt;/code&gt; rename happened in 8.0.26; older deployments and blog posts still use the legacy names. PostgreSQL 16+ introduced parallel apply for logical replication (&lt;code&gt;streaming = parallel&lt;/code&gt; is the default on &lt;code&gt;CREATE SUBSCRIPTION&lt;/code&gt;), which exposes the same class of apply-side cycles on a setup that historically didn&amp;rsquo;t have them: a surprise for teams upgrading from 15 and earlier.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Parallel/online DDL interacting with DML.&lt;/strong&gt; Tools like &lt;code&gt;pt-online-schema-change&lt;/code&gt; and &lt;code&gt;gh-ost&lt;/code&gt; run concurrent DML against the table being altered (through triggers or a row-copy process). Under load, the trigger-installed writes and the copy process can both take locks on the same rows the application is updating, and the wait-for graph gains edges that wouldn&amp;rsquo;t exist during a normal workload. This rarely manifests as a hard deadlock (the tools are written defensively) but it does manifest as elevated deadlock rates during the migration window.&lt;/p&gt;
&lt;p&gt;None of these are properties of the queries themselves. They&amp;rsquo;re properties of how work gets distributed across workers, processes, or replicas, which means they&amp;rsquo;re invisible to query-level review and only surface when the deadlock counter is watched over time. The primitives for fixing them (&lt;code&gt;SKIP LOCKED&lt;/code&gt;, &lt;code&gt;NOWAIT&lt;/code&gt;, advisory locks, DDL timeouts) are covered in &lt;a class="link" href="https://explainanalyze.com/p/database-deadlocks-part-2-diagnosis-retries-and-prevention/#nowait-and-skip-locked-as-prevention-primitives" &gt;Part 2: NOWAIT and SKIP LOCKED&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="engine-level-differences-that-shape-the-patterns"&gt;Engine-level differences that shape the patterns
&lt;/h2&gt;&lt;p&gt;The same pattern can deadlock on one engine and not the other. These differences are pattern-shaping; they change which of the above sections apply to your workload. Operational tuning (detector cost, wait timeouts, monitoring) is covered in &lt;a class="link" href="https://explainanalyze.com/p/database-deadlocks-part-2-diagnosis-retries-and-prevention/#monitoring-that-actually-catches-regressions" &gt;Part 2: Monitoring&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Default isolation.&lt;/strong&gt; PostgreSQL defaults to &lt;code&gt;READ COMMITTED&lt;/code&gt;. MySQL defaults to &lt;code&gt;REPEATABLE READ&lt;/code&gt; (with gap locks). The same application code has measurably different deadlock rates between the two because of this alone, before any other tuning.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Gap locks.&lt;/strong&gt; Only InnoDB has them, and only under &lt;code&gt;REPEATABLE READ&lt;/code&gt; (plus the foreign-key and duplicate-key exceptions that retain gap locking even under &lt;code&gt;READ COMMITTED&lt;/code&gt;). PostgreSQL prevents phantom reads through MVCC at its own &lt;code&gt;REPEATABLE READ&lt;/code&gt; (stricter than the SQL standard requires) without a range-locking mechanism, so the entire gap-lock deadlock category doesn&amp;rsquo;t exist on PG at any isolation level.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lock granularity.&lt;/strong&gt; PostgreSQL takes row-level (tuple) locks; InnoDB takes record locks on index entries (with next-key extension under &lt;code&gt;REPEATABLE READ&lt;/code&gt;). The practical consequence is that InnoDB locks are more entangled with index choice than PostgreSQL&amp;rsquo;s; changing which index a query uses can change which rows and gaps get locked.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;FK lock style.&lt;/strong&gt; MySQL&amp;rsquo;s FK check holds a shared lock on the referenced row (next-key under &lt;code&gt;REPEATABLE READ&lt;/code&gt;, and the docs list FK checking as one of the places gap locks persist even under &lt;code&gt;READ COMMITTED&lt;/code&gt;). PostgreSQL takes a &lt;code&gt;FOR KEY SHARE&lt;/code&gt; lock (added in 9.3 specifically to reduce FK lock contention vs. the older &lt;code&gt;FOR SHARE&lt;/code&gt;). Hot parent rows are more contended under MySQL as a result.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Row-lock visibility.&lt;/strong&gt; PostgreSQL row-level locks don&amp;rsquo;t show up in &lt;code&gt;pg_locks&lt;/code&gt;. Per the docs, they&amp;rsquo;re stored on the tuple header on disk, not in shared memory. A process waiting for a row lock usually appears in &lt;code&gt;pg_locks&lt;/code&gt; as waiting for the holder&amp;rsquo;s transaction ID, not the row. InnoDB&amp;rsquo;s &lt;code&gt;performance_schema.data_locks&lt;/code&gt; exposes row-level lock state directly. More on this in Part 2.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Neither engine is &amp;ldquo;better.&amp;rdquo; The behaviors are different, and code that assumes one can deadlock mysteriously when moved to the other.&lt;/p&gt;
&lt;h2 id="why-schema-reading-assistants-hit-these-patterns"&gt;Why schema-reading assistants hit these patterns
&lt;/h2&gt;&lt;p&gt;Locking behavior has no syntax in the query text. A &lt;code&gt;SELECT ... FOR UPDATE&lt;/code&gt; advertises the intent; a plain &lt;code&gt;INSERT ... ON DUPLICATE KEY UPDATE&lt;/code&gt; or &lt;code&gt;INSERT ... ON CONFLICT&lt;/code&gt; doesn&amp;rsquo;t. The shared next-key lock on a duplicate-key violation, the FK shared lock on the parent row, the gap-lock extension under MySQL&amp;rsquo;s default isolation are all implementation details of the storage engine. Schema-reading assistants read the catalog, which describes tables, columns, and constraints, and the codebase, which describes queries. Neither surfaces lock ordering, gap-lock semantics, or the difference between &lt;code&gt;READ COMMITTED&lt;/code&gt; and &lt;code&gt;REPEATABLE READ&lt;/code&gt; unless the prompt specifically includes them.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s why AI-generated UPSERT and batch-insert code deadlocks in production the way it does. The model reads &lt;code&gt;INSERT ... ON DUPLICATE KEY UPDATE&lt;/code&gt; as an atomic upsert, not as &amp;ldquo;takes a shared next-key lock, possibly including a gap, before raising the duplicate-key error that the application will retry.&amp;rdquo; It generates batch INSERTs that process rows in whatever order the application supplies, not sorted by key: fine for a single writer, a lock-ordering cycle under any realistic concurrency. The patterns above are the ones the catalog and query text can&amp;rsquo;t warn about, and they&amp;rsquo;re the ones that arrive in production as &amp;ldquo;intermittent deadlocks under load&amp;rdquo; after passing every test that didn&amp;rsquo;t include a second concurrent worker. The fix lives one level up from the query (sorted batches, explicit lock ordering, retry loops) which is the subject of Part 2.&lt;/p&gt;
&lt;h2 id="whats-in-part-2"&gt;What&amp;rsquo;s in Part 2
&lt;/h2&gt;&lt;p&gt;The patterns are the first half. Turning them into working systems takes a different set of skills: reading the deadlock log to identify which pattern is firing, building retry logic that doesn&amp;rsquo;t mask the real bug, isolating hot rows before they become incident reports, and choosing the right tool for each (&lt;code&gt;NOWAIT&lt;/code&gt;, &lt;code&gt;SKIP LOCKED&lt;/code&gt;, advisory locks, counter tables, or the isolation-level change that eliminates the category entirely). PostgreSQL&amp;rsquo;s &lt;code&gt;SERIALIZABLE&lt;/code&gt;/SSI produces serialization failures that look like deadlocks but aren&amp;rsquo;t; the difference matters for retry architecture. &lt;code&gt;AUTO_INCREMENT&lt;/code&gt; and sequence-related locking have their own failure modes. DDL migrations on both engines introduce lock queues that manifest as deadlock-like incidents.&lt;/p&gt;
&lt;p&gt;All of that is in &lt;a class="link" href="https://explainanalyze.com/p/database-deadlocks-part-2-diagnosis-retries-and-prevention/" &gt;Database Deadlocks, Part 2: Diagnosis, Retries, and Prevention&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="mental-model-for-the-patterns"&gt;Mental model for the patterns
&lt;/h2&gt;&lt;p&gt;Deadlocks are what consistent concurrency control does when two transactions make the engine choose between them. The database isn&amp;rsquo;t misbehaving; it&amp;rsquo;s refusing to let both of two contradictory orderings win. The error in the application log is a notification, not a fault.&lt;/p&gt;
&lt;p&gt;That makes the diagnostic question concrete. Which pattern is firing? Every deadlock in production fits one of the shapes above: lock-ordering cycle, gap lock on a range, duplicate-key shared lock, FK shared lock on a hot parent, unindexed predicate lock amplification, worker-pool race, or replication-apply cycle. Identifying the pattern from the deadlock log narrows the fix enormously. &amp;ldquo;Two transactions deadlocked, retry the transaction&amp;rdquo; is true but useless. &amp;ldquo;Two workers took locks on the same jobs queue in different orders, switch to &lt;code&gt;SKIP LOCKED&lt;/code&gt;&amp;rdquo; is a fix. The work is in the identification.&lt;/p&gt;</description></item><item><title>NULL in SQL: Three-Valued Logic and the Silent Bug Factory</title><link>https://explainanalyze.com/p/null-in-sql-three-valued-logic-and-the-silent-bug-factory/</link><pubDate>Sun, 26 Jan 2025 00:00:00 +0000</pubDate><guid>https://explainanalyze.com/p/null-in-sql-three-valued-logic-and-the-silent-bug-factory/</guid><description>&lt;img src="https://explainanalyze.com/" alt="Featured image of post NULL in SQL: Three-Valued Logic and the Silent Bug Factory" /&gt;&lt;div class="tldr-box"&gt;
 &lt;strong&gt;TL;DR&lt;/strong&gt;
 &lt;div&gt;NULL is the absence of a value, and SQL evaluates expressions involving it under three-valued logic (TRUE / FALSE / UNKNOWN). Most operators return UNKNOWN when one of their operands is NULL, so rows with NULLs silently drop out of &lt;code&gt;!=&lt;/code&gt;, &lt;code&gt;IN&lt;/code&gt;, and &lt;code&gt;NOT IN&lt;/code&gt; filters and behave inconsistently across &lt;code&gt;JOIN&lt;/code&gt;, &lt;code&gt;GROUP BY&lt;/code&gt;, &lt;code&gt;DISTINCT&lt;/code&gt;, and aggregate functions. The rules are consistent if you know them, and a source of silently wrong results when you don&amp;rsquo;t.&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;There&amp;rsquo;s a category of SQL bug that shows up in almost every mature codebase. Someone writes a filter like &lt;code&gt;WHERE status != 'closed'&lt;/code&gt;, expecting it to return every row that isn&amp;rsquo;t closed. Instead it returns fewer rows than the raw table contains. The rows where &lt;code&gt;status&lt;/code&gt; is NULL silently dropped out. No error. No warning. The query is doing exactly what the SQL standard says it should, and the result is still wrong for what the author meant.&lt;/p&gt;
&lt;p&gt;NULL handling is the single most common source of silently wrong query results in relational databases. The behavior is consistent if you know the rules, but the rules don&amp;rsquo;t match the intuition most programming languages build. In Java or Python, &lt;code&gt;null != &amp;quot;closed&amp;quot;&lt;/code&gt; is &lt;code&gt;true&lt;/code&gt;. In SQL, it&amp;rsquo;s UNKNOWN, and UNKNOWN rows get filtered out. That one difference produces most of the bugs.&lt;/p&gt;
&lt;h2 id="null-is-not-a-value"&gt;NULL is not a value
&lt;/h2&gt;&lt;p&gt;Every introduction to SQL NULL starts here because it has to. NULL is the absence of a value, a marker that says &amp;ldquo;this column has no data.&amp;rdquo; It&amp;rsquo;s not zero, not empty string, not false. It doesn&amp;rsquo;t equal itself. It doesn&amp;rsquo;t not-equal itself either. Any comparison involving NULL returns a third logical state: UNKNOWN.&lt;/p&gt;
&lt;p&gt;This is called three-valued logic (3VL), and SQL uses it consistently throughout the language. The three values are TRUE, FALSE, and UNKNOWN. Most operators propagate UNKNOWN: any arithmetic, string, or comparison operation with a NULL operand returns NULL (or UNKNOWN, in a boolean context).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;-- NULL (not TRUE)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;-- NULL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;-- NULL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;-- NULL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;hello&amp;#39;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;-- NULL in PostgreSQL (ANSI standard behavior)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CONCAT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;hello&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;-- NULL in MySQL, &amp;#39;hello&amp;#39; in PostgreSQL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The &lt;code&gt;CONCAT&lt;/code&gt; difference is a good example of how engines diverge even within well-defined territory. MySQL&amp;rsquo;s &lt;code&gt;CONCAT&lt;/code&gt; propagates NULL: any NULL argument makes the whole result NULL. PostgreSQL&amp;rsquo;s &lt;code&gt;CONCAT&lt;/code&gt; function does the opposite, silently skipping NULL arguments and returning the concatenation of the non-NULL parts. (PostgreSQL&amp;rsquo;s &lt;code&gt;||&lt;/code&gt; operator still propagates NULL, matching ANSI.) Two queries that look identical can return different results on different engines, and the difference only shows up when a NULL appears.&lt;/p&gt;
&lt;p&gt;For NULL-skipping concatenation that behaves the same on both engines, use &lt;code&gt;CONCAT_WS&lt;/code&gt; (concat with separator). Both MySQL and PostgreSQL skip NULL arguments with it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CONCAT_WS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39; &amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;middle_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- &amp;#34;Alice Smith&amp;#34; even if middle_name IS NULL, on both engines.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;One MySQL-specific gotcha: if the separator itself is NULL, the whole result is NULL. The separator is the one argument &lt;code&gt;CONCAT_WS&lt;/code&gt; still propagates NULL from. As long as the separator is a literal string, the function is a reliable NULL-safe concat across engines.&lt;/p&gt;
&lt;div class="note-box"&gt;
 &lt;strong&gt;IS NULL, not = NULL&lt;/strong&gt;
 &lt;div&gt;The only way to test for NULL is with &lt;code&gt;IS NULL&lt;/code&gt; or &lt;code&gt;IS NOT NULL&lt;/code&gt;. &lt;code&gt;WHERE col = NULL&lt;/code&gt; always returns zero rows, because &lt;code&gt;col = NULL&lt;/code&gt; evaluates to NULL, which is not TRUE, so the row is filtered out. This is one of those mistakes every SQL engineer makes exactly once.&lt;/div&gt;
&lt;/div&gt;

&lt;h2 id="where-clauses-filter-out-unknown"&gt;WHERE clauses filter out UNKNOWN
&lt;/h2&gt;&lt;p&gt;The rule that drives most NULL bugs: &lt;code&gt;WHERE&lt;/code&gt; only keeps rows where the condition evaluates to TRUE. UNKNOWN rows are filtered out, same as FALSE rows.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- &amp;#34;Users not on the sales team&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;team_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If &lt;code&gt;team_id&lt;/code&gt; is NULL for unassigned users (a completely normal state) those rows are silently dropped. The expression &lt;code&gt;NULL != 3&lt;/code&gt; evaluates to UNKNOWN, and UNKNOWN is not TRUE, so the row doesn&amp;rsquo;t survive the filter.&lt;/p&gt;
&lt;p&gt;The mental model most developers carry from application code (&amp;ldquo;anything that isn&amp;rsquo;t team 3 is included&amp;rdquo;) is wrong in SQL. To get that behavior, you have to spell it out:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;team_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;OR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;team_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;IS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This is one of the most common sources of &amp;ldquo;the numbers don&amp;rsquo;t match&amp;rdquo; bugs. A report that&amp;rsquo;s supposed to count &amp;ldquo;everyone outside the sales team&amp;rdquo; quietly excludes every unassigned user, and the total looks plausible because unassigned users aren&amp;rsquo;t visible in the team-level breakdown either. The discrepancy only surfaces when someone reconciles against a direct row count.&lt;/p&gt;
&lt;h2 id="not-in-is-a-trap"&gt;NOT IN is a trap
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;NOT IN&lt;/code&gt; with a nullable subquery is the classic silent-failure NULL bug. The trap is specifically that the subquery has to return a column that can contain NULL: rules out primary keys but extremely common for foreign keys, self-references, and any column that&amp;rsquo;s optional by design.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- &amp;#34;Find users who aren&amp;#39;t anybody&amp;#39;s manager.&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;IN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;manager_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The subquery returns every &lt;code&gt;manager_id&lt;/code&gt; in the table, including NULL for users who don&amp;rsquo;t have a manager (the CEO, top-level roles, anyone unassigned). The moment the subquery contains a single NULL, the outer query returns zero rows.&lt;/p&gt;
&lt;p&gt;The reason is how &lt;code&gt;NOT IN&lt;/code&gt; expands. &lt;code&gt;x NOT IN (a, b, c)&lt;/code&gt; is equivalent to &lt;code&gt;x != a AND x != b AND x != c&lt;/code&gt;. If any of &lt;code&gt;a&lt;/code&gt;, &lt;code&gt;b&lt;/code&gt;, &lt;code&gt;c&lt;/code&gt; is NULL, that comparison returns UNKNOWN, and AND with UNKNOWN can only ever be FALSE or UNKNOWN. The row never passes the filter.&lt;/p&gt;
&lt;p&gt;Safer alternatives:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Use NOT EXISTS - handles NULLs correctly
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;EXISTS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;manager_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Or filter NULLs out of the subquery explicitly
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;IN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;manager_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;manager_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;IS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;code&gt;NOT EXISTS&lt;/code&gt; is the better habit. It&amp;rsquo;s correct regardless of NULL presence, and the query planner handles it at least as well as &lt;code&gt;NOT IN&lt;/code&gt; on any modern engine. Treating &lt;code&gt;NOT IN&lt;/code&gt; as &amp;ldquo;suspicious until proven NULL-free&amp;rdquo; saves a category of bug that&amp;rsquo;s almost impossible to catch in review.&lt;/p&gt;
&lt;h2 id="count-and-null-skipped-not-zero"&gt;COUNT and NULL: skipped, not zero
&lt;/h2&gt;&lt;p&gt;The single most important thing to know about aggregates and NULL: NULL is not treated as zero. It&amp;rsquo;s skipped entirely. Nothing about NULL gets coerced or counted; it&amp;rsquo;s as if the row weren&amp;rsquo;t there for the purposes of the aggregate.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;COUNT&lt;/code&gt; makes this visible because it has three forms that behave differently:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;COUNT(*)&lt;/code&gt; counts &lt;strong&gt;rows&lt;/strong&gt;, regardless of their contents. NULLs in the row don&amp;rsquo;t matter.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;COUNT(col)&lt;/code&gt; counts &lt;strong&gt;non-NULL values&lt;/strong&gt; of &lt;code&gt;col&lt;/code&gt;. A row where &lt;code&gt;col IS NULL&lt;/code&gt; is skipped.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;COUNT(DISTINCT col)&lt;/code&gt; counts &lt;strong&gt;distinct non-NULL values&lt;/strong&gt;. NULL is not treated as a distinct value; it&amp;rsquo;s excluded.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;total_rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rows_with_email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;distinct_emails&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;On a table of 1,000 users where 200 have NULL emails:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;COUNT(*)&lt;/code&gt; returns 1000 (all rows)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;COUNT(email)&lt;/code&gt; returns 800 (NULLs skipped)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;COUNT(DISTINCT email)&lt;/code&gt; returns ≤ 800 (distinct non-NULL emails only)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This shows up in reports all the time. &amp;ldquo;How many users signed up this month?&amp;rdquo; gets answered with &lt;code&gt;COUNT(signup_source)&lt;/code&gt; and comes up short because the column was added later and older rows have NULL. The row is there. &lt;code&gt;COUNT(*)&lt;/code&gt; would see it. &lt;code&gt;COUNT(signup_source)&lt;/code&gt; doesn&amp;rsquo;t.&lt;/p&gt;
&lt;p&gt;The rule: use &lt;code&gt;COUNT(*)&lt;/code&gt; when you want rows, &lt;code&gt;COUNT(col)&lt;/code&gt; when you specifically want &amp;ldquo;rows with that column populated.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="sum-avg-min-max-also-skip-null"&gt;SUM, AVG, MIN, MAX: also skip NULL
&lt;/h2&gt;&lt;p&gt;The same rule holds for every aggregate. NULL is not contributed to the sum, not counted in the denominator for the average, not considered for min or max.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;reviews&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If half the rows have NULL rating:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;SUM(rating)&lt;/code&gt; is the sum of the non-NULL half. NULLs don&amp;rsquo;t contribute 0; they contribute nothing.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;AVG(rating)&lt;/code&gt; is the sum of the non-NULL half divided by the &lt;strong&gt;count of non-NULL rows&lt;/strong&gt;, not the total row count.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The AVG behavior is the most common source of surprise. If 10,000 rows have &lt;code&gt;rating = 5&lt;/code&gt; and 10,000 have &lt;code&gt;rating = NULL&lt;/code&gt;, &lt;code&gt;AVG(rating)&lt;/code&gt; is 5.0, not 2.5. The NULL rows don&amp;rsquo;t pull the average down toward zero. They&amp;rsquo;re not in the denominator at all.&lt;/p&gt;
&lt;p&gt;If you want NULL-as-zero behavior, you have to opt in:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;reviews&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Now NULLs become 0 and land in both the sum and the denominator.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Returns 2.5 in the example above.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;div class="note-box"&gt;
 &lt;strong&gt;SUM of all NULLs is NULL, not zero&lt;/strong&gt;
 &lt;div&gt;&lt;code&gt;SUM(col)&lt;/code&gt; over a set where every value is NULL returns NULL, not 0. A &lt;code&gt;SUM&lt;/code&gt; that feeds into arithmetic downstream (&lt;code&gt;total + tax&lt;/code&gt;, for example) can propagate NULL through the rest of the expression, often somewhere the query author wasn&amp;rsquo;t expecting. &lt;code&gt;COALESCE(SUM(col), 0)&lt;/code&gt; is the idiomatic fix; make the fallback explicit at the aggregate.&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;The framing that keeps this straight: NULL is not a value, so aggregates have nothing to aggregate. Absent, not zero. If you want absent to mean zero, that&amp;rsquo;s a &lt;code&gt;COALESCE&lt;/code&gt; decision the query author makes; the engine won&amp;rsquo;t make it for you.&lt;/p&gt;
&lt;h2 id="group-by-and-distinct-treat-nulls-as-equal"&gt;GROUP BY and DISTINCT treat NULLs as equal
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s where the rules get inconsistent in a way that genuinely surprises people: &lt;code&gt;GROUP BY&lt;/code&gt; and &lt;code&gt;DISTINCT&lt;/code&gt; treat all NULLs as the same group, even though &lt;code&gt;NULL = NULL&lt;/code&gt; returns UNKNOWN.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;span class="lnt"&gt;12
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- All rows where team_id is NULL land in one group, as if they were equal.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;team_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;GROUP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;team_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- team_id | count
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- NULL | 200
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- 1 | 500
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- 2 | 300
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- DISTINCT collapses all NULLs into one row.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;team_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- NULL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- 2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This is a deliberate exception carved out by the SQL standard. &lt;code&gt;GROUP BY&lt;/code&gt; and &lt;code&gt;DISTINCT&lt;/code&gt; use a &amp;ldquo;NULL-safe&amp;rdquo; equality for grouping purposes, because the alternative (one group per NULL row) would be useless. But it means the behavior is internally inconsistent: &lt;code&gt;WHERE a = b&lt;/code&gt; says NULLs aren&amp;rsquo;t equal, &lt;code&gt;GROUP BY a&lt;/code&gt; says they are.&lt;/p&gt;
&lt;p&gt;The practical implication: &lt;code&gt;COUNT(DISTINCT col)&lt;/code&gt; excludes NULL entirely (consistent with &lt;code&gt;COUNT(col)&lt;/code&gt;), while &lt;code&gt;GROUP BY col&lt;/code&gt; produces a single row for all NULLs. Two different &amp;ldquo;null-handling&amp;rdquo; behaviors under the same umbrella of &amp;ldquo;treats NULLs as equal for grouping.&amp;rdquo; Queries that rely on either for correctness should be written with the awareness that the two operations don&amp;rsquo;t agree.&lt;/p&gt;
&lt;h2 id="null-safe-comparison-operators"&gt;NULL-safe comparison operators
&lt;/h2&gt;&lt;p&gt;Both MySQL and PostgreSQL offer operators that treat NULL as equal to NULL, mirroring the &lt;code&gt;GROUP BY&lt;/code&gt; behavior for regular comparisons.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- MySQL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Matches rows where email IS NULL. &amp;lt;=&amp;gt; is the null-safe equal operator.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- PostgreSQL (ANSI SQL)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;IS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Same idea. Treats NULLs as equal to each other.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;These are useful when joining or filtering on columns that may contain NULL on both sides and you want NULLs to match:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Standard equality misses NULL-to-NULL matches
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- IS NOT DISTINCT FROM treats NULLs as matching
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;IS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Neither is used often in practice. The habit most teams settle on is &amp;ldquo;don&amp;rsquo;t let NULL be meaningful in join columns&amp;rdquo;: either constrain the columns &lt;code&gt;NOT NULL&lt;/code&gt; or filter NULLs out before joining. The operators are there for the cases where those aren&amp;rsquo;t options.&lt;/p&gt;
&lt;h2 id="order-by-null-placement-varies-by-engine"&gt;ORDER BY: NULL placement varies by engine
&lt;/h2&gt;&lt;p&gt;When sorting, NULL has to go somewhere. The SQL standard leaves the default placement implementation-defined, and engines disagree.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;PostgreSQL.&lt;/strong&gt; NULLs sort last for &lt;code&gt;ASC&lt;/code&gt; and first for &lt;code&gt;DESC&lt;/code&gt; by default.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MySQL.&lt;/strong&gt; NULLs sort first for &lt;code&gt;ASC&lt;/code&gt; and last for &lt;code&gt;DESC&lt;/code&gt; by default.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Oracle and SQL Server.&lt;/strong&gt; Match PostgreSQL&amp;rsquo;s behavior (NULLs last for &lt;code&gt;ASC&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The fix is to be explicit:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;event_time&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ASC&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;NULLS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;LAST&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;event_time&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;NULLS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;LAST&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;code&gt;NULLS FIRST&lt;/code&gt; / &lt;code&gt;NULLS LAST&lt;/code&gt; is ANSI standard and supported by PostgreSQL, Oracle, and SQL Server. MySQL doesn&amp;rsquo;t support the &lt;code&gt;NULLS FIRST/LAST&lt;/code&gt; syntax directly; you fake it with a computed column:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- MySQL idiom for &amp;#34;NULLS LAST&amp;#34; on an ASC sort
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;event_time&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;IS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;event_time&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ASC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- event_time IS NULL returns 0 for non-nulls, 1 for nulls; 0 sorts first.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Teams that run the same reports against different engines (especially during a migration or in a polyglot analytics stack) hit this one hard. A top-10 leaderboard quietly reorders when the ORDER BY engine changes underneath it.&lt;/p&gt;
&lt;h2 id="joins-dont-match-on-null"&gt;JOINs don&amp;rsquo;t match on NULL
&lt;/h2&gt;&lt;p&gt;A standard equi-join &lt;code&gt;a.col = b.col&lt;/code&gt; doesn&amp;rsquo;t match rows where either side is NULL. This is consistent with the three-valued logic rule: &lt;code&gt;NULL = NULL&lt;/code&gt; is UNKNOWN, so the join predicate fails.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Users can have no manager (manager_id IS NULL).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- This join drops any user with no manager.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;manager_name&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;managers&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;manager_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If the intent is &amp;ldquo;every user, with manager info if present,&amp;rdquo; use a LEFT JOIN. If the intent is &amp;ldquo;users where manager_id matches some manager row,&amp;rdquo; the INNER JOIN is correct but it&amp;rsquo;s worth naming the exclusion: users with NULL &lt;code&gt;manager_id&lt;/code&gt; are gone, on purpose.&lt;/p&gt;
&lt;p&gt;For joins that should treat NULLs as matching (both sides have NULL, and that means &amp;ldquo;same&amp;rdquo;), use the null-safe operator:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;external_ref&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;IS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;external_ref&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This is rare but legitimate (e.g., matching optional identifiers where &amp;ldquo;both unspecified&amp;rdquo; should be treated as a match). Most of the time, the correct answer is to make the column &lt;code&gt;NOT NULL&lt;/code&gt; and use a sentinel if needed (and then deal with the sentinel&amp;rsquo;s own problems, covered below).&lt;/p&gt;
&lt;h2 id="foreign-keys-are-nullable-by-default"&gt;Foreign keys are nullable by default
&lt;/h2&gt;&lt;p&gt;A foreign key column is nullable unless declared &lt;code&gt;NOT NULL&lt;/code&gt;. A nullable FK means the reference is optional: users may or may not have a manager, orders may or may not be linked to a promotion. This is often the correct intent, but it&amp;rsquo;s frequently unintentional.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- manager_id is nullable by default. This is intentional if users can be unmanaged.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;PRIMARY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;manager_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;REFERENCES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;managers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Review migration files with this in mind. A column that should always be populated but was added as nullable will accept NULLs forever. Retrofitting &lt;code&gt;NOT NULL&lt;/code&gt; later requires backfilling or cleaning up existing NULL rows: easy when the table is small, painful at scale. (&lt;a class="link" href="https://explainanalyze.com/p/foreign-keys-are-not-optional/#what-foreign-keys-actually-give-you" &gt;Foreign Keys Are Not Optional&lt;/a&gt; covers the broader picture of FK enforcement and why application-level validation is an incomplete substitute.)&lt;/p&gt;
&lt;h2 id="what-null-actually-means-is-context-dependent"&gt;What NULL actually means is context-dependent
&lt;/h2&gt;&lt;p&gt;The SQL rules for NULL are unambiguous. What NULL means in a given column is not. NULL can mean:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Unknown.&lt;/strong&gt; The data exists but we don&amp;rsquo;t have it. A user&amp;rsquo;s birthdate where the user declined to share.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Not applicable.&lt;/strong&gt; The field doesn&amp;rsquo;t make sense for this row. &lt;code&gt;spouse_name&lt;/code&gt; on a row for a single person.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ongoing or not yet set.&lt;/strong&gt; The state isn&amp;rsquo;t finalized. &lt;code&gt;end_date&lt;/code&gt; on an active subscription.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data entry error.&lt;/strong&gt; The column should have been populated but wasn&amp;rsquo;t.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Legacy.&lt;/strong&gt; The column was added after the row was created and never backfilled.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The same column may mean different things in different rows, and the schema doesn&amp;rsquo;t tell you which is which. This is where &lt;a class="link" href="https://explainanalyze.com/p/comment-your-schema/#what-to-comment" &gt;schema comments&lt;/a&gt; earn their keep, documenting the semantics of NULL in each column in the DDL itself rather than in a wiki page nobody finds.&lt;/p&gt;
&lt;h2 id="sentinel-values-the-alternative-and-its-own-problems"&gt;Sentinel values: the alternative, and its own problems
&lt;/h2&gt;&lt;p&gt;A common workaround: use a sentinel value instead of NULL. &lt;code&gt;end_date = '9999-12-31'&lt;/code&gt; for &amp;ldquo;ongoing.&amp;rdquo; &lt;code&gt;status = -1&lt;/code&gt; for &amp;ldquo;unknown.&amp;rdquo; &lt;code&gt;deleted_at = '1970-01-01'&lt;/code&gt; for &amp;ldquo;not deleted.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Sentinels avoid the three-valued-logic rules at the cost of introducing their own bugs. A few to watch for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Aggregates include sentinels.&lt;/strong&gt; &lt;code&gt;AVG(rating)&lt;/code&gt; over a column where &amp;ldquo;unknown&amp;rdquo; is stored as &lt;code&gt;-1&lt;/code&gt; skews the average toward negative. Sentinels break the &amp;ldquo;aggregates skip missing values&amp;rdquo; assumption that NULL provides for free.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Range queries break in unexpected directions.&lt;/strong&gt; &lt;code&gt;WHERE end_date &amp;gt; NOW()&lt;/code&gt; returns all the sentinel rows along with real future dates. Every filter has to explicitly exclude the sentinel.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Indexes skew.&lt;/strong&gt; A column where 80% of the values are the sentinel has a low-selectivity index. The planner may skip the index entirely on queries that filter out the sentinel, because it doesn&amp;rsquo;t know that&amp;rsquo;s the intent.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Downstream consumers have to know.&lt;/strong&gt; Every system that reads the data has to treat &lt;code&gt;9999-12-31&lt;/code&gt; specially. Miss one consumer and wrong data shows up in a report.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The trade-off is real. NULL forces every query author to think about three-valued logic. Sentinels let queries use normal equality but require every author to know the sentinel. Neither is free; they move the cost around.&lt;/p&gt;
&lt;p&gt;The pragmatic middle ground: use NULL for genuinely absent data (ongoing subscriptions, optional fields), use sentinels sparingly and document them, and declare &lt;code&gt;NOT NULL&lt;/code&gt; everywhere you can enforce presence. A column that&amp;rsquo;s &lt;code&gt;NOT NULL&lt;/code&gt; is the one case where the rules don&amp;rsquo;t matter, because NULL can&amp;rsquo;t get in.&lt;/p&gt;
&lt;h2 id="diagnosing-a-null-bug"&gt;Diagnosing a NULL bug
&lt;/h2&gt;&lt;p&gt;When a query returns fewer (or more, or none) of the rows it should, the fastest way to narrow it down to a NULL issue:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Are there NULLs in the columns referenced by the filter?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;team_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;with_team&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;team_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;no_team&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If &lt;code&gt;no_team&lt;/code&gt; is non-zero and the filter is &lt;code&gt;team_id != X&lt;/code&gt; or &lt;code&gt;team_id IN (...)&lt;/code&gt;, the NULL rows are the likely culprit. Rewriting with explicit NULL handling (&lt;code&gt;team_id != X OR team_id IS NULL&lt;/code&gt;, or &lt;code&gt;NOT EXISTS&lt;/code&gt;, or &lt;code&gt;COALESCE(team_id, -1) != X&lt;/code&gt;) will reveal whether NULLs were being silently excluded.&lt;/p&gt;
&lt;p&gt;For &lt;code&gt;NOT IN&lt;/code&gt;, inspect the subquery:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Does the NOT IN subquery contain NULL?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;manager_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;IS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If the answer is non-zero, &lt;code&gt;NOT IN&lt;/code&gt; is returning an empty set regardless of the outer query&amp;rsquo;s data.&lt;/p&gt;
&lt;h2 id="the-mental-model"&gt;The mental model
&lt;/h2&gt;&lt;p&gt;NULL handling is consistent once you internalize the rule set, and the rule set is smaller than it looks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Any comparison involving NULL returns UNKNOWN. &lt;code&gt;WHERE&lt;/code&gt; filters out UNKNOWN rows.&lt;/li&gt;
&lt;li&gt;Aggregates skip NULLs. &lt;code&gt;COUNT(*)&lt;/code&gt; doesn&amp;rsquo;t. &lt;code&gt;COUNT(col)&lt;/code&gt; does.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GROUP BY&lt;/code&gt;, &lt;code&gt;DISTINCT&lt;/code&gt;, and &lt;code&gt;ORDER BY&lt;/code&gt; treat all NULLs as equivalent (with engine-specific sort placement).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;NOT IN&lt;/code&gt; with a nullable subquery returns empty. Use &lt;code&gt;NOT EXISTS&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Join predicates don&amp;rsquo;t match NULLs unless you use &lt;code&gt;IS NOT DISTINCT FROM&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Past that, most NULL bugs are prevented by one habit: declare &lt;code&gt;NOT NULL&lt;/code&gt; wherever the column should actually be populated. Every &lt;code&gt;NOT NULL&lt;/code&gt; column is a column where none of these rules matter, because there&amp;rsquo;s nothing for them to misbehave on. The fewer nullable columns the schema has, the less of this there is to think about.&lt;/p&gt;
&lt;p&gt;The columns where NULL genuinely carries meaning (optional references, ongoing states, data that may not exist) are the ones worth documenting. A schema comment that says &amp;ldquo;NULL means the subscription is still active&amp;rdquo; pulls the NULL semantics into the DDL itself, where it&amp;rsquo;s visible to every engineer, every tool, and every query author who wasn&amp;rsquo;t around when the decision was made.&lt;/p&gt;</description></item><item><title>Joins That Lie: The Cardinality Problem</title><link>https://explainanalyze.com/p/joins-that-lie-the-cardinality-problem/</link><pubDate>Thu, 09 Jan 2025 00:00:00 +0000</pubDate><guid>https://explainanalyze.com/p/joins-that-lie-the-cardinality-problem/</guid><description>&lt;img src="https://explainanalyze.com/" alt="Featured image of post Joins That Lie: The Cardinality Problem" /&gt;&lt;div class="tldr-box"&gt;
 &lt;strong&gt;TL;DR&lt;/strong&gt;
 &lt;div&gt;Most silently wrong SQL comes from the same root cause: a join that multiplies rows in a way the author didn&amp;rsquo;t expect. Aggregations built on those rows (&lt;code&gt;SUM&lt;/code&gt;, &lt;code&gt;COUNT&lt;/code&gt;, &lt;code&gt;AVG&lt;/code&gt;) inflate without producing any error. The fix is understanding the cardinality of every join before writing the aggregation, not more careful SQL.&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;There&amp;rsquo;s a category of SQL bug that never throws an error, never fails code review, and never shows up in tests. The query runs. The results look reasonable. Someone ships a dashboard, and a month later finance asks why revenue is 40% higher than what the billing system reports. That 40% isn&amp;rsquo;t a bug in the data; it&amp;rsquo;s a join that multiplied rows, and a &lt;code&gt;SUM&lt;/code&gt; that dutifully added them all up.&lt;/p&gt;
&lt;p&gt;The tricky part is that structurally the query is fine. The joins are valid. The filters are valid. The aggregation is valid. Every individual piece is correct. The cardinality of the relationships (how many child rows exist per parent, and how that changes when multiple child tables are joined at once) is doing damage the query never surfaces.&lt;/p&gt;
&lt;h2 id="cardinality-briefly"&gt;Cardinality, briefly
&lt;/h2&gt;&lt;p&gt;Cardinality describes the number of rows on each side of a relationship:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;One-to-one (1:1).&lt;/strong&gt; Each row in table A matches at most one row in table B. Less common than 1:N, but legitimately used for optional extensions (splitting off rarely-accessed or sensitive columns into a side table), inheritance patterns (a base table with specialized sub-tables), or separating hot and cold data for caching and storage reasons. A 1:1 join preserves row count.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;One-to-many (1:N).&lt;/strong&gt; Each row in A matches zero or more rows in B. The common case: one order has many order items, one user has many sessions, one post has many comments. Joining A to B duplicates the parent row once per matching child. If a parent has zero children, an inner join drops it entirely; a left join keeps it with NULLs on the child side. This difference matters and it&amp;rsquo;s the source of another whole class of silent bugs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Many-to-many (N:M).&lt;/strong&gt; Rows in A match many rows in B and vice versa. Always implemented through a bridge table (junction table) that sits between them. A bridge is two 1:N relationships back-to-back: the bridge table holds a foreign key to A and a foreign key to B, with each row pairing one A with one B. A has many bridge rows, and B has many bridge rows. Joining through it multiplies by the cardinality on both sides.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The shape of the relationship determines what a join does to row counts. This is where aggregations start to lie.&lt;/p&gt;
&lt;div class="note-box"&gt;
 &lt;strong&gt;Schema cardinality vs. data cardinality&lt;/strong&gt;
 &lt;div&gt;There&amp;rsquo;s a distinction worth naming: what the schema allows vs. what the data actually contains. A foreign key from &lt;code&gt;user_profiles.user_id&lt;/code&gt; to &lt;code&gt;users.id&lt;/code&gt; with a unique constraint is 1:1 at the schema level. A column typed as 1:N by constraint can be 1:1 in practice; if every order in your system happens to have exactly one line item, the relationship is legally 1:N but effectively 1:1. This matters for query planning (the optimizer uses constraints, not observed data), index choice, and reasoning about whether a join can actually multiply rows. A query that&amp;rsquo;s safe against the current data can break as soon as the data starts exercising the cardinality the schema permits.&lt;/div&gt;
&lt;/div&gt;

&lt;h2 id="the-row-multiplication-problem"&gt;The row multiplication problem
&lt;/h2&gt;&lt;p&gt;The examples in this article use a deliberately simple &lt;code&gt;customers&lt;/code&gt; / &lt;code&gt;orders&lt;/code&gt; / &lt;code&gt;order_items&lt;/code&gt; schema so the mechanics are easy to follow. In real systems the shape changes constantly: invoices and payments, subscriptions and usage events, tickets and messages, events and dimensions in a warehouse. The permutations are endless, but the underlying failure is the same: a join that multiplies rows in a way the author didn&amp;rsquo;t expect, feeding an aggregation that now lies. Once the pattern is visible in one schema, it&amp;rsquo;s visible everywhere.&lt;/p&gt;
&lt;p&gt;Consider a schema everyone has seen some version of:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;span class="lnt"&gt;12
&lt;/span&gt;&lt;span class="lnt"&gt;13
&lt;/span&gt;&lt;span class="lnt"&gt;14
&lt;/span&gt;&lt;span class="lnt"&gt;15
&lt;/span&gt;&lt;span class="lnt"&gt;16
&lt;/span&gt;&lt;span class="lnt"&gt;17
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;PRIMARY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;PRIMARY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;REFERENCES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;total_cents&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_items&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;PRIMARY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;REFERENCES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;price_cents&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;A question: what&amp;rsquo;s the total revenue per customer? The obvious query:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;GROUP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This is correct. One row per order, &lt;code&gt;total_cents&lt;/code&gt; summed per customer. Now someone asks: &amp;ldquo;can we also see how many items they bought?&amp;rdquo; The change looks trivial; add a join and a count:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;span class="lnt"&gt;8
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;items_purchased&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_items&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;GROUP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The &lt;code&gt;items_purchased&lt;/code&gt; count is correct. The &lt;code&gt;revenue&lt;/code&gt; is wrong.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s what happened. &lt;code&gt;orders&lt;/code&gt; to &lt;code&gt;order_items&lt;/code&gt; is 1:N. Joining them multiplies each order row by the number of items it contains. An order with 5 items now appears 5 times in the result set, once per item. &lt;code&gt;total_cents&lt;/code&gt;, which lives on the &lt;code&gt;orders&lt;/code&gt; row, is duplicated in each of those 5 copies.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;SUM(o.total_cents)&lt;/code&gt; now sums the same order total once per item. A $100 order with 5 items contributes $500. Revenue is inflated by the average number of items per order.&lt;/p&gt;
&lt;p&gt;The query runs. The numbers look like revenue. Nothing is flagged. The dashboard ships.&lt;/p&gt;
&lt;div class="note-box"&gt;
 &lt;strong&gt;Why it&amp;#39;s easy to miss&lt;/strong&gt;
 &lt;div&gt;The inflation is proportional to the cardinality of the join, so it affects every row by roughly the same factor. Totals grow uniformly, relative rankings stay intact, and top-10 lists still look &amp;ldquo;right.&amp;rdquo; There&amp;rsquo;s nothing that stands out as obviously wrong, except the grand total doesn&amp;rsquo;t match the source system.&lt;/div&gt;
&lt;/div&gt;

&lt;h2 id="the-bridge-table-trap"&gt;The bridge table trap
&lt;/h2&gt;&lt;p&gt;Many-to-many relationships make this problem worse because the multiplication happens in both directions. Take a schema with products, orders, and promotions:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;span class="lnt"&gt;12
&lt;/span&gt;&lt;span class="lnt"&gt;13
&lt;/span&gt;&lt;span class="lnt"&gt;14
&lt;/span&gt;&lt;span class="lnt"&gt;15
&lt;/span&gt;&lt;span class="lnt"&gt;16
&lt;/span&gt;&lt;span class="lnt"&gt;17
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_items&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;PRIMARY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;price_cents&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_item_promotions&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_item_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;promotion_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;PRIMARY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_item_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;promotion_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;promotions&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;PRIMARY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;An order item can have multiple promotions applied to it (a percentage discount stacked with a free shipping promo). Query: total revenue, broken down by promotion:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;price_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_items&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_item_promotions&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oip&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_item_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;promotions&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;promotion_id&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;GROUP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If an order item had two promotions, its &lt;code&gt;price_cents&lt;/code&gt; shows up twice (once under each promotion). Sum those up and total revenue exceeds actual revenue. Worse, if you then compare &amp;ldquo;sum across all promotions&amp;rdquo; to &amp;ldquo;total revenue from order_items,&amp;rdquo; the numbers don&amp;rsquo;t tie out, and there&amp;rsquo;s no obvious reason why.&lt;/p&gt;
&lt;p&gt;The bridge table is doing exactly what it&amp;rsquo;s supposed to do. The query is doing exactly what the SQL says. The meaning of the aggregation drifts as soon as you cross a many-to-many boundary.&lt;/p&gt;
&lt;h3 id="a-related-trap-date-filters-across-joined-tables"&gt;A related trap: date filters across joined tables
&lt;/h3&gt;&lt;p&gt;A variation of the grain problem shows up in schemas where related tables each carry their own independently-moving date column: orders vs. shipments, subscriptions vs. invoices, tickets vs. updates, orders vs. returns. When a question is time-bounded (&amp;ldquo;Q1 revenue from items shipped in Q1&amp;rdquo;), the date filter has to land on the column that matches the question. Filtering on both tables &amp;ldquo;to be safe&amp;rdquo; silently excludes rows whose dates diverge. An order placed in December with items shipping in January is a Q1 shipment; a filter on &lt;code&gt;orders.created_at&lt;/code&gt; throws it out.&lt;/p&gt;
&lt;p&gt;The rule is the same as for row multiplication: pick the grain that matches the question, once. If the question is about shipments, filter on &lt;code&gt;shipped_at&lt;/code&gt;. If it&amp;rsquo;s about orders, filter on &lt;code&gt;created_at&lt;/code&gt;. Combining both feels more rigorous and quietly returns the wrong set.&lt;/p&gt;
&lt;h2 id="how-to-diagnose-it"&gt;How to diagnose it
&lt;/h2&gt;&lt;p&gt;The symptom is always the same: a number that doesn&amp;rsquo;t match what another system says it should be. Revenue doesn&amp;rsquo;t match billing. User counts don&amp;rsquo;t match the auth service. Item totals don&amp;rsquo;t match inventory. When that happens, the first thing to check isn&amp;rsquo;t the aggregation; it&amp;rsquo;s the row count at each stage of the query.&lt;/p&gt;
&lt;p&gt;Take the aggregation off and see what you&amp;rsquo;re actually summing:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;span class="lnt"&gt;12
&lt;/span&gt;&lt;span class="lnt"&gt;13
&lt;/span&gt;&lt;span class="lnt"&gt;14
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Original (wrong) query
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_items&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;GROUP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Diagnostic: see the raw rows for one customer
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_cents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;item_id&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_items&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If the same &lt;code&gt;order_id&lt;/code&gt; and &lt;code&gt;total_cents&lt;/code&gt; appear on multiple rows, the sum is going to double-count. Seeing the raw rows makes the multiplication obvious in a way the aggregated output never does.&lt;/p&gt;
&lt;p&gt;Another useful check: compare counts at each level independently.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Count orders directly
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Returns: 3
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Count orders through the joined query
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_items&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Returns: 12
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- The 4x multiplication is the join&amp;#39;s cardinality
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;When the two numbers don&amp;rsquo;t match, the join is multiplying rows. Every aggregation downstream of that join is suspect.&lt;/p&gt;
&lt;h2 id="how-to-solve-it"&gt;How to solve it
&lt;/h2&gt;&lt;p&gt;There&amp;rsquo;s no single fix; the right technique depends on whether the aggregation lives on the parent or the child, and how many cardinality boundaries you&amp;rsquo;re crossing.&lt;/p&gt;
&lt;h3 id="aggregate-at-the-correct-grain-then-join"&gt;Aggregate at the correct grain, then join
&lt;/h3&gt;&lt;p&gt;The cleanest approach is usually to do each aggregation at the table where the data actually lives, then join the pre-aggregated results together. This keeps row counts under control and makes the query&amp;rsquo;s intent obvious.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;span class="lnt"&gt;12
&lt;/span&gt;&lt;span class="lnt"&gt;13
&lt;/span&gt;&lt;span class="lnt"&gt;14
&lt;/span&gt;&lt;span class="lnt"&gt;15
&lt;/span&gt;&lt;span class="lnt"&gt;16
&lt;/span&gt;&lt;span class="lnt"&gt;17
&lt;/span&gt;&lt;span class="lnt"&gt;18
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WITH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_stats&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;GROUP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;item_stats&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;items_purchased&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_items&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;GROUP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;item_stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items_purchased&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;LEFT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_stats&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;LEFT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;item_stats&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;item_stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Revenue is summed from &lt;code&gt;orders&lt;/code&gt; where it lives, once per order. Items are counted through the &lt;code&gt;orders&lt;/code&gt;→&lt;code&gt;order_items&lt;/code&gt; join separately. Then both are joined back to &lt;code&gt;customers&lt;/code&gt;. Each aggregation happens at its correct grain, and the final join is 1:1:1, no multiplication.&lt;/p&gt;
&lt;p&gt;It looks more verbose. It is. That&amp;rsquo;s the point. The verbosity is making the cardinality explicit instead of hiding it behind a single flat join.&lt;/p&gt;
&lt;h3 id="use-distinct-inside-the-aggregate-with-caution"&gt;Use &lt;code&gt;DISTINCT&lt;/code&gt; inside the aggregate, with caution
&lt;/h3&gt;&lt;p&gt;When the multiplication is already there, &lt;code&gt;SUM(DISTINCT ...)&lt;/code&gt; can sometimes paper over it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;-- suspicious
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_items&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;GROUP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This only works if &lt;code&gt;total_cents&lt;/code&gt; is guaranteed to be unique across the duplicated rows. If two different orders happen to have the same total, &lt;code&gt;DISTINCT&lt;/code&gt; collapses them into one and revenue drops. It&amp;rsquo;s fragile: correct for the query but wrong for the data.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;COUNT(DISTINCT o.id)&lt;/code&gt; is safer because &lt;code&gt;id&lt;/code&gt; is always unique by definition. Use &lt;code&gt;DISTINCT&lt;/code&gt; on natural keys, not on aggregated values.&lt;/p&gt;
&lt;h3 id="window-functions-for-per-parent-aggregates"&gt;Window functions for &amp;ldquo;per parent&amp;rdquo; aggregates
&lt;/h3&gt;&lt;p&gt;When you need a running or per-group aggregate without collapsing rows, window functions keep the row count intact and do the math within a partition:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;span class="lnt"&gt;8
&lt;/span&gt;&lt;span class="lnt"&gt;9
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;item_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;price_cents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;price_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;OVER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PARTITION&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;price_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;OVER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PARTITION&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customer_total&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;JOIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_items&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;No group-by, no row collapsing, totals computed at the right grain. The cost is a result set the size of &lt;code&gt;order_items&lt;/code&gt;, so use this pattern when the row-level detail is actually needed, not as a default replacement for &lt;code&gt;GROUP BY&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id="lateral-joins-and-correlated-subqueries"&gt;LATERAL joins and correlated subqueries
&lt;/h3&gt;&lt;p&gt;When you need a per-row aggregate (the total for each order, or the most recent child row) a lateral join keeps the parent&amp;rsquo;s grain and evaluates the child aggregation row by row.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;span class="lnt"&gt;8
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- PostgreSQL: LATERAL join
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;item_count&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;LATERAL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;item_count&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_items&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;One row per order, aggregation computed inside the lateral subquery, no multiplication. This is often faster than joining and then grouping, especially when &lt;code&gt;orders&lt;/code&gt; is heavily filtered and &lt;code&gt;order_items&lt;/code&gt; is large.&lt;/p&gt;
&lt;h2 id="schema-level-defenses"&gt;Schema-level defenses
&lt;/h2&gt;&lt;p&gt;Query-level fixes only work if the person writing the query knows to apply them. Schema-level guarantees work for every query, forever.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Foreign keys&lt;/strong&gt; tell the query planner about cardinality. PostgreSQL in particular uses FK metadata to make join-order decisions and to eliminate redundant joins during planning. Beyond the integrity benefits, FKs make the shape of the data visible to both humans and the planner. (&lt;a class="link" href="https://explainanalyze.com/p/foreign-keys-are-not-optional/#the-happy-path-isnt-the-only-path" &gt;Foreign Keys Are Not Optional&lt;/a&gt; goes deeper on why skipping them compounds into silent corruption over time.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Unique constraints on bridge tables&lt;/strong&gt; prevent accidental many-to-many explosions. A bridge table with &lt;code&gt;PRIMARY KEY (a_id, b_id)&lt;/code&gt; can&amp;rsquo;t contain duplicates, so joining through it can&amp;rsquo;t multiply rows because of duplicate bridge entries (only because of legitimate N:M relationships).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_item_promotions&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_item_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;REFERENCES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;order_items&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;promotion_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;REFERENCES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;promotions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;PRIMARY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_item_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;promotion_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;-- prevents duplicates
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Without that composite primary key, a bug in the application layer that inserts the same &lt;code&gt;(order_item_id, promotion_id)&lt;/code&gt; pair twice would silently double revenue for that item in any query joining through the bridge. With it, the database rejects the duplicate at write time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Schema comments&lt;/strong&gt; on tables and columns document the cardinality and semantics that aren&amp;rsquo;t visible from the DDL. A line like &lt;code&gt;COMMENT ON TABLE order_item_promotions IS 'N:M bridge. One row per (item, promotion). Joining this multiplies order_item rows by avg promotions-per-item.'&lt;/code&gt; tells every future engineer exactly what the table does to row counts. (&lt;a class="link" href="https://explainanalyze.com/p/comment-your-schema/#what-schema-comments-are" &gt;Comment Your Schema&lt;/a&gt; covers the mechanics across MySQL and PostgreSQL and why this metadata layer is almost always empty.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Denormalized totals, when the trade-off is worth it.&lt;/strong&gt; For heavily queried aggregates (order totals, user balance, post comment counts), storing the aggregate on the parent table eliminates the join entirely. The write-path cost is keeping the denormalized value consistent: either through application code, triggers, or scheduled reconciliation. For high-read, low-write aggregates, the read simplicity often wins. For everything else, computing on demand is cleaner.&lt;/p&gt;
&lt;div class="warning-box"&gt;
 &lt;strong&gt;Denormalization has its own failure mode&lt;/strong&gt;
 &lt;div&gt;A stored &lt;code&gt;orders.total_cents&lt;/code&gt; that&amp;rsquo;s out of sync with &lt;code&gt;SUM(order_items.price_cents)&lt;/code&gt; is its own form of silent corruption, moved from the query layer to the write layer. Either invest in keeping it consistent (triggers, reconciliation jobs) or don&amp;rsquo;t denormalize it at all. A half-maintained denormalized aggregate is worse than no denormalization.&lt;/div&gt;
&lt;/div&gt;

&lt;h2 id="the-pause-that-schema-reading-assistants-dont-take"&gt;The pause that schema-reading assistants don&amp;rsquo;t take
&lt;/h2&gt;&lt;p&gt;A schema-reading assistant asked for &amp;ldquo;total revenue by customer&amp;rdquo; reads the catalog, finds the chain of tables it needs, writes the JOINs, adds the SUM, and hands back a query that looks right. The pause described in the section below (&amp;ldquo;wait, does this join multiply rows?&amp;rdquo;) is a step the model doesn&amp;rsquo;t take unless the prompt asks for it. The catalog tells the assistant that &lt;code&gt;customers&lt;/code&gt;, &lt;code&gt;orders&lt;/code&gt;, &lt;code&gt;order_items&lt;/code&gt;, and the &lt;code&gt;order_item_promotions&lt;/code&gt; bridge exist; it doesn&amp;rsquo;t tell it that joining through the bridge duplicates every &lt;code&gt;order_items&lt;/code&gt; row once per promotion. The inflated total and the correct one look the same on the way back.&lt;/p&gt;
&lt;p&gt;The same schema-level defenses that help humans give the model more to work with. FK metadata lets a catalog-reading tool see which joins are 1:N versus N:M. Composite primary keys on bridge tables prevent the &amp;ldquo;duplicate-in-bridge&amp;rdquo; multiplier from ever materializing in the data. Table comments that spell out cardinality (something like &lt;code&gt;'N:M bridge. Joining this multiplies order_item rows by avg promotions-per-item.'&lt;/code&gt;) put the warning in the part of the schema the assistant actually reads. This doesn&amp;rsquo;t replace the pause described below; it narrows the set of cases where the pause has to do all the work.&lt;/p&gt;
&lt;h2 id="the-mental-model"&gt;The mental model
&lt;/h2&gt;&lt;p&gt;The shortcut that prevents most of these bugs: before writing an aggregation, picture the row count at every stage of the query.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Start with the leftmost table. How many rows?&lt;/li&gt;
&lt;li&gt;Each join: does this multiply, preserve, or filter the row count?&lt;/li&gt;
&lt;li&gt;At the point where the aggregate runs: what is the grain of each row? What does &amp;ldquo;one row&amp;rdquo; represent?&lt;/li&gt;
&lt;li&gt;Does the aggregate make sense at that grain?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When the answer is &amp;ldquo;one row represents an order item, but I&amp;rsquo;m summing an order-level field,&amp;rdquo; the bug is already obvious. When the answer is &amp;ldquo;one row represents an order, and I&amp;rsquo;m summing order totals,&amp;rdquo; the query is correct.&lt;/p&gt;
&lt;p&gt;This isn&amp;rsquo;t a skill that scales with query complexity; it&amp;rsquo;s a habit that kicks in before the query gets written. The senior engineers who never seem to hit these bugs aren&amp;rsquo;t writing smarter SQL. They&amp;rsquo;re pausing before the &lt;code&gt;SUM&lt;/code&gt; and asking what row they&amp;rsquo;re actually summing over.&lt;/p&gt;
&lt;h2 id="putting-it-together"&gt;Putting it together
&lt;/h2&gt;&lt;p&gt;Cardinality bugs are a specific kind of wrong: syntactically valid, semantically broken, and invisible to every automated check. Tests pass. Code reviews approve. Reports render. The numbers just happen to be wrong.&lt;/p&gt;
&lt;p&gt;The defense is structural, not tactical. Understand the cardinality of each relationship before writing the join. Aggregate at the grain where the data lives. Use the schema to make cardinality explicit: foreign keys, composite primary keys on bridges, comments that document the shape. When diagnosing a wrong number, strip the aggregation and look at the raw rows; the multiplication is almost always visible as soon as the &lt;code&gt;SUM&lt;/code&gt; is out of the way.&lt;/p&gt;
&lt;p&gt;The worst thing about silent bugs is that they stay silent. A crash gets fixed; wrong numbers persist for quarters. The habit of thinking about cardinality first (before writing the aggregation, not after someone flags the total) is one of the highest-leverage habits in working with relational data.&lt;/p&gt;</description></item><item><title>Uniqueness and Selectivity: The Two Numbers That Drive Query Plans</title><link>https://explainanalyze.com/p/uniqueness-and-selectivity-the-two-numbers-that-drive-query-plans/</link><pubDate>Mon, 23 Dec 2024 00:00:00 +0000</pubDate><guid>https://explainanalyze.com/p/uniqueness-and-selectivity-the-two-numbers-that-drive-query-plans/</guid><description>&lt;img src="https://explainanalyze.com/" alt="Featured image of post Uniqueness and Selectivity: The Two Numbers That Drive Query Plans" /&gt;&lt;div class="tldr-box"&gt;
 &lt;strong&gt;TL;DR&lt;/strong&gt;
 &lt;div&gt;Uniqueness governs correctness, selectivity governs performance. The interesting parts of both live in the edge cases: partial unique indexes and their UPSERT targeting quirks, the way partitioning weakens every uniqueness guarantee, correlated columns that defeat planner assumptions, stale statistics that turn a 5ms query into a 5-minute one. Declaring the constraints the planner can see and keeping its statistics fresh buys more than any amount of query rewriting.&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Everyone who works with relational databases knows &lt;code&gt;UNIQUE&lt;/code&gt;. What they often don&amp;rsquo;t know is how it behaves under partitioning, how &lt;code&gt;ON CONFLICT&lt;/code&gt; targets it (and doesn&amp;rsquo;t), and what the planner actually does with it beyond rejecting duplicates. Selectivity is in the same category. The definition is trivial, but the behavior that matters lives in composite column ordering, stale statistics, and the correlated-columns problem that breaks the planner&amp;rsquo;s core assumption.&lt;/p&gt;
&lt;p&gt;This is the territory where &amp;ldquo;the query is correct&amp;rdquo; and &amp;ldquo;the query is fast&amp;rdquo; stop being the same question, and both depend on what the database can actually prove about the data. The constraints are the contract between the schema and the planner. Everything else is inference.&lt;/p&gt;
&lt;h2 id="partial-and-filtered-unique-indexes"&gt;Partial and filtered unique indexes
&lt;/h2&gt;&lt;p&gt;PostgreSQL supports partial unique indexes: uniqueness enforced only over rows matching a predicate. This is the right tool for the common real-world case &amp;ldquo;email must be unique among active users&amp;rdquo;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- PostgreSQL: email unique only among non-soft-deleted rows.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;UNIQUE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;INDEX&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users_active_email_uniq&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;deleted_at&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;IS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;A plain &lt;code&gt;UNIQUE (email)&lt;/code&gt; forces a choice: either allow re-registration (and lose referential integrity by reusing emails across deleted and active rows) or block it (and frustrate users whose accounts were long ago soft-deleted). The partial index lets both coexist.&lt;/p&gt;
&lt;p&gt;MySQL doesn&amp;rsquo;t support partial unique indexes directly. The workaround exploits MySQL&amp;rsquo;s treatment of NULL as distinct under &lt;code&gt;UNIQUE&lt;/code&gt; (covered in &lt;a class="link" href="https://explainanalyze.com/p/null-in-sql-three-valued-logic-and-the-silent-bug-factory/" &gt;NULL: Three-Valued Logic&lt;/a&gt;):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- MySQL idiom: generated column that&amp;#39;s NULL for deleted users.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ADD&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COLUMN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;email_active&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;GENERATED&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ALWAYS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHEN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;deleted_at&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;IS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;THEN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;VIRTUAL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ADD&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;UNIQUE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users_active_email_uniq&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email_active&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The constraint effectively fires only for rows where &lt;code&gt;email_active&lt;/code&gt; is non-NULL: exactly partial-index semantics, just expressed through a generated column. Awkward to write, but portable-ish and the ORMs catch on eventually.&lt;/p&gt;
&lt;h2 id="partitioned-tables-force-uniqueness-compromises"&gt;Partitioned tables force uniqueness compromises
&lt;/h2&gt;&lt;p&gt;Partitioned tables in both PostgreSQL and MySQL require the partition key to be part of every unique constraint - including the primary key. The rule exists for correctness: without the partition key in the constraint, the database would have to scan every partition on every insert to enforce uniqueness, defeating the point of partitioning.&lt;/p&gt;
&lt;p&gt;The practical consequence is that &lt;code&gt;PRIMARY KEY (id)&lt;/code&gt; isn&amp;rsquo;t allowed on a table partitioned by &lt;code&gt;created_at&lt;/code&gt;. It has to become &lt;code&gt;PRIMARY KEY (id, created_at)&lt;/code&gt;. The same applies to every other unique constraint: &lt;code&gt;UNIQUE (email)&lt;/code&gt; on a users table partitioned by region becomes &lt;code&gt;UNIQUE (email, region)&lt;/code&gt;, which quietly weakens the guarantee. The schema now allows the same email to exist in multiple regions, whether or not the application ever intended that.&lt;/p&gt;
&lt;p&gt;This is one of the sharper trade-offs in partitioning decisions. A uniqueness guarantee the schema used to provide gets weaker, and point lookups that used to be single-row &lt;code&gt;const&lt;/code&gt; accesses become &lt;code&gt;ref&lt;/code&gt; lookups because the full primary key isn&amp;rsquo;t spelled out in every query. &lt;a class="link" href="https://explainanalyze.com/p/designing-partitioning-you-dont-have-to-babysit/#what-actually-belongs-in-the-partition-key" &gt;Designing Partitioning You Don&amp;rsquo;t Have to Babysit&lt;/a&gt; covers the full picture, including why partitioning by the primary key itself (when the PK is monotonically increasing) sidesteps the trade-off entirely.&lt;/p&gt;
&lt;h2 id="upsert-targeting-is-more-specific-than-it-looks"&gt;UPSERT targeting is more specific than it looks
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;INSERT ... ON CONFLICT&lt;/code&gt; (PostgreSQL) and &lt;code&gt;INSERT ... ON DUPLICATE KEY UPDATE&lt;/code&gt; (MySQL) bind to specific unique constraints, not to &amp;ldquo;any uniqueness that happens to apply.&amp;rdquo; The difference between the two engines is where most of the subtle bugs live.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PostgreSQL is explicit.&lt;/strong&gt; &lt;code&gt;ON CONFLICT (email)&lt;/code&gt; requires a unique constraint or unique index exactly matching &lt;code&gt;email&lt;/code&gt;. If none exists, the statement errors out. If a partial unique index exists instead of a plain one, &lt;code&gt;ON CONFLICT (email)&lt;/code&gt; does not match it; you need the full predicate:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Must match the partial index&amp;#39;s predicate to target it.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;INTO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;VALUES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;alice@example.com&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Alice&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CONFLICT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;deleted_at&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;IS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;DO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;EXCLUDED&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If the partial index changes (predicate tightened, column added), every &lt;code&gt;ON CONFLICT&lt;/code&gt; targeting it has to change too. This is explicit coupling, but it&amp;rsquo;s coupling.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;MySQL is implicit and more dangerous.&lt;/strong&gt; &lt;code&gt;ON DUPLICATE KEY UPDATE&lt;/code&gt; fires on conflict with any unique key on the table, not just the one the query author had in mind. If the table has &lt;code&gt;UNIQUE (email)&lt;/code&gt; and &lt;code&gt;UNIQUE (external_id)&lt;/code&gt;, an insert that conflicts on either key triggers the update. For rows where the inserted &lt;code&gt;email&lt;/code&gt; matches one existing row and the &lt;code&gt;external_id&lt;/code&gt; matches a different one, the behavior depends on which index is checked first and is undefined as far as the language is concerned.&lt;/p&gt;
&lt;p&gt;The practical implication: adding a new unique key to a table can silently change the semantics of every existing &lt;code&gt;INSERT ... ON DUPLICATE KEY UPDATE&lt;/code&gt; against that table. There&amp;rsquo;s no error, no warning, just different behavior on the next conflict that falls into the new key&amp;rsquo;s path. On large schemas with dozens of unique keys, this is the UPSERT equivalent of action at a distance.&lt;/p&gt;
&lt;p&gt;The mitigation on MySQL is to prefer &lt;code&gt;INSERT ... ON DUPLICATE KEY UPDATE&lt;/code&gt; only when there&amp;rsquo;s a single obvious unique key, and to reach for &lt;code&gt;REPLACE&lt;/code&gt; or explicit &lt;code&gt;SELECT ... FOR UPDATE&lt;/code&gt; + conditional &lt;code&gt;UPDATE&lt;/code&gt;/&lt;code&gt;INSERT&lt;/code&gt; flows when the semantics need to be explicit.&lt;/p&gt;
&lt;h2 id="unique-indexes-also-concentrate-deadlock-pressure"&gt;Unique indexes also concentrate deadlock pressure
&lt;/h2&gt;&lt;p&gt;Two deadlock patterns are specific to unique indexes and show up almost nowhere else:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Duplicate-key inserts take locks even when they fail.&lt;/strong&gt; When InnoDB detects a duplicate on insert, it doesn&amp;rsquo;t just raise the error; it first acquires a shared next-key lock on the conflicting row. Under &lt;code&gt;REPEATABLE READ&lt;/code&gt; (the default), that lock covers the gap too. Two concurrent transactions inserting near the same unique key can deadlock on those shared locks before either sees the duplicate-key error. The most common production signature: a batch-upsert worker hitting the same hot row ranges from multiple threads.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;ON DUPLICATE KEY UPDATE&lt;/code&gt; batches deadlock when key ordering differs.&lt;/strong&gt; Each row in a batch insert acquires its lock when the row is processed, not when the batch starts. Two batches touching overlapping keys &lt;code&gt;(A, B)&lt;/code&gt; vs &lt;code&gt;(B, A)&lt;/code&gt; take locks in opposite order and cycle. The fix is either sorting rows by unique key before the batch (so lock acquisition order is consistent across workers) or switching to &lt;code&gt;INSERT ... ON CONFLICT DO NOTHING&lt;/code&gt; plus a separate targeted &lt;code&gt;UPDATE&lt;/code&gt; pass.&lt;/p&gt;
&lt;p&gt;Neither of these shows up the same way with non-unique indexes; the uniqueness check itself is what forces the extra locking. It&amp;rsquo;s the cost of making the database enforce the guarantee, and it scales badly once the hot-key set is small and write concurrency is high. (&lt;a class="link" href="https://explainanalyze.com/p/database-deadlocks-part-1-the-patterns/#unique-index-deadlocks-are-a-category-of-their-own" &gt;Database Deadlocks, Part 1&lt;/a&gt; covers the broader patterns; &lt;a class="link" href="https://explainanalyze.com/p/database-deadlocks-part-2-diagnosis-retries-and-prevention/#nowait-and-skip-locked-as-prevention-primitives" &gt;Part 2&lt;/a&gt; covers reading the log, retries, and prevention.)&lt;/p&gt;
&lt;h2 id="composite-index-column-ordering"&gt;Composite index column ordering
&lt;/h2&gt;&lt;p&gt;The order of columns in a composite index is a selectivity decision that determines whether the index helps the query it was built for. The usual rules compress to three:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Equality filters before range filters.&lt;/strong&gt; An index on &lt;code&gt;(customer_id, created_at)&lt;/code&gt; is efficient for &lt;code&gt;WHERE customer_id = 42 AND created_at &amp;gt; '2026-01-01'&lt;/code&gt;. Reversed (&lt;code&gt;(created_at, customer_id)&lt;/code&gt;), the index has to scan a wide range of &lt;code&gt;created_at&lt;/code&gt; values and filter &lt;code&gt;customer_id&lt;/code&gt; as a secondary step, which is usually worse than a sequential scan.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;More selective column first for equality-only predicates.&lt;/strong&gt; For filters of the form &lt;code&gt;WHERE a = ? AND b = ?&lt;/code&gt;, the column with more distinct values goes first so the first lookup narrows more aggressively.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Match the query&amp;rsquo;s access pattern.&lt;/strong&gt; An index on &lt;code&gt;(a, b, c)&lt;/code&gt; serves queries filtering by &lt;code&gt;a&lt;/code&gt;, &lt;code&gt;(a, b)&lt;/code&gt;, or &lt;code&gt;(a, b, c)&lt;/code&gt;. It does not serve queries filtering by &lt;code&gt;b&lt;/code&gt; alone, &lt;code&gt;c&lt;/code&gt; alone, or &lt;code&gt;(b, c)&lt;/code&gt;. The leading column is load-bearing.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These interact with covering index considerations, sort order requirements, and the planner&amp;rsquo;s ability to combine multiple indexes via bitmap scans. But the starting point is: think about how the index will be read, not what columns are available to throw at it.&lt;/p&gt;
&lt;h3 id="mysql-clustered-indexes-flip-the-rule"&gt;MySQL clustered indexes flip the rule
&lt;/h3&gt;&lt;p&gt;The above applies cleanly to secondary indexes. The MySQL InnoDB primary key is a different animal: a clustered index, meaning the PK&amp;rsquo;s leaf pages are the table. The ordering of PK columns decides physical row order on disk, and that often matters more than selectivity.&lt;/p&gt;
&lt;p&gt;The canonical example is &lt;code&gt;PRIMARY KEY (tenant_id, id)&lt;/code&gt; on a multi-tenant table. &lt;code&gt;tenant_id&lt;/code&gt; has maybe 10K distinct values (low selectivity); &lt;code&gt;id&lt;/code&gt; is near-unique. By &amp;ldquo;most selective first,&amp;rdquo; the answer would be &lt;code&gt;(id, tenant_id)&lt;/code&gt;, and it would be wrong:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Physical clustering.&lt;/strong&gt; All rows for one tenant sit contiguously in the B-tree. Tenant-scoped range scans read a narrow slice of pages sequentially, and the buffer pool caches a single tenant&amp;rsquo;s hot data together. &lt;code&gt;(id, tenant_id)&lt;/code&gt; scatters that same tenant&amp;rsquo;s rows across the whole table.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Secondary index lookups cost less.&lt;/strong&gt; InnoDB secondary indexes store the PK, not a row pointer. A query that uses a secondary index and then needs a full row does a PK lookup per match. With &lt;code&gt;(tenant_id, id)&lt;/code&gt;, those lookups for one tenant cluster together. With &lt;code&gt;(id, tenant_id)&lt;/code&gt;, each is random I/O across the table.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Insert locality.&lt;/strong&gt; If &lt;code&gt;id&lt;/code&gt; is monotonically increasing within a tenant, inserts land on recent pages per tenant, avoiding page splits scattered across the index.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The rule for an InnoDB PK is: put the column that represents the dominant access pattern first, even if it&amp;rsquo;s less selective. Selectivity cuts rows; clustering cuts I/O. On a large clustered index, I/O usually dominates.&lt;/p&gt;
&lt;p&gt;This is also why &lt;code&gt;PRIMARY KEY (id)&lt;/code&gt; plus &lt;code&gt;INDEX (tenant_id)&lt;/code&gt; on a multi-tenant table is often slower than &lt;code&gt;PRIMARY KEY (tenant_id, id)&lt;/code&gt;; the secondary index forces a PK-lookup hop on every read that the clustered choice avoids entirely.&lt;/p&gt;
&lt;p&gt;PostgreSQL&amp;rsquo;s primary key is a separate B-tree unique index, not clustered (a &lt;code&gt;CLUSTER&lt;/code&gt; command exists but isn&amp;rsquo;t maintained as rows are inserted), so the ordering logic there stays closer to the secondary-index rules.&lt;/p&gt;
&lt;h2 id="the-planner-doesnt-read-the-data---it-reads-statistics"&gt;The planner doesn&amp;rsquo;t read the data - it reads statistics
&lt;/h2&gt;&lt;p&gt;The planner&amp;rsquo;s entire decision-making process rests on statistics that summarize the data, not the data itself. PostgreSQL&amp;rsquo;s per-column statistics live in &lt;code&gt;pg_stats&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;span class="lnt"&gt;8
&lt;/span&gt;&lt;span class="lnt"&gt;9
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;attname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n_distinct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;-- estimated distinct values (negative means fraction)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;null_frac&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;-- fraction of rows that are NULL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;most_common_vals&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;-- top values by frequency
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;most_common_freqs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;-- corresponding frequencies
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;histogram_bounds&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;-- distribution of non-common values
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pg_stats&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tablename&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;orders&amp;#39;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AND&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;attname&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;customer_id&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;MySQL exposes similar information through &lt;code&gt;information_schema.STATISTICS&lt;/code&gt; and &lt;code&gt;INNODB_TABLESTATS&lt;/code&gt;, though less granularly than PostgreSQL&amp;rsquo;s statistics. MySQL lacks per-column histograms on most versions (8.0+ has optional histograms, off by default).&lt;/p&gt;
&lt;p&gt;These statistics are gathered by explicit &lt;code&gt;ANALYZE&lt;/code&gt; in PostgreSQL and maintained automatically by InnoDB in MySQL. They go stale between runs. A table that was analyzed at 10M rows and is now 200M rows has planner statistics that no longer reflect reality. Join reorderings based on those estimates are decisions made on outdated data.&lt;/p&gt;
&lt;p&gt;The usual symptom is a query that was fast yesterday and slow today, with no schema or query change. The planner&amp;rsquo;s row estimate for some step has drifted far enough from reality that the plan shape flipped: nested loop where it should have been hash join, or a sequential scan where an index seek would have won. &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; with its estimated-vs-actual row counts is the fastest way to confirm this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ANALYZE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Index Scan using orders_customer_id_idx
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- (cost=0.43..1234.56 rows=1000 width=128)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- (actual time=0.123..45.678 rows=180000 loops=1)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The &lt;code&gt;rows=1000&lt;/code&gt; is the estimate. The &lt;code&gt;actual rows=180000&lt;/code&gt; is reality. A ratio of 100x+ between them is the signal. The fix is statistical (refresh stats, increase the statistics target for that column, add extended statistics for correlated columns) and not a query rewrite.&lt;/p&gt;
&lt;h2 id="cardinality-estimation-errors-and-their-shape"&gt;Cardinality estimation errors and their shape
&lt;/h2&gt;&lt;p&gt;The single most common cause of bad query plans in production is a bad row-count estimate on an intermediate step. Two flavors, each with distinctive symptoms:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Underestimates.&lt;/strong&gt; The planner thinks a step will return 10 rows, actually returns 10 million. The plan picks a nested loop (good for a small outer side), which now runs 10 million iterations. A query that should have been a 50ms hash join takes 50 minutes. The telltale sign in &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; is &lt;code&gt;loops=10000000&lt;/code&gt; on an inner node that was costed for a handful.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Overestimates.&lt;/strong&gt; The planner thinks 10 million rows, actually 10. The plan allocates a hash table sized for millions, spills to disk under memory pressure, and runs a 5ms lookup in 5 seconds. Less common but more insidious, because the query didn&amp;rsquo;t &amp;ldquo;fail&amp;rdquo; in any obvious way; it just used more memory and I/O than it needed.&lt;/p&gt;
&lt;p&gt;Both are failures of the statistics, not the query. Both are especially hard to diagnose because the query text is identical in the fast and slow cases; only the planner&amp;rsquo;s belief about the data changed. When the ratio between estimated and actual is large and consistent, the problem is upstream of the query.&lt;/p&gt;
&lt;h2 id="correlated-columns-break-the-independence-assumption"&gt;Correlated columns break the independence assumption
&lt;/h2&gt;&lt;p&gt;The planner estimates the selectivity of a compound predicate &lt;code&gt;WHERE a = x AND b = y&lt;/code&gt; by multiplying the individual selectivities, assuming the columns are statistically independent. When they&amp;rsquo;re not, the estimate can be off by orders of magnitude.&lt;/p&gt;
&lt;p&gt;The canonical example is &lt;code&gt;(country, state)&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ANALYZE&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;addresses&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;US&amp;#39;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AND&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;state&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;CA&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Estimate: (0.25) * (0.02) * N = 0.5% of rows
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;-- Reality: ~2% of rows - state = &amp;#39;CA&amp;#39; implies country = &amp;#39;US&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The planner assumed the two filters cut the rowcount independently. In reality, &lt;code&gt;state = 'CA'&lt;/code&gt; already determines &lt;code&gt;country = 'US'&lt;/code&gt; (there are no California rows with a different country) so the compound filter isn&amp;rsquo;t as selective as the multiplication suggests.&lt;/p&gt;
&lt;p&gt;PostgreSQL 10+ supports extended statistics to fix this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;STATISTICS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;country_state_corr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dependencies&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ndistinct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;state&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;addresses&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;ANALYZE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;addresses&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The &lt;code&gt;dependencies&lt;/code&gt; statistic captures functional dependencies (one column determines another); &lt;code&gt;ndistinct&lt;/code&gt; captures the distinct combinations of the column set. Both are used during planning to correct the independence-assumption multiplication.&lt;/p&gt;
&lt;p&gt;MySQL has no equivalent. Correlated-column estimation errors there are harder to fix at the planner level; the workaround is usually to restructure the query (force a specific join order, introduce an intermediate CTE, or add a covering index that captures the correlated access pattern directly).&lt;/p&gt;
&lt;h2 id="unique-as-a-planner-signal-not-just-a-guardrail"&gt;UNIQUE as a planner signal, not just a guardrail
&lt;/h2&gt;&lt;p&gt;A &lt;code&gt;UNIQUE&lt;/code&gt; constraint is also a proof the planner can use. Knowing a column is unique lets the optimizer reason about the shape of joins and aggregates in ways it can&amp;rsquo;t when uniqueness is only implicit:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Deduplication elimination.&lt;/strong&gt; &lt;code&gt;SELECT DISTINCT u.id FROM users u JOIN orders o ON o.user_id = u.id&lt;/code&gt; can skip the &lt;code&gt;DISTINCT&lt;/code&gt; step entirely if the planner knows &lt;code&gt;users.id&lt;/code&gt; is unique. The join already produces at most one row per &lt;code&gt;u.id&lt;/code&gt; per matching order, and the &lt;code&gt;DISTINCT&lt;/code&gt; becomes a no-op. Without the declared uniqueness, the planner has to run the dedup pass.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Join elimination.&lt;/strong&gt; When joining A to B on a unique column of B, and selecting only columns from A, the planner can drop the join entirely in some cases (it proved the join doesn&amp;rsquo;t change the output). This is a real optimization on star-schema queries.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reorderable joins.&lt;/strong&gt; Unique constraints make certain join orderings provably equivalent, giving the optimizer more plan shapes to choose from. The more plans it can try, the more likely it finds a good one.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Index-only scan eligibility.&lt;/strong&gt; Unique indexes are natural targets for index-only scans, which skip the heap/table access when every column the query needs is already in the index.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Schemas that leave uniqueness implicit (enforced in application code, promised in a wiki) can still produce correct results, but the planner can&amp;rsquo;t trust assumptions it can&amp;rsquo;t see. The constraint is what turns uniqueness from a property of the data into a property of the schema that the planner reads as a fact.&lt;/p&gt;
&lt;h2 id="what-unique-tells-a-schema-reading-model"&gt;What UNIQUE tells a schema-reading model
&lt;/h2&gt;&lt;p&gt;The planner isn&amp;rsquo;t the only consumer of declared uniqueness. Schema-reading assistants (Copilot, MCP-backed agents, text-to-SQL tools) read &lt;code&gt;information_schema.TABLE_CONSTRAINTS&lt;/code&gt; and &lt;code&gt;pg_constraint&lt;/code&gt; the same way they read column types. A declared &lt;code&gt;UNIQUE&lt;/code&gt; is the only signal in the catalog that says &amp;ldquo;at most one row per X.&amp;rdquo; Without it, the model has no way to prove 1:1 semantics and either hedges with a defensive &lt;code&gt;LIMIT 1&lt;/code&gt; it can&amp;rsquo;t justify or writes &lt;code&gt;GROUP BY&lt;/code&gt; / &lt;code&gt;DISTINCT&lt;/code&gt; passes that shouldn&amp;rsquo;t be necessary. &lt;code&gt;ON CONFLICT&lt;/code&gt; and &lt;code&gt;ON DUPLICATE KEY UPDATE&lt;/code&gt; targeting is especially fragile: the model picks the column name that matches the prompt (&amp;ldquo;upsert by email&amp;rdquo;) and the query either fails at runtime because no unique constraint exists on that column, or silently targets a different constraint than intended.&lt;/p&gt;
&lt;p&gt;Selectivity is the part the model has even less access to. Planner statistics (&lt;code&gt;pg_stats.n_distinct&lt;/code&gt;, MySQL&amp;rsquo;s &lt;code&gt;information_schema.STATISTICS&lt;/code&gt; cardinality estimates) aren&amp;rsquo;t part of the prompt for most schema-aware tools, and the model has no way to query them mid-generation. Asked &amp;ldquo;how do I speed this query up?&amp;rdquo; the assistant&amp;rsquo;s default answer is &amp;ldquo;add an index,&amp;rdquo; regardless of whether the indexed column has two distinct values or two million. The same schema discipline that keeps the planner honest (declared unique constraints on every at-most-one relationship, composite primary keys on bridge tables, column-level comments that describe the value shape) is what gives catalog-reading models enough context to produce queries that don&amp;rsquo;t require a second human pass.&lt;/p&gt;
&lt;h2 id="diagnosing-the-usual-suspects"&gt;Diagnosing the usual suspects
&lt;/h2&gt;&lt;p&gt;Three patterns cover most of the uniqueness/selectivity-shaped bugs in production:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;This query got slow and nothing changed.&amp;rdquo;&lt;/strong&gt; Run &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt;. Compare estimated to actual row counts on each node. A large ratio (10x+) means the planner has stale statistics, missing extended statistics on correlated columns, or both. Refresh stats with &lt;code&gt;ANALYZE&lt;/code&gt;; add extended statistics if a compound predicate is the source.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;I built an index and the planner ignores it.&amp;rdquo;&lt;/strong&gt; Check the column&amp;rsquo;s selectivity directly: distinct values over total. Below ~5%, a sequential scan is usually the right choice and the planner isn&amp;rsquo;t wrong. If selectivity is high, check for functions in the &lt;code&gt;WHERE&lt;/code&gt; clause (non-SARGable predicates), implicit type casts (an indexed &lt;code&gt;BIGINT&lt;/code&gt; column filtered with a &lt;code&gt;VARCHAR&lt;/code&gt; literal can fall off the index), or stale statistics underreporting the column&amp;rsquo;s uniqueness.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;My UPSERT corrupts data under load.&amp;rdquo;&lt;/strong&gt; Check which unique key it&amp;rsquo;s targeting. In MySQL, &lt;code&gt;ON DUPLICATE KEY UPDATE&lt;/code&gt; fires on conflict with any unique key, including ones added after the query was written. In PostgreSQL, partial unique indexes require the predicate in &lt;code&gt;ON CONFLICT&lt;/code&gt;; mismatches silently fall through to insert rather than update.&lt;/p&gt;
&lt;h2 id="the-mental-model"&gt;The mental model
&lt;/h2&gt;&lt;p&gt;Uniqueness and selectivity collapse to two questions that both the planner and the engineer need answered for every table and query:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;How many rows per key?&lt;/strong&gt; Uniqueness. Determines whether joins multiply, whether UPSERTs target the right constraint, and whether aggregations can be trusted.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How many distinct values relative to total?&lt;/strong&gt; Selectivity. Determines whether indexes help, which join order the planner picks, and how badly a compound filter will miss.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both answers are visible to the planner if the constraints are declared and the statistics are current. Both become guesswork when they&amp;rsquo;re not. The habit that pays off isn&amp;rsquo;t heroic query tuning. It&amp;rsquo;s keeping the database&amp;rsquo;s model of the data honest: declare the unique constraints that exist (including composite ones on bridge tables), refresh statistics on busy tables, add extended statistics where correlation has burned you before, and read &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; for the ratio between estimated and actual rows every time a query slows down.&lt;/p&gt;</description></item></channel></rss>