[{"content":" TL;DR Foreign keys aren\u0026rsquo;t just constraints — they\u0026rsquo;re the last line of defense against orphaned data, silent corruption, and integrity issues that compound over time. Application-level validation covers the happy path, but production has a way of finding every other path. The overhead is negligible; the cost of skipping them isn\u0026rsquo;t. It\u0026rsquo;s a reasonable thought. And in a perfect world, it would work. But production isn\u0026rsquo;t a perfect world.\nThe happy path isn\u0026rsquo;t the only path Application-level validation works great — when everything is running normally. When every deploy goes clean, when no one is running a backfill script at 2am, when your ORM is doing exactly what you think it\u0026rsquo;s doing.\nBut over the years, things happen:\nWithout foreign keys, you have to know every relationship exists and enforce it yourself. Every developer writing a query, a migration, a backfill script needs to carry the full mental model of the schema in their head. Which tables depend on which? What breaks if this row disappears?\n1 2 3 -- Looks harmless. Is it? INSERT INTO order_items (order_id, product_id, qty) VALUES (9999, 7, 2); -- order 9999 doesn\u0026#39;t exist. No FK, no error. Silent corruption. With a foreign key, the database rejects this immediately — the same way a compiler catches a type error before runtime. Without one, you find out weeks or months later when a report doesn\u0026rsquo;t add up or a customer calls about missing data. And by then, your backups may have already rotated out. The data is just gone. The choice is between catching mistakes at write time or discovering unrecoverable corruption in production.\nA service changes its contract. Service A creates the parent record. Service B creates the child. Service B ships a bug and starts referencing IDs that don\u0026rsquo;t exist. Without a foreign key, there\u0026rsquo;s no error — just bad data accumulating quietly until someone notices the numbers don\u0026rsquo;t add up.\nORMs have edge cases. Race conditions in bulk inserts, upserts that skip association checks, lazy loading that masks broken references. Every major ORM has documented ways to let bad data slip through. The database is the one place where the check is guaranteed.\nThe performance question If your architecture is so perfectly optimized that a foreign key check is the last thing left to tune — that\u0026rsquo;s not an FK problem. There are almost certainly unindexed queries, missing covering indexes, suboptimal join patterns, N+1 queries, poor partitioning strategy, collation mismatches forcing implicit conversions, functions wrapped around predicates killing index usage, datatype mismatches between join columns, oversized datatypes wasting pages and cache, stale statistics misleading the planner, parameter sniffing locking in bad plans, redundant data bloating tables, under-normalized or over-normalized schemas, tables with too many columns per row, tables with too few columns forcing constant joins, bad query designs pulling more data than needed, misconfigured OS settings, undersized buffer pools, wrong parallelization thresholds — and the list goes on. All hiding somewhere in the stack. If removing an FK constraint is the performance win on the table, it\u0026rsquo;s worth looking harder at everything else first.\nForeign keys add a check on every insert and update to the child table — the database verifies that the referenced row exists. In practice, this is a lookup against a primary key index. It\u0026rsquo;s fast. Microseconds. The overhead is negligible compared to the cost of tracking down integrity issues after the fact. Teams routinely spend weeks debugging problems that a foreign key would have caught at insert time.\nIn practice, the FK check is almost never the bottleneck. This is:\n1 2 3 4 5 EXPLAIN ANALYZE SELECT * FROM events WHERE created_at \u0026gt; \u0026#39;2024-01-01\u0026#39;; -- Seq Scan on events (cost=0.00..4125892.80 rows=198234567 ...) -- Planning Time: 0.2 ms -- Execution Time: 287643.109 ms That missing index on a 200 million row table — that\u0026rsquo;s the bottleneck. Not the FK check.\nIt\u0026rsquo;s much harder to add them later Adding a foreign key to a table with a few thousand rows is trivial. Adding one to a table with hundreds of millions of rows in a production database that\u0026rsquo;s been running for years is a different story entirely.\nThe database has to validate every existing row against the constraint. On MySQL, ALTER TABLE with a foreign key takes a lock — on a large table, that can mean minutes or hours of blocked writes. On PostgreSQL, you can split it into two steps:\n1 2 3 4 5 6 7 8 9 10 11 -- Step 1: add the constraint without validating existing rows ALTER TABLE order_items ADD CONSTRAINT fk_order FOREIGN KEY (order_id) REFERENCES orders(id) NOT VALID; -- Step 2: validate (full table scan, can\u0026#39;t avoid it) ALTER TABLE order_items VALIDATE CONSTRAINT fk_order; -- ERROR: insert or update on table \u0026#34;order_items\u0026#34; violates -- foreign key constraint \u0026#34;fk_order\u0026#34; -- Detail: Key (order_id)=(9912) is not present in table \u0026#34;orders\u0026#34;. And there it is — orphaned data that\u0026rsquo;s been silently accumulating. The constraint can\u0026rsquo;t be added until every violation is found and resolved. On a large, busy database with years of drift, that cleanup alone can take weeks of careful work. Starting with the constraints on day one avoids this entirely.\nWhat foreign keys actually give you Beyond preventing bad data, foreign keys serve as living documentation. They tell every engineer who looks at the schema:\nThis table depends on that table These rows cannot exist without those rows This is the shape of the data, enforced by the system itself Documentation gets outdated. Code comments drift. Constraints are always current because the database enforces them on every write.\nForeign keys also help the query planner. PostgreSQL uses FK relationships to make better decisions about joins. So you\u0026rsquo;re not just protecting your data — you\u0026rsquo;re helping the database perform better too.\nThe NoSQL comparison It\u0026rsquo;s worth noting that document databases like MongoDB don\u0026rsquo;t have this problem in the same way. When an order, its line items, and its shipping address all live inside a single document, there\u0026rsquo;s nothing to orphan — integrity is structural. The data can\u0026rsquo;t reference something that doesn\u0026rsquo;t exist because it\u0026rsquo;s all embedded together.\nThat\u0026rsquo;s actually one of the real strengths of the document model. But the moment a relational database splits that same data across orders, line_items, and addresses tables, those relationships need to be enforced somewhere. The application can try, but the database is the only place that guarantees it across every write path — manual queries, migrations, ORM edge cases, and all.\nForeign keys exist because relational databases chose normalization over duplication. That\u0026rsquo;s a good trade-off — but only if the relationships are actually enforced.\nCompare the two models:\n1 2 3 4 5 6 7 8 9 // Document model — integrity is structural { \u0026#34;order_id\u0026#34;: 1001, \u0026#34;user\u0026#34;: { \u0026#34;id\u0026#34;: 42, \u0026#34;name\u0026#34;: \u0026#34;Alice\u0026#34; }, \u0026#34;items\u0026#34;: [ { \u0026#34;product\u0026#34;: \u0026#34;Widget\u0026#34;, \u0026#34;qty\u0026#34;: 2 }, { \u0026#34;product\u0026#34;: \u0026#34;Gadget\u0026#34;, \u0026#34;qty\u0026#34;: 1 } ] } 1 2 3 4 5 6 7 8 9 10 11 12 -- Relational model — integrity must be enforced CREATE TABLE orders ( id INT PRIMARY KEY, user_id INT NOT NULL REFERENCES users(id) ); CREATE TABLE order_items ( id INT PRIMARY KEY, order_id INT NOT NULL REFERENCES orders(id), product_id INT NOT NULL REFERENCES products(id), qty INT NOT NULL ); In the document, there\u0026rsquo;s nothing to orphan. In the relational model, remove those REFERENCES clauses and every row is on its own.\nWhen it\u0026rsquo;s reasonable to skip them In an OLTP system — almost never. If you\u0026rsquo;re using a relational database, you want relational integrity. That\u0026rsquo;s the whole point. If you don\u0026rsquo;t need enforced relationships between your data, a relational database might not be the right tool in the first place.\nWhere foreign keys can be impractical is in specific edge cases:\nPartitioned tables where cross-partition foreign keys have historically been unsupported. PostgreSQL 12 added support for foreign keys referencing partitioned tables, though with some limitations — the referenced table must be partitioned, and certain partition schemes can still cause issues Staging tables used for temporary ETL ingestion before data is validated and moved to its final destination Even in analytics and data warehouses, integrity still matters — orphaned or dangling references mean wrong aggregations, broken joins, and reports that silently lie. The enforcement mechanism might look different, but the need for referential integrity doesn\u0026rsquo;t go away just because the workload changed.\nThe bigger picture Data outlives applications. It outlives frameworks, teams, and the engineers who made the original design decisions. The people move on, but the data — and whatever integrity issues it carries — stays.\nForeign keys aren\u0026rsquo;t overhead. They\u0026rsquo;re a small investment that pays for itself the first time something unexpected happens in production. And in any long-running system, something unexpected always does.\nBefore dropping a foreign key for performance, exhaust the thousand other ways to optimize your system first. The constraint is almost certainly not the problem — but removing it can easily become one.\nStart with the constraints.\n","date":"2024-02-13T00:00:00Z","image":"/p/foreign-keys-are-not-optional/cover.png","permalink":"/p/foreign-keys-are-not-optional/","title":"Foreign Keys Are Not Optional"}]