Featured image of post Foreign Keys Are Not Optional

Foreign Keys Are Not Optional

Why enforcing relationships in the application layer sounds good in theory — and what happens in production when you do.

TL;DR
Foreign keys aren’t just constraints — they’re the last line of defense against orphaned data, silent corruption, and integrity issues that compound over time. Application-level validation covers the happy path, but production has a way of finding every other path. The overhead is negligible; the cost of skipping them isn’t.

It’s a reasonable thought. And in a perfect world, it would work. But production isn’t a perfect world.

The happy path isn’t the only path

Application-level validation works great — when everything is running normally. When every deploy goes clean, when no one is running a backfill script at 2am, when your ORM is doing exactly what you think it’s doing.

But over the years, things happen:

Without foreign keys, you have to know every relationship exists and enforce it yourself. Every developer writing a query, a migration, a backfill script needs to carry the full mental model of the schema in their head. Which tables depend on which? What breaks if this row disappears?

1
2
3
-- Looks harmless. Is it?
INSERT INTO order_items (order_id, product_id, qty) VALUES (9999, 7, 2);
-- order 9999 doesn't exist. No FK, no error. Silent corruption.

With a foreign key, the database rejects this immediately — the same way a compiler catches a type error before runtime. Without one, you find out weeks or months later when a report doesn’t add up or a customer calls about missing data. And by then, your backups may have already rotated out. The data is just gone. The choice is between catching mistakes at write time or discovering unrecoverable corruption in production.

A service changes its contract. Service A creates the parent record. Service B creates the child. Service B ships a bug and starts referencing IDs that don’t exist. Without a foreign key, there’s no error — just bad data accumulating quietly until someone notices the numbers don’t add up.

ORMs have edge cases. Race conditions in bulk inserts, upserts that skip association checks, lazy loading that masks broken references. Every major ORM has documented ways to let bad data slip through. The database is the one place where the check is guaranteed.

The performance question

If your architecture is so perfectly optimized that a foreign key check is the last thing left to tune — that’s not an FK problem. There are almost certainly unindexed queries, missing covering indexes, suboptimal join patterns, N+1 queries, poor partitioning strategy, collation mismatches forcing implicit conversions, functions wrapped around predicates killing index usage, datatype mismatches between join columns, oversized datatypes wasting pages and cache, stale statistics misleading the planner, parameter sniffing locking in bad plans, redundant data bloating tables, under-normalized or over-normalized schemas, tables with too many columns per row, tables with too few columns forcing constant joins, bad query designs pulling more data than needed, misconfigured OS settings, undersized buffer pools, wrong parallelization thresholds — and the list goes on. All hiding somewhere in the stack. If removing an FK constraint is the performance win on the table, it’s worth looking harder at everything else first.

Foreign keys add a check on every insert and update to the child table — the database verifies that the referenced row exists. In practice, this is a lookup against a primary key index. It’s fast. Microseconds. The overhead is negligible compared to the cost of tracking down integrity issues after the fact. Teams routinely spend weeks debugging problems that a foreign key would have caught at insert time.

In practice, the FK check is almost never the bottleneck. This is:

1
2
3
4
5
EXPLAIN ANALYZE
SELECT * FROM events WHERE created_at > '2024-01-01';
-- Seq Scan on events  (cost=0.00..4125892.80 rows=198234567 ...)
-- Planning Time: 0.2 ms
-- Execution Time: 287643.109 ms

That missing index on a 200 million row table — that’s the bottleneck. Not the FK check.

It’s much harder to add them later

Adding a foreign key to a table with a few thousand rows is trivial. Adding one to a table with hundreds of millions of rows in a production database that’s been running for years is a different story entirely.

The database has to validate every existing row against the constraint. On MySQL, ALTER TABLE with a foreign key takes a lock — on a large table, that can mean minutes or hours of blocked writes. On PostgreSQL, you can split it into two steps:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
-- Step 1: add the constraint without validating existing rows
ALTER TABLE order_items
  ADD CONSTRAINT fk_order
  FOREIGN KEY (order_id) REFERENCES orders(id)
  NOT VALID;

-- Step 2: validate (full table scan, can't avoid it)
ALTER TABLE order_items VALIDATE CONSTRAINT fk_order;
-- ERROR: insert or update on table "order_items" violates
--        foreign key constraint "fk_order"
-- Detail: Key (order_id)=(9912) is not present in table "orders".

And there it is — orphaned data that’s been silently accumulating. The constraint can’t be added until every violation is found and resolved. On a large, busy database with years of drift, that cleanup alone can take weeks of careful work. Starting with the constraints on day one avoids this entirely.

What foreign keys actually give you

Beyond preventing bad data, foreign keys serve as living documentation. They tell every engineer who looks at the schema:

  • This table depends on that table
  • These rows cannot exist without those rows
  • This is the shape of the data, enforced by the system itself

Documentation gets outdated. Code comments drift. Constraints are always current because the database enforces them on every write.

Foreign keys also help the query planner. PostgreSQL uses FK relationships to make better decisions about joins. So you’re not just protecting your data — you’re helping the database perform better too.

The NoSQL comparison

It’s worth noting that document databases like MongoDB don’t have this problem in the same way. When an order, its line items, and its shipping address all live inside a single document, there’s nothing to orphan — integrity is structural. The data can’t reference something that doesn’t exist because it’s all embedded together.

That’s actually one of the real strengths of the document model. But the moment a relational database splits that same data across orders, line_items, and addresses tables, those relationships need to be enforced somewhere. The application can try, but the database is the only place that guarantees it across every write path — manual queries, migrations, ORM edge cases, and all.

Foreign keys exist because relational databases chose normalization over duplication. That’s a good trade-off — but only if the relationships are actually enforced.

Compare the two models:

1
2
3
4
5
6
7
8
9
// Document model — integrity is structural
{
  "order_id": 1001,
  "user": { "id": 42, "name": "Alice" },
  "items": [
    { "product": "Widget", "qty": 2 },
    { "product": "Gadget", "qty": 1 }
  ]
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
-- Relational model — integrity must be enforced
CREATE TABLE orders (
    id   INT PRIMARY KEY,
    user_id INT NOT NULL REFERENCES users(id)
);

CREATE TABLE order_items (
    id       INT PRIMARY KEY,
    order_id INT NOT NULL REFERENCES orders(id),
    product_id INT NOT NULL REFERENCES products(id),
    qty      INT NOT NULL
);

In the document, there’s nothing to orphan. In the relational model, remove those REFERENCES clauses and every row is on its own.

When it’s reasonable to skip them

In an OLTP system — almost never. If you’re using a relational database, you want relational integrity. That’s the whole point. If you don’t need enforced relationships between your data, a relational database might not be the right tool in the first place.

Where foreign keys can be impractical is in specific edge cases:

  • Partitioned tables where cross-partition foreign keys have historically been unsupported. PostgreSQL 12 added support for foreign keys referencing partitioned tables, though with some limitations — the referenced table must be partitioned, and certain partition schemes can still cause issues
  • Staging tables used for temporary ETL ingestion before data is validated and moved to its final destination

Even in analytics and data warehouses, integrity still matters — orphaned or dangling references mean wrong aggregations, broken joins, and reports that silently lie. The enforcement mechanism might look different, but the need for referential integrity doesn’t go away just because the workload changed.

The bigger picture

Data outlives applications. It outlives frameworks, teams, and the engineers who made the original design decisions. The people move on, but the data — and whatever integrity issues it carries — stays.

Foreign keys aren’t overhead. They’re a small investment that pays for itself the first time something unexpected happens in production. And in any long-running system, something unexpected always does.

Before dropping a foreign key for performance, exhaust the thousand other ways to optimize your system first. The constraint is almost certainly not the problem — but removing it can easily become one.

Start with the constraints.

SELECT insights FROM experience WHERE downtime = 0; -- Ruslan Tolkachev