Featured image of post Foreign Keys Are Not Optional

Foreign Keys Are Not Optional

An `INSERT INTO order_items (order_id) VALUES (9999)` against an order_id that doesn't exist. With a foreign key the database rejects the row. Without one, the row lands and the corruption surfaces months later when finance can't reconcile a report or backups have already rotated past the window.

TL;DR
Foreign keys are the last line of defense against orphaned data, silent corruption, and integrity issues that compound over time. Application-level validation covers the happy path; production finds every other path. The overhead is negligible; the cost of skipping them isn’t.

A new engineer joins the team and runs SELECT count(*) FROM order_items oi LEFT JOIN orders o ON o.id = oi.order_id WHERE o.id IS NULL as part of a routine schema audit. The number comes back 4,127. Four thousand line items pointing at orders that no longer exist. She asks the tech lead when this started, and the answer is “we dropped the foreign key two years ago during a launch crunch to shave write latency on the bulk import path.” The PR that did it has eleven approvals and one comment: “good catch.” The orphan cleanup will take a week, and the bigger question — what else has accumulated — will take longer.

Most of the orphans trace back to two services racing to write the parent row. Service A creates the order, service B creates the line items, and B sometimes ran first when A’s deploy lagged. With the FK in place, B’s inserts would have failed and the orchestration layer would have retried. Without it, B’s inserts landed. The retries never fired. The orphans piled up at roughly six per day for two years.

The happy path isn’t the only path

Application-level validation works great when everything is running normally. When every deploy goes clean, when no one is running a backfill at 2am, when your ORM is doing exactly what you think it’s doing. Production is the set of conditions where one of those isn’t true.

Every developer writing a query, a migration, or a backfill script has to carry the full mental model of the schema in their head: which tables depend on which, what breaks if a row disappears, where the implicit relationships are. Without foreign keys, that mental model is the only thing keeping the data consistent. The example writes itself.

1
2
3
-- Looks harmless. Is it?
INSERT INTO order_items (order_id, product_id, qty) VALUES (9999, 7, 2);
-- order 9999 doesn't exist. No FK, no error. Silent corruption.
Think of it this way
A foreign key is to data integrity what a type system is to application code: it catches mistakes at write time instead of letting them surface as bugs in production.

With a foreign key, the database rejects this immediately, the same way a compiler catches a type error before runtime. Without one, you find out weeks or months later when a report doesn’t add up or a customer calls about missing data. By then, your backups may have already rotated out. The data is just gone.

Service contracts drift the same way. Service A creates the parent record; service B creates the child. B ships a bug and starts referencing IDs that don’t exist. Without a foreign key there’s no error, just bad data accumulating quietly until someone notices the numbers don’t add up. ORMs have their own edge cases: race conditions in bulk inserts, upserts that skip association checks, lazy loading that masks broken references. Every major ORM has documented ways to let bad data slip through. The database is the one place where the check is guaranteed.

This is one specific case of a broader pattern: where business logic lives, database vs. application. Referential integrity is the textbook example of a correctness invariant that every write path has to pass through, and the database is the only layer that sees them all.

The performance question

If your architecture is so perfectly optimized that a foreign key check is the last thing left to tune, that’s not an FK problem. There are almost certainly unindexed queries, missing covering indexes, suboptimal join patterns, N+1 queries, poor partitioning strategy, collation mismatches forcing implicit conversions, functions wrapped around predicates killing index usage, datatype mismatches between join columns, oversized datatypes wasting pages and cache, stale statistics misleading the planner, parameter sniffing locking in bad plans, redundant data bloating tables, under-normalized or over-normalized schemas, tables with too many columns per row, tables with too few columns forcing constant joins, bad query designs pulling more data than needed, misconfigured OS settings, undersized buffer pools, wrong parallelization thresholds, and the list goes on. All hiding somewhere in the stack. If removing an FK constraint is the performance win on the table, it’s worth looking harder at everything else first.

Foreign keys add a check on every insert and update to the child table; the database verifies that the referenced row exists. In practice, this is a lookup against a primary key index. It’s fast. Microseconds. The overhead is negligible compared to the cost of tracking down integrity issues after the fact. Teams routinely spend weeks debugging problems that a foreign key would have caught at insert time.

In practice, the FK check is almost never the bottleneck. This is:

1
2
3
4
5
EXPLAIN ANALYZE
SELECT * FROM events WHERE created_at > '2024-01-01';
-- Seq Scan on events  (cost=0.00..4125892.80 rows=198234567 ...)
-- Planning Time: 0.2 ms
-- Execution Time: 287643.109 ms

That missing index on a 200 million row table is the bottleneck. Not the FK check.

It’s much harder to add them later

Adding a foreign key to a table with a few thousand rows is trivial. Adding one to a table with hundreds of millions of rows in a production database that’s been running for years is a different story entirely.

The database has to validate every existing row against the constraint. On MySQL, ALTER TABLE with a foreign key takes a lock; on a large table, that can mean minutes or hours of blocked writes. On PostgreSQL, you can split it into two steps:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
-- Step 1: add the constraint without validating existing rows
ALTER TABLE order_items
  ADD CONSTRAINT fk_order
  FOREIGN KEY (order_id) REFERENCES orders(id)
  NOT VALID;

-- Step 2: validate (full table scan, can't avoid it)
ALTER TABLE order_items VALIDATE CONSTRAINT fk_order;
-- ERROR: insert or update on table "order_items" violates
--        foreign key constraint "fk_order"
-- Detail: Key (order_id)=(9912) is not present in table "orders".
The longer you wait, the harder it gets
Adding a foreign key to a table with years of accumulated data means finding and resolving every orphaned row first. On a large, busy database, that cleanup alone can take weeks.

There it is: orphaned data that’s been silently accumulating. The constraint can’t be added until every violation is found and resolved. On a large, busy database with years of drift, that cleanup alone can take weeks of careful work. Starting with the constraints on day one avoids this entirely.

What foreign keys actually give you

Beyond preventing bad data, foreign keys serve as living documentation. They tell every engineer who looks at the schema:

  • This table depends on that table
  • These rows cannot exist without those rows
  • This is the shape of the data, enforced by the system itself

Documentation gets outdated. Code comments drift. Constraints are always current because the database enforces them on every write.

Foreign keys also help the query planner. PostgreSQL uses FK relationships to make better decisions about joins. You’re protecting your data and helping the database perform better in the same migration.

FKs as the schema’s relationship map

The documentation value compounds once the readers aren’t all human. information_schema.KEY_COLUMN_USAGE in MySQL, pg_constraint in PostgreSQL: foreign keys are queryable catalog metadata, and every schema-reading assistant (Copilot, MCP-backed agents, text-to-SQL tools, RAG systems indexing the catalog) uses that metadata to reason about how tables connect. A declared FK is a machine-readable statement that order_items.order_id references orders.id. The model doesn’t have to guess from the column name.

Drop the constraint and the signal disappears. The assistant falls back to guessing joins by name match, which works for obvious cases (user_idusers.id) and fails on the real-world column vocabulary every mature schema accumulates: creator_id, modified_by, owner, assigned_to, ref_id, parent. Each of those is a logical FK with no metadata backing it, and a schema-reading model will confidently invent a relationship that doesn’t hold. Adding the FK fixes the integrity hole and, in the same migration, makes the schema self-describing to every tool that consumes catalog metadata, including the ones that didn’t exist when the table was first created.

The NoSQL comparison

It’s worth noting that document databases like MongoDB don’t have this problem in the same way. When an order, its line items, and its shipping address all live inside a single document, there’s nothing to orphan - integrity is structural. The data can’t reference something that doesn’t exist because it’s all embedded together.

That’s actually one of the real strengths of the document model. The moment a relational database splits that same data across orders, line_items, and addresses tables, those relationships need to be enforced somewhere. The application can try, but the database is the only place that guarantees it across every write path: manual queries, migrations, ORM edge cases, and all.

Foreign keys exist because relational databases chose normalization over duplication. That’s a good trade-off, but only if the relationships are actually enforced.

Compare the two models:

1
2
3
4
5
6
7
8
9
// Document model - integrity is structural
{
  "order_id": 1001,
  "user": { "id": 42, "name": "Alice" },
  "items": [
    { "product": "Widget", "qty": 2 },
    { "product": "Gadget", "qty": 1 }
  ]
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
-- Relational model - integrity must be enforced
CREATE TABLE orders (
    id   INT PRIMARY KEY,
    user_id INT NOT NULL REFERENCES users(id)
);

CREATE TABLE order_items (
    id       INT PRIMARY KEY,
    order_id INT NOT NULL REFERENCES orders(id),
    product_id INT NOT NULL REFERENCES products(id),
    qty      INT NOT NULL
);

In the document, there’s nothing to orphan. In the relational model, remove those REFERENCES clauses and every row is on its own.

When it’s reasonable to skip them

In an OLTP system, almost never. If you’re using a relational database, you want relational integrity. That’s the whole point. If you don’t need enforced relationships between your data, a relational database might not be the right tool in the first place.

Where foreign keys can be impractical is in specific edge cases:

  • Partitioned tables where cross-partition foreign keys have historically been unsupported. PostgreSQL 12 added support for foreign keys referencing partitioned tables, though with some limitations: the referenced table must be partitioned, and certain partition schemes can still cause issues.
  • Staging tables used for temporary ETL ingestion before data is validated and moved to its final destination.

Even in analytics and data warehouses, integrity still matters; orphaned or dangling references mean wrong aggregations, broken joins, and reports that silently lie. The enforcement mechanism might look different, but the need for referential integrity doesn’t go away just because the workload changed.

Before you drop one

Before dropping a foreign key for performance, exhaust the thousand other ways to tune your system first. The check itself is a primary-key lookup measured in microseconds; on a profile, it’s almost always rounding error compared to the indexes you haven’t built, the queries that aren’t covering, or the planner stats the FK itself encodes for free. The constraint is almost never the bottleneck, and removing it has a habit of creating new ones a year or two down the line, usually discovered by an engineer who wasn’t on the team when the original PR landed.

SELECT insights FROM experience WHERE downtime = 0; -- Ruslan Tolkachev