Polymorphic References Are Not Foreign Keys

TL;DR

A polymorphic reference is resource_id plus resource_type where the type string chooses which table the ID points to. ORMs make it a one-liner; the database enforces nothing. Reads need conditional joins, orphans accumulate silently, and for most uses (comments, notifications, attachments) per-target tables or mutually-exclusive FKs are the better trade.

What the pattern looks like

CREATE TABLE notifications (
    id BIGINT PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users(id),
    resource_id BIGINT NOT NULL,
    resource_type VARCHAR(50) NOT NULL,
    message TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- resource_type = 'order'   → resource_id references orders.id
-- resource_type = 'invoice' → resource_id references invoices.id
-- resource_type = 'ticket'  → resource_id references support_tickets.id

The tell is resource_id BIGINT NOT NULL with no REFERENCES clause; it can’t have one, because there are multiple targets. What the application treats as a foreign key is, at the database level, a plain integer with a sibling tag string.

What the database can’t do

The cost shows up as absence: every mechanism the database offers for reasoning about relationships is disabled, because the column’s meaning depends on data in another column.

No foreign key. A REFERENCES clause names exactly one target, so a column with several possible targets cannot carry one. Orphaned resource_id values are a write-time non-event and a read-time mystery. (Foreign Keys Are Not Optional covers the general cost; polymorphic references are the case where skipping the FK isn’t a choice.)
No cascade. Delete an order and nothing cleans up the notifications pointing at it. The application has to know every table that might hold a polymorphic reference to orders and clean each one. New tables added later don’t get noticed.
No planner metadata. Foreign keys can inform the planner’s join row estimates, which in turn drive join ordering; PostgreSQL has used them for join selectivity since 9.6. Without one, the planner sees resource_id as a BIGINT with a histogram and no known target.
No schema-level description. Anything that reads the catalog (ERD tools, query generators, AI assistants, typed-client generators) sees no link between notifications.resource_id and the tables it points at. The mapping lives in model files and string literals. (Comment Your Schema helps here but can’t fully restore the information.)
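
The first two gaps are easy to reproduce. The sketch below is a minimal illustration, not a benchmark: it uses Python's built-in sqlite3 module with SQLite standing in for PostgreSQL (an assumption made so the example runs anywhere), and order_notifications is a hypothetical typed table added only for contrast.

```python
import sqlite3

# SQLite stands in for PostgreSQL here (an assumption for runnability);
# it only enforces foreign keys once the pragma below is set.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, order_number TEXT NOT NULL);

-- Polymorphic shape: resource_id has no REFERENCES clause.
CREATE TABLE notifications (
    id            INTEGER PRIMARY KEY,
    resource_id   INTEGER NOT NULL,
    resource_type TEXT    NOT NULL,
    message       TEXT    NOT NULL
);

-- Hypothetical typed variant, for contrast: a real FK with a cascade.
CREATE TABLE order_notifications (
    id       INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
    message  TEXT    NOT NULL
);
""")
conn.execute("INSERT INTO orders VALUES (1, 'ORD-1')")

# Write-time non-event: order 999 does not exist, yet the row is accepted.
conn.execute("INSERT INTO notifications VALUES (1, 999, 'order', 'shipped')")

# The typed table rejects the same mistake.
try:
    conn.execute("INSERT INTO order_notifications VALUES (1, 999, 'shipped')")
    typed_insert_rejected = False
except sqlite3.IntegrityError:
    typed_insert_rejected = True

# Deleting the order cascades through the real FK only.
conn.execute("INSERT INTO notifications VALUES (2, 1, 'order', 'paid')")
conn.execute("INSERT INTO order_notifications VALUES (2, 1, 'paid')")
conn.execute("DELETE FROM orders WHERE id = 1")

polymorphic_left = conn.execute("SELECT COUNT(*) FROM notifications").fetchone()[0]
typed_left = conn.execute("SELECT COUNT(*) FROM order_notifications").fetchone()[0]
print(typed_insert_rejected, polymorphic_left, typed_left)
```

Both polymorphic rows survive the delete, one pointing at an order that never existed and one at an order that no longer does, while the typed table is left empty.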

Orphans accumulate silently

A polymorphic column with no FK and no cascade develops orphans over time. Reads paper over them with LEFT JOIN ... WHERE target.id IS NOT NULL, so the broken rows disappear from the UI but stay in the table. In schemas a few years old, the orphan rate is rarely zero, and nobody designed for it.
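
Measuring the orphan rate takes one anti-join per target type, plus a check for tags that match no target at all. The following is a rough audit sketch under stated assumptions: SQLite via Python's sqlite3 stands in for PostgreSQL, and both the TARGETS mapping and the count_orphans helper are hypothetical names, since the tag-to-table mapping normally lives in application code.

```python
import sqlite3

# Hypothetical mapping from resource_type tag to target table; in a real
# codebase this knowledge lives in ORM models, not in the catalog.
TARGETS = {"order": "orders", "invoice": "invoices", "ticket": "support_tickets"}

def count_orphans(conn, targets=TARGETS):
    """Return {resource_type: orphan_count} for the notifications table."""
    counts = {}
    for rtype, table in targets.items():
        # Table names come from the trusted mapping above, never user input.
        sql = f"""
            SELECT COUNT(*) FROM notifications n
            WHERE n.resource_type = ?
              AND NOT EXISTS (SELECT 1 FROM {table} t WHERE t.id = n.resource_id)
        """
        counts[rtype] = conn.execute(sql, (rtype,)).fetchone()[0]
    # Rows whose tag matches no known target are orphans of a different kind.
    marks = ",".join("?" * len(targets))
    counts["<unknown type>"] = conn.execute(
        f"SELECT COUNT(*) FROM notifications WHERE resource_type NOT IN ({marks})",
        tuple(targets),
    ).fetchone()[0]
    return counts

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY);
CREATE TABLE invoices (id INTEGER PRIMARY KEY);
CREATE TABLE support_tickets (id INTEGER PRIMARY KEY);
CREATE TABLE notifications (
    id            INTEGER PRIMARY KEY,
    resource_id   INTEGER NOT NULL,
    resource_type TEXT    NOT NULL
);
INSERT INTO orders VALUES (1);
INSERT INTO invoices VALUES (1);
INSERT INTO notifications VALUES
    (1, 1, 'order'),     -- live
    (2, 7, 'order'),     -- orphan: there is no order 7
    (3, 1, 'invoice'),   -- live
    (4, 1, 'ticket'),    -- orphan: there is no ticket 1
    (5, 1, 'Order');     -- orphan by typo: the tag matches no target
""")
orphans = count_orphans(conn)
print(orphans)
```

The last row is worth noting: a free-text tag column admits misspelled types as readily as it admits dangling IDs.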

Reads pay for the write-side convenience

The absent FK is the schema problem; the read path is where the cost becomes daily. A query that needs any column from the referenced row can’t use a single join: the target depends on a per-row value, and SQL’s join syntax takes a static target.

-- Conditional LEFT JOIN per target
SELECT n.id, n.message,
       COALESCE(o.order_number, i.invoice_number, t.ticket_code) AS ref
FROM notifications n
LEFT JOIN orders   o ON n.resource_type = 'order'   AND n.resource_id = o.id
LEFT JOIN invoices i ON n.resource_type = 'invoice' AND n.resource_id = i.id
LEFT JOIN support_tickets t
                     ON n.resource_type = 'ticket'  AND n.resource_id = t.id
WHERE n.user_id = 42;

Every new target type adds a join clause here and in every other read-path query that displays a related field. The alternative, a UNION ALL with one branch per target, keeps each branch narrow but still grows linearly with the target count and pushes pagination up to the union level. ORM lazy loading resolves each reference with its own query, which is the N+1 pattern that makes polymorphic feeds slow; eager loading typically cuts that to one query per resource_type, so the cost still grows as the target set widens.
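
Resolving references in application code can at least be batched: group the IDs by resource_type, then issue one IN query per type instead of one query per row. The sketch below is hand-rolled and does not reproduce any ORM's API; the TARGETS mapping and resolve_refs helper are hypothetical, and SQLite via Python's sqlite3 stands in for PostgreSQL.

```python
import sqlite3
from collections import defaultdict

# Hypothetical tag -> (table, display column) mapping.
TARGETS = {
    "order": ("orders", "order_number"),
    "invoice": ("invoices", "invoice_number"),
    "ticket": ("support_tickets", "ticket_code"),
}

def resolve_refs(conn, notifications):
    """notifications: list of (id, resource_type, resource_id) tuples.

    Returns ({notification_id: display value or None}, queries_issued),
    issuing one query per distinct known resource_type.
    """
    ids_by_type = defaultdict(set)
    for _, rtype, rid in notifications:
        ids_by_type[rtype].add(rid)

    found = {}  # (resource_type, resource_id) -> display value
    queries = 0
    for rtype, ids in ids_by_type.items():
        if rtype not in TARGETS:
            continue  # unknown tag: nothing to look up
        table, column = TARGETS[rtype]
        marks = ",".join("?" * len(ids))
        sql = f"SELECT id, {column} FROM {table} WHERE id IN ({marks})"
        for rid, value in conn.execute(sql, tuple(ids)):
            found[(rtype, rid)] = value
        queries += 1
    resolved = {nid: found.get((rtype, rid)) for nid, rtype, rid in notifications}
    return resolved, queries

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, order_number TEXT NOT NULL);
CREATE TABLE invoices (id INTEGER PRIMARY KEY, invoice_number TEXT NOT NULL);
CREATE TABLE support_tickets (id INTEGER PRIMARY KEY, ticket_code TEXT NOT NULL);
INSERT INTO orders VALUES (1, 'ORD-1');
INSERT INTO invoices VALUES (1, 'INV-1');
""")
rows = [(10, "order", 1), (11, "invoice", 1), (12, "order", 1), (13, "ticket", 5)]
resolved, queries = resolve_refs(conn, rows)
print(resolved, queries)
```

Four notifications cost three queries here, one per type, and the orphaned ticket reference resolves to None. The query count is bounded by the number of target types rather than the number of rows, which is better than N+1 but still grows with every new target.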

“One column can point at many tables” on the write side turns into “every read query enumerates every possible table” on the read side. The symmetry people expect isn’t there.

Why the pattern spreads

It’s the path of least resistance, and framework ergonomics encourage it. Rails’ polymorphic: true, Django’s GenericForeignKey, and Laravel’s morphTo turn what would otherwise be several belongs_to associations and a migration into a one-liner. “Comments on orders” and “comments on invoices” look like duplication, so a single comments table with commentable_id / commentable_type feels cleaner. An open-ended “add comments to anything” product ask reads as an argument against committing to a target list.

Each of those framings overweights the write-side cost (another table or another FK column) and underweights the integrity loss (no enforcement, no cascades, schema no longer describes itself). ORMs Are a Coupling covers the broader trade. Polymorphic is the canonical case where the ORM’s preferred shape is actively incompatible with what the database wants to enforce.

What the schema-reading assistant sees

A tool reading the catalog (Copilot on a schema dump, an MCP-backed agent, a RAG pipeline indexing DDL) sees notifications.resource_id BIGINT NOT NULL with no REFERENCES clause and no way to tell the column is anything other than an integer. Asked for “notifications about orders,” the assistant’s best guess is notifications.resource_id = orders.id. That join runs clean and returns every notification whose resource_id happens to collide with an order ID, which includes invoice notifications, ticket notifications, and anything else pointing at an integer that also appears in orders. The result is a set of plausible-looking but semantically meaningless rows. The resource_type filter that would make the join correct is the piece the schema doesn’t advertise.
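
The collision is easy to see on three rows. This is a small illustration with made-up data, using SQLite through Python's sqlite3 as a stand-in for PostgreSQL; GUESSED is the join a catalog-only reader would plausibly write and CORRECT adds the tag filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, order_number TEXT NOT NULL);
CREATE TABLE notifications (
    id            INTEGER PRIMARY KEY,
    resource_id   INTEGER NOT NULL,
    resource_type TEXT    NOT NULL,
    message       TEXT    NOT NULL
);
INSERT INTO orders VALUES (1, 'ORD-1'), (2, 'ORD-2');
INSERT INTO notifications VALUES
    (1, 1, 'order',   'order shipped'),
    (2, 1, 'invoice', 'invoice issued'),   -- invoice 1, not order 1
    (3, 2, 'ticket',  'ticket reopened');  -- ticket 2, not order 2
""")

# The join a reader would guess from the column names alone.
GUESSED = """
SELECT n.id FROM notifications n
JOIN orders o ON n.resource_id = o.id
ORDER BY n.id
"""

# The join that is actually correct: the tag filter is part of the key.
CORRECT = """
SELECT n.id FROM notifications n
JOIN orders o ON n.resource_type = 'order' AND n.resource_id = o.id
ORDER BY n.id
"""

guessed_ids = [row[0] for row in conn.execute(GUESSED)]
correct_ids = [row[0] for row in conn.execute(CORRECT)]
print(guessed_ids, correct_ids)
```

The guessed join returns all three notifications and raises no error; only the first is about an order.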

This is the structural version of the problem covered in the bare id primary key: schema that can’t describe its own relationships forces every reader to guess, and schema-reading models guess confidently. Pulling the polymorphic column apart (per-target tables, mutually-exclusive FKs, supertype) restores the signal in the catalog. The assistant stops hallucinating the join; any RAG system indexing the schema picks up real REFERENCES metadata; the next engineer reading the table doesn’t need to grep the ORM models to find out which target types exist. The integrity win and the catalog-legibility win come in the same migration.

Alternatives

Each alternative gives back some of the database’s relational machinery at different levels of verbosity.

Per-target tables. Split along the target dimension: order_notifications, invoice_notifications, ticket_notifications, each with a real FK. Real cascades, real planner metadata, self-describing schema. Cost: duplicated column sets and an explicit UNION ALL for cross-target reads. The polymorphic shape already performs that union implicitly through its conditional joins; per-target tables make it explicit, with one typed branch per target.
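
A cross-target feed over per-target tables might look like the sketch below. It assumes two targets for brevity, declares ON DELETE CASCADE on each FK, and runs on SQLite through Python's sqlite3 rather than PostgreSQL; FEED_SQL and the sample rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, order_number TEXT NOT NULL);
CREATE TABLE invoices (id INTEGER PRIMARY KEY, invoice_number TEXT NOT NULL);
CREATE TABLE order_notifications (
    id         INTEGER PRIMARY KEY,
    order_id   INTEGER NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
    message    TEXT NOT NULL,
    created_at TEXT NOT NULL
);
CREATE TABLE invoice_notifications (
    id         INTEGER PRIMARY KEY,
    invoice_id INTEGER NOT NULL REFERENCES invoices(id) ON DELETE CASCADE,
    message    TEXT NOT NULL,
    created_at TEXT NOT NULL
);
INSERT INTO orders VALUES (1, 'ORD-1');
INSERT INTO invoices VALUES (1, 'INV-1');
INSERT INTO order_notifications VALUES (1, 1, 'shipped', '2024-01-02');
INSERT INTO invoice_notifications VALUES (1, 1, 'issued', '2024-01-03');
""")

# One typed branch per target; ordering and pagination apply to the union.
FEED_SQL = """
SELECT 'order' AS kind, n.message AS message,
       o.order_number AS ref, n.created_at AS created_at
FROM order_notifications n JOIN orders o ON o.id = n.order_id
UNION ALL
SELECT 'invoice', n.message, i.invoice_number, n.created_at
FROM invoice_notifications n JOIN invoices i ON i.id = n.invoice_id
ORDER BY created_at DESC
LIMIT ?
"""

feed_before = conn.execute(FEED_SQL, (10,)).fetchall()

# Deleting the order removes its notifications through the cascade.
conn.execute("DELETE FROM orders WHERE id = 1")
feed_after = conn.execute(FEED_SQL, (10,)).fetchall()
print(feed_before)
print(feed_after)
```

Each branch uses a plain inner join because the FK guarantees the target exists, so there is no orphan filtering to do on the read side.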

Mutually-exclusive nullable FKs with CHECK. One table, one FK column per target, a constraint enforcing exactly one is non-null:

CREATE TABLE notifications (
    id BIGINT PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users(id),
    order_id BIGINT REFERENCES orders(id),
    invoice_id BIGINT REFERENCES invoices(id),
    ticket_id BIGINT REFERENCES support_tickets(id),
    message TEXT NOT NULL,
    CONSTRAINT exactly_one_target CHECK (
        (order_id IS NOT NULL)::int +
        (invoice_id IS NOT NULL)::int +
        (ticket_id IS NOT NULL)::int = 1
    )
);

Real FKs per target, real cascades once ON DELETE CASCADE is declared on each column, and an unambiguous meaning for every row. Scales reasonably up to a handful of targets and stops scaling cleanly somewhere around ten.
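
To see which writes the constraint accepts, the sketch below ports the table to SQLite through Python's sqlite3 (an assumption for runnability). SQLite's IS NOT NULL already yields 0 or 1, so the ::int casts are dropped; ON DELETE CASCADE and the try_insert helper are additions for the demonstration, not part of the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY);
CREATE TABLE invoices (id INTEGER PRIMARY KEY);
CREATE TABLE support_tickets (id INTEGER PRIMARY KEY);
CREATE TABLE notifications (
    id         INTEGER PRIMARY KEY,
    order_id   INTEGER REFERENCES orders(id) ON DELETE CASCADE,
    invoice_id INTEGER REFERENCES invoices(id) ON DELETE CASCADE,
    ticket_id  INTEGER REFERENCES support_tickets(id) ON DELETE CASCADE,
    message    TEXT NOT NULL,
    -- SQLite comparisons already yield 0/1, so no ::int cast is needed.
    CONSTRAINT exactly_one_target CHECK (
        (order_id IS NOT NULL) + (invoice_id IS NOT NULL)
        + (ticket_id IS NOT NULL) = 1
    )
);
INSERT INTO orders VALUES (1);
INSERT INTO invoices VALUES (1);
""")

def try_insert(order_id, invoice_id, ticket_id):
    """Hypothetical helper: True if the database accepts the row."""
    try:
        conn.execute(
            "INSERT INTO notifications (order_id, invoice_id, ticket_id, message)"
            " VALUES (?, ?, ?, 'hello')",
            (order_id, invoice_id, ticket_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

one_target = try_insert(1, None, None)      # accepted
two_targets = try_insert(1, 1, None)        # rejected by the CHECK
no_target = try_insert(None, None, None)    # rejected by the CHECK
dangling = try_insert(None, None, 99)       # passes the CHECK, fails the FK

conn.execute("DELETE FROM orders WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM notifications").fetchone()[0]
print(one_target, two_targets, no_target, dangling, remaining)
```

Every invalid shape is rejected at write time by the database itself, and the one valid row disappears with its order.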

Supertype table. A shared parent table carries a common ID; each target type’s table references the parent. The polymorphic column then points at the parent, which is a single real FK. Cleanest structural answer and the one with the highest adoption cost; retrofitting this onto an existing schema is substantial migration work.
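
One possible shape for the supertype approach is sketched below. The resources table, its kind column, and the choice to share primary keys between parent and child are all assumptions made for illustration; SQLite through Python's sqlite3 stands in for PostgreSQL. The sketch does not stop an orders row from attaching to a resource whose kind is 'invoice'; closing that gap needs a composite key or a trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
-- Hypothetical parent table: one row per referenceable thing.
CREATE TABLE resources (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('order', 'invoice'))
);
-- Each target shares its primary key with its parent row.
CREATE TABLE orders (
    id           INTEGER PRIMARY KEY REFERENCES resources(id) ON DELETE CASCADE,
    order_number TEXT NOT NULL
);
CREATE TABLE invoices (
    id             INTEGER PRIMARY KEY REFERENCES resources(id) ON DELETE CASCADE,
    invoice_number TEXT NOT NULL
);
-- The former polymorphic column is now a single real FK.
CREATE TABLE notifications (
    id          INTEGER PRIMARY KEY,
    resource_id INTEGER NOT NULL REFERENCES resources(id) ON DELETE CASCADE,
    message     TEXT NOT NULL
);
INSERT INTO resources VALUES (1, 'order'), (2, 'invoice');
INSERT INTO orders VALUES (1, 'ORD-1');
INSERT INTO invoices VALUES (2, 'INV-2');
INSERT INTO notifications VALUES (1, 1, 'shipped'), (2, 2, 'issued');
""")

# A dangling reference is now rejected at write time.
try:
    conn.execute("INSERT INTO notifications VALUES (3, 99, 'lost')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

# The kind of each notification's target comes from one static join.
kinds = conn.execute("""
    SELECT n.id, r.kind FROM notifications n
    JOIN resources r ON r.id = n.resource_id
    ORDER BY n.id
""").fetchall()

# Deleting the parent row removes the order and its notifications together.
conn.execute("DELETE FROM resources WHERE id = 1")
orders_left = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
notes_left = conn.execute("SELECT COUNT(*) FROM notifications").fetchone()[0]
print(dangling_rejected, kinds, orders_left, notes_left)
```

Fetching target-specific columns such as order_number still needs a join per target, but existence and cleanup are handled by one FK rather than by application code.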

When polymorphic is actually the right call

The trade-offs stack up unfavorably for most common uses, but not all. The pattern earns its keep when the relationship is genuinely best-effort: audit events, activity logs, “recently viewed” lists, undo history, where a lost reference is a recoverable annoyance rather than a correctness incident. The FK was never going to be load-bearing, and the polymorphic shape matches the actual semantics: “reference anything, and if it’s gone, show a tombstone.”

Outside that zone the default bias should run the other way. A comment system with three possible parents is not a case for polymorphism; it’s a case for three comment tables or mutually-exclusive FK columns, with the ORM abstracting the read-side stitching.

The bigger picture

Polymorphic references are a specific case of a broader pattern: designs that move information out of the schema and into the application in exchange for ergonomics in the model layer. The schema drifts from “self-describing relational structure” toward “indexed key-value store the application interprets.” That’s a legitimate position (DynamoDB and friends live there on purpose), but a relational database running on polymorphic associations is paying for a relational engine and choosing not to use most of what it offers.

The pattern isn’t wrong. It’s an aggressive trade, priced on day one by the convenience of polymorphic: true and on day three hundred by the silent orphan count, the conditional joins, and resource_id BIGINT telling no one what the table is related to. Reach for it on purpose. Keep the option of pulling it back onto typed FK columns open, because the migrations away are slower the longer the schema has been pretending the reference isn’t there.