The Bare `id` Primary Key: When Every Table Joins to Every Other Table

TL;DR

A bare id primary key on every table makes a.id = b.id valid SQL between any two tables, which means neither a human reviewing the query nor an LLM generating one can tell which of those equalities are meaningful. Name primary keys after the table they identify, and the schema describes its own relationships.

Here’s a query an AI assistant generated against a real production schema:

1
2
3
4
SELECT u.email, a.payload
FROM users u
JOIN actions a ON u.id = a.id
WHERE u.email = '[email protected]';

Syntactically clean. Ran without error. Returned zero rows, which the assistant reported back as “this user has no actions.” The real answer: users.id is a BIGINT and actions.id is a CHAR(36) UUID. MySQL coerced the integer to a string, compared it to a UUID, and found no match. The join wasn’t wrong, exactly. It was meaningless, and the database had no way to say so.

The experienced reader’s first fix is “just use UUIDs everywhere” or “enforce the type at join time.” Neither works. The footgun isn’t the type mismatch; it’s the column name. When every table’s primary key is named id, a.id = b.id is a valid expression between any two tables in the schema, and nothing in the column names tells you whether that expression means anything. Fix the types and you close one failure mode; the identically-typed, semantically-unrelated users.id = 42 = orders.id case still ships.

What nobody can see

The <table>_id convention is older than most of us, and the case for it is usually framed as clarity or style. The sharper framing is that bare id hides the information that matters most at the point of the join (which table’s identity is being compared, and whether comparing them makes sense) from every reader of the query.

The query’s reviewer. ON u.id = a.id gives no hint of what’s being matched. A human reviewer has to carry the table-to-alias mapping (u is users, a is actions) and the table-to-type mapping (users.id is BIGINT, actions.id is UUID) in working memory, then cross-check them against the join condition. None of those steps are hard, but reviewers skip them because the column names look symmetric. Two .id references read as “joining on primary keys,” which is the kind of join nobody flags.

The LLM reading the schema. An assistant generating SQL from the catalog sees users(id BIGINT, ...) and actions(id CHAR(36), ...) as two tables with primary keys named id. Absent a full column-type check on every candidate join (and most schema-reading prompts don’t do this), the natural-looking join between “a user and their actions” is u.id = a.id, which is exactly wrong. The schema presented the column as joinable; the LLM took it at face value. The same mistake a tired human makes, but at scale and without fatigue to blame.

The static analyzer. Linters and schema-aware query builders operate on names first and types second. A rule that warns on suspicious cross-table joins has no signal to fire on when both sides are .id; the column names match, so the join is “legitimate” by shape. The same rule on users.user_id = actions.action_id would flag it immediately, because the names would be obviously non-corresponding.

None of these readers are missing a step they should have taken. They’re all doing the reasonable thing, and the reasonable thing produces wrong queries because the schema is telling them id is id in both tables.

Three failure modes, ranked by how loudly they fail

Three distinct outcomes hide behind a.id = b.id, and they don’t fail equally:

PostgreSQL, mixed types. The comparison errors out with operator does not exist: bigint = uuid. Loud, caught in development, fixed before merge. The best failure mode.
MySQL, mixed types. Silent coercion to string, zero rows returned. The opening example. Bad, because “no results” looks like valid data to every downstream consumer.
Either engine, same type but semantically unrelated. BIGINT users.id = 42 matched against BIGINT orders.id = 42 returns the rows where the integers happen to collide. The query runs, the result set isn’t empty, and the rows look plausible because they’re real rows from real tables. The worst failure mode, because nothing about the output signals that the join was nonsense.

The first two are loud enough to catch in review. The third is the one that ships. The third is the default once more than one table in the schema uses a plain BIGINT id, which is almost every relational schema in existence.

Zero rows looks like no data

A join that silently returns zero rows because of a type coercion is indistinguishable from a join that legitimately has no matches. Code generators, dashboards, and AI assistants all interpret empty results as “the relationship exists but has no rows,” not “the query is nonsense.” The failure hides inside success.

Mixed PK types make the naming problem sharper

Production schemas rarely stay on one PK strategy for long. The original tables are usually BIGINT AUTO_INCREMENT because the framework defaulted to it; a newer service switches to UUIDs to let clients generate IDs offline or to distribute across shards; join tables pick up composite keys because (user_id, role_id) is the natural identity. Nothing in the schema announces which tables fall into which bucket; SHOW CREATE TABLE or \d is the only source of truth, and even that requires reading every table to know what joins are legal.

Mixed types are where the naming footgun turns from theoretical to frequent. When every PK was a BIGINT, the “same type but semantically unrelated” case was the main risk and reviewers caught most of it. Once the schema has BIGINT and UUID sitting next to each other (all named id) the mismatched-type cases pile on top, and “no data found” becomes a regular report from any tool generating queries from the schema.

The sizing question (when to pick BIGINT versus UUID versus UUIDv7 versus composite, and what each costs at the index level) is covered separately in Random UUIDs as Primary Keys. The two problems interact but have independent fixes: pick your PK types deliberately, and name them so the schema describes its own relationships. Neither fix substitutes for the other.

Naming is the lever that actually helps

Naming is what makes a schema describe its own relationships without requiring the reader (human or otherwise) to open every CREATE TABLE. Two conventions, consistently applied, close most of the gap:

Name the primary key after the table. users.user_id, orders.order_id, actions.action_id. The equality users.user_id = orders.order_id reads as obvious nonsense, because the column names are no longer identical. Reviewers see it, LLMs don’t produce it, linters can flag it. The cost is a small amount of redundancy in queries (users.user_id instead of users.id), which is almost always a fair trade. This lines up with the broader guidance in Schema Conventions and Why They Matter.

Foreign keys mirror the target PK. orders.user_id clearly references users.user_id. actions.user_id clearly references users.user_id. This is already common practice; the only change is that the target’s PK name matches, closing the loop. Foreign Keys Are Not Optional covers why the FK itself matters; naming is what makes the FK legible without the REFERENCES clause in hand.

The bare id convention is defensible when the PK column only ever shows up in queries alongside its table name (users.id) and never as a bare id in a SELECT list or join condition. That discipline is hard to enforce across a team over years, and every framework’s default query builder produces SELECT id FROM users without thinking about it. The naming fix makes the discipline unnecessary.

When bare `id` is actually fine

Not every schema needs to bend. A small application, a service with a handful of tables, or a database where every query is reviewed by one team has plenty of context to keep the a.id = b.id landmine out of reach. The cost of the convention scales with the number of tables, the number of engineers, and the number of non-human query generators; in the small case it rarely shows up.

What changes once any of those numbers grow: nobody remembers which tables are BIGINT versus UUID, the assistant pattern of generating queries from schema is routine, and the review process that caught a.id = b.id in a 20-table schema can’t read every join in a 400-table one. At that size the convention pays rent, and renaming PKs is a migration that gets slower every quarter.

The bigger picture

A schema’s job is to hold data correctly and describe its own shape well enough that the tools reading it can reason about relationships without reading every line. The bare id PK is a small departure from that (one column name shared across tables) but it’s the departure that most consistently produces silent-wrong-answer queries, because SQL has no way to distinguish “same name, same meaning” from “same name, different meaning.”

Name the primary key after the table it identifies, so the schema tells its own story when someone (human or otherwise) joins two of them together. It costs almost nothing on day one and leaves the schema legible at 400 tables.