The schema started clean four years ago: users(id, email, password_hash, created_at) — four columns. Today the table is renamed customers and has 184 columns. Billing address. Shipping address. Three additional shipping addresses numbered 2 through 4. preferences_json for user settings. Twelve feature-flag TINYINTs. Three Stripe identifiers from three processor migrations. last_login_at, last_seen_at, last_purchase_at, last_notification_sent_at. Forty more columns whose meaning lives in Confluence, if anywhere. No single ALTER TABLE ADD COLUMN was unreasonable at the time. The accumulated result is an average row size of 6KB, an UPDATE to last_login_at that rewrites every byte of it, and a buffer pool holding two customer rows per page instead of forty.
The obvious fix is to normalize it — split into customer_profile, customer_billing, customer_addresses, customer_preferences, customer_feature_flags, customer_audit. That’s the textbook answer and it’s the one that breaks the moment you look at the dominant read. The list view on the admin page needs name, email, status, last login, Stripe status, and total spent — now it’s a six-way join on every page load. The fix that looked clean in the migration doc makes the most-frequent query more expensive, not less. The read cost moves to the place it’s paid most often, and somebody — usually a few months later — proposes a materialized view to “just flatten it back out,” which is the god table returning through a different door.
How a row-store actually reads a row
Before the cost math makes sense: OLTP engines like InnoDB and PostgreSQL’s heap store complete rows laid out contiguously on fixed-size pages — typically 16KB in InnoDB, 8KB in PostgreSQL. A page holds as many rows as fit. When a query needs one column of one row, the engine doesn’t read that column alone; it locates the row’s page via an index lookup or scan, loads the whole page into the buffer pool, and reads the requested column out of the in-memory row image.
The one exception is the index-only scan: if every column the query projects and filters on is already present inside an index, the base table doesn’t have to be touched and only the index pages are loaded. See Covering Index Traps for how quickly this optimization disappears — usually the moment a SELECT list grows by one column. Every other read path goes through the row, which means the row’s width sets the floor on how much data the engine moves per lookup. Reading email from a 184-column customer row loads 6KB into memory to return 50 bytes; reading the same column from an 800-byte row loads 800 bytes. The buffer pool is a fixed size and every byte of unused column data in it is displacing something another query needs.
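The per-lookup overhead in that paragraph is just arithmetic; here is a sketch using the article's illustrative figures (50 bytes wanted, 6KB vs. 800-byte rows):

```python
# Data moved vs. data returned for a point read in a row store: the engine
# loads the whole row (in practice the whole page) to hand back one column.
# Figures are the article's illustrative numbers, not measurements.
def read_overhead(row_bytes: int, wanted_bytes: int) -> float:
    return row_bytes / wanted_bytes

print(f"{read_overhead(6 * 1024, 50):.0f}x")  # 184-column row: ~123x
print(f"{read_overhead(800, 50):.0f}x")       # split row: 16x
```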
Column stores (ClickHouse, BigQuery, Parquet-backed warehouses) invert this entirely — data is laid out by column, so reading one column reads only that column’s storage. The wide-table cost math doesn’t apply there, which is why this anti-pattern is specifically a row-store OLTP problem and why denormalized fact tables in analytical warehouses are fine at 300 columns.
What 150 columns actually costs
The individual cost of one column is negligible. The system-level cost shows up in several places at once, and none of them are visible in a diff that adds one more.
Row size and write amplification. InnoDB stores full rows on disk pages, and an UPDATE rewrites the entire row even if only one column changed. On a 184-column table averaging 6KB per row, updating last_login_at on every sign-in rewrites 6KB, not 8 bytes. PostgreSQL doesn’t rewrite in place — MVCC creates a new tuple for every UPDATE and marks the old one dead — but the new tuple is 6KB too, and VACUUM has that much more to reclaim. In either engine, the write cost per logical change scales with row width.
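As back-of-the-envelope arithmetic: the 6KB row size is the article's figure, while the sign-in rate and the narrow audit-row size below are assumptions for illustration, not measurements.

```python
# Physical bytes rewritten per day by the last_login_at update,
# wide row vs. a narrow 1:1 audit row.
SIGNINS_PER_DAY = 5_000_000  # assumed rate

wide_row_bytes = 6 * 1024    # full customers row, rewritten per sign-in
audit_row_bytes = 64         # narrow customer_audit row (assumed size)

wide_gb = SIGNINS_PER_DAY * wide_row_bytes / 1024**3
split_gb = SIGNINS_PER_DAY * audit_row_bytes / 1024**3
print(f"wide:  {wide_gb:.1f} GB/day")   # ~28.6 GB/day
print(f"split: {split_gb:.2f} GB/day")  # ~0.30 GB/day
```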
Buffer pool density. The page-per-read mechanism above means buffer-pool efficiency scales inversely with row width. At 6KB per row, an InnoDB 16KB page holds two rows; at 400 bytes per row it holds forty. A database with 10GB of buffer pool has the effective working set of a much smaller instance once rows get wide — queries that used to run hot start touching disk for no reason other than that the rows they cared about no longer fit in memory alongside the rows other queries cared about.
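The density math as a sketch, using the page and pool sizes from the text (page headers, fill factor, and variable-length overflow ignored for simplicity):

```python
# Rows resident in a 10 GB buffer pool at two row widths.
PAGE = 16 * 1024          # InnoDB default page size
POOL = 10 * 1024**3       # 10 GB buffer pool
pages_in_pool = POOL // PAGE

def resident_rows(row_bytes: int) -> int:
    return pages_in_pool * (PAGE // row_bytes)

print(f"{resident_rows(6 * 1024):,}")  # 6KB rows:  1,310,720
print(f"{resident_rows(400):,}")       # 400B rows: 26,214,400
```

Same hardware, a twentyfold difference in how many rows stay hot in memory.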
Secondary indexes inherit the width problem. Every secondary index in InnoDB carries a copy of the primary key at its leaves; every index entry is a key-columns + PK-copy record. A wide table tends to accumulate indexes — you index email, Stripe ID, last-login, phone, region, account-manager ID, each for a different query path. Six secondary indexes on a 184-column table isn’t unusual, and every one of them has to be maintained on each INSERT and on each UPDATE that touches an indexed column. Covering indexes are also harder to arrange: the list view wants eight columns projected, and indexing eight columns of a 184-column table to cover one query is an expensive trade.
Lock and transaction width. Every UPDATE acquires a row-level lock. Transactions that touch a wide row hold that lock for the duration of the transaction, and because the row spans many concerns — billing, preferences, audit timestamps — transactions from unrelated code paths contend on the same row. A background job updating last_seen_at now serializes against a billing job updating stripe_customer_id on the same customer, because both paths lock the same row. In the split-by-concern shape, they’d contend on different rows of different tables.
Schema migrations get more expensive. ALTER TABLE ADD COLUMN on a 184-column table is slower, holds metadata locks longer, and has a larger blast radius if it fails. MySQL’s online DDL is usually fine for NULL-default additions; PostgreSQL is generally fast for the same case. But any migration that needs to rewrite rows (changing a column type, adding NOT NULL with a backfill) scales with row size, and a 6KB row rewrite on 200 million rows is a different operation than an 800-byte row rewrite on the same count.
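The rewrite-volume difference at the article's 200-million-row scale, as rough arithmetic (TB here means tebibytes):

```python
# Total bytes rewritten by a row-rewriting migration at 200M rows.
# Row sizes are the article's figures.
ROWS = 200_000_000
for row_bytes, label in [(6 * 1024, "6KB row"), (800, "800B row")]:
    print(f"{label}: ~{ROWS * row_bytes / 1024**4:.2f} TB rewritten")
```

Roughly 1.1 TB of row rewrites versus 0.15 TB, before counting index maintenance and replication.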
Why LLMs make this worse
Schema drift in the wide-table direction is what language models reinforce by default. A model generating ALTER TABLE for a feature request reads the current schema and proposes the smallest change that makes the feature work — which is almost always adding columns to the table that already holds the related data. Proposing a split requires understanding the access pattern, the transaction boundaries, and the write frequency of the new columns versus the existing ones. None of that is in the CREATE TABLE.
The loop reinforces itself: the wider the table gets, the more natural it is for the next change to widen it further. “Where do loyalty tier and tier expiry go?” The model sees customers has every other user-attached concept in it and adds two columns. The alternative — CREATE TABLE customer_loyalty (customer_id PK FK, tier, expires_at) — requires the model to argue for a split, and splits are rare in the training data compared to additions because splits are rare in real codebases for the same reason: they’re harder to ship than additions. The model is correctly pattern-matching on what humans actually do, which is exactly the problem.
ORMs compound this. One model equals one table is the default shape in ActiveRecord, Django ORM, Prisma, SQLAlchemy, and Ecto. Refactoring a Customer model into three co-owned tables is a change that touches every query, every serializer, every test. The ORM makes “add a column to the existing model” a five-line change and “split the model” a project. Engineers pick the cheap option every time, and the wide table ratchets.
Split by access pattern, not by concept
“Normalize it” isn’t the fix because normalization is a property of data shape, not query cost. The fix is to look at what columns are actually read and written together, and keep those co-located; the rest moves out.
A workable decomposition for the customers example:
- Core hot table — the columns read on nearly every query: id, email, name, status, tier, stripe_customer_id, created_at. Maybe twenty columns. This is what the list view, the auth path, and most API responses need.
- 1:1 cold tables — concerns that are read rarely or in specific flows: customer_audit for login/seen/purchase timestamps, customer_preferences for user settings, customer_feature_flags for the twelve TINYINT flags. Each is a separate table with customer_id as PK and FK, joined only when the flow actually needs it. Writes to last_login_at stop rewriting the billing row.
- 1:N tables for repeating groups — addresses, payment methods, anything that was modeled as shipping_address_2, shipping_address_3, shipping_address_4 becomes an addresses table with a FK and a type. This collapses polymorphic-ish schema decisions that shouldn’t have been made at the column level in the first place; see Polymorphic References for the related pattern where doing this without a FK goes wrong.
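A minimal sketch of the decomposition, with SQLite standing in for the OLTP engine (the DDL is engine-agnostic and the column lists are abbreviated; data values are placeholders):

```python
import sqlite3

# Sketch of the split schema described above.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customers (            -- core hot table: read on nearly every query
  id INTEGER PRIMARY KEY,
  email TEXT NOT NULL,
  name TEXT,
  status TEXT,
  tier TEXT,
  stripe_customer_id TEXT,
  created_at TEXT
);
CREATE TABLE customer_audit (       -- 1:1 cold table: PK doubles as FK
  customer_id INTEGER PRIMARY KEY REFERENCES customers(id),
  last_login_at TEXT,
  last_seen_at TEXT,
  last_purchase_at TEXT
);
CREATE TABLE addresses (            -- 1:N: replaces shipping_address_2..4
  id INTEGER PRIMARY KEY,
  customer_id INTEGER NOT NULL REFERENCES customers(id),
  kind TEXT NOT NULL,               -- 'billing' or 'shipping'
  line1 TEXT, city TEXT, country TEXT
);
""")
db.execute("INSERT INTO customers (id, email, name, status) "
           "VALUES (1, 'a@example.com', 'Ada', 'active')")
db.execute("INSERT INTO customer_audit (customer_id, last_login_at) "
           "VALUES (1, '2024-01-01')")

# The hot-path write now touches only the narrow audit row:
db.execute("UPDATE customer_audit SET last_login_at = '2024-01-02' "
           "WHERE customer_id = 1")

# The cold-path read is one PK = FK indexed join:
row = db.execute("""
  SELECT c.email, a.last_login_at
  FROM customers c JOIN customer_audit a ON a.customer_id = c.id
  WHERE c.id = 1
""").fetchone()
print(row)  # ('a@example.com', '2024-01-02')
```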
The trade-off is that some queries now join two or three tables instead of reading one. On the hot path this is fine — the joins are on PK-equals-FK, the join tables are small, and the read is usually cheaper than scanning a fat row. The cold path is where it matters: the audit screen now joins customers to customer_audit, which costs one indexed lookup and nobody notices. The place to be careful is the query that reads from three of the split tables on every request — if that’s dominant, one of those tables probably belongs merged back in.
When a wide table is actually fine
Not every 100-column table is a god table. Three cases where width is defensible:
Analytical and reporting tables on columnar storage. As noted above, warehouses like ClickHouse, BigQuery, and Redshift invert the cost calculus — reading one column doesn’t load the rest, and the normalization pressure flips: denormalize aggressively because joins are expensive and per-column reads are cheap. This anti-pattern is specifically a row-store OLTP problem.
Small tables that stay small. A tenants table with 80 columns and 500 rows fits entirely in the buffer pool. The write amplification is paid a few thousand times a day, not a few million. The secondary-index cost is negligible because the indexes are small. Width matters when row count is large enough for the per-row cost to dominate — on small tables it doesn’t.
Every query reads every column. Uncommon but real. If the dominant read is “fetch the full customer record for display” and the split would produce a join that runs on every request anyway, the split doesn’t help. The test is whether the queries you actually run touch disjoint column sets — if they do, the split has a real win; if they don’t, it’s architecture for its own sake.
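One cheap way to run that test is to map each dominant query to the columns it touches and measure the overlap; the query list below is hypothetical, and the metric is plain Jaccard similarity:

```python
# Do the dominant queries touch disjoint column sets? Near-zero overlap
# means a split has a real win; heavy overlap means keep columns together.
queries = {  # hypothetical query -> columns-touched mapping
    "admin_list": {"id", "email", "name", "status", "last_login_at"},
    "auth":       {"id", "email", "password_hash", "status"},
    "audit_page": {"id", "last_login_at", "last_seen_at", "last_purchase_at"},
}

def overlap(a: str, b: str) -> float:
    return len(queries[a] & queries[b]) / len(queries[a] | queries[b])

print(f"{overlap('auth', 'audit_page'):.2f}")  # 0.14: near-disjoint, split wins
print(f"{overlap('admin_list', 'auth'):.2f}")  # 0.50: heavy overlap, co-locate
```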
The bigger picture
Relational databases aren’t built for developer convenience. They’re built for storage efficiency and retrieval speed — narrow rows, well-placed indexes, joins on indexed keys, query plans that read only what they need. Normalization isn’t an academic ideal; it’s the shape that lines up with how the engine actually pays its bills. Every cost mechanism in this post — buffer-pool density, write amplification, index bloat, row-lock width — is the engine reporting the same thing in different dialects: the shape you’re asking it to hold isn’t the shape it was optimized for. The SELECT-*-and-done dream is the developer’s cost model, not the database’s.
God tables aren’t designed; they’re the limit of a sequence of rational local decisions where the global cost is invisible at each step. The column count of a mature production table is usually a decent proxy for how long the team has been making the cheap choice, which is most teams most of the time — and that is not by itself a failure. The failure is that the cost goes uncounted. A 6KB row is a write-amplification multiplier on every UPDATE, a buffer-pool multiplier on every read, and an index-size multiplier on every secondary index. None of those costs are on the PR that adds a column; all of them are on the dashboard that shows p99 drifting up quarter after quarter.
The lever is to count the cost at the system level when the table hits a certain width — pick a threshold, sixty columns, a hundred, whatever fits — and make the next column addition a conversation about whether this concern belongs here, not a line in a migration. The answer is often still yes, but it shouldn’t be the default answer. When it’s no, the split is far cheaper at column sixty than at column one-eighty; the table doesn’t care, but every caller of the table does, and the rewrite’s blast radius scales with how long the drift went uncorrected.
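The threshold can be enforced mechanically. A sketch of a CI-style width check against SQLite's catalog (the 60-column threshold and the table names are assumptions; on MySQL or PostgreSQL the same check would query information_schema.columns):

```python
import sqlite3

THRESHOLD = 60  # team choice; the text suggests sixty to a hundred

def wide_tables(db: sqlite3.Connection, threshold: int = THRESHOLD) -> dict:
    """Flag tables whose column count exceeds the threshold."""
    names = [r[0] for r in db.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()]
    flagged = {}
    for name in names:
        ncols = len(db.execute(f"PRAGMA table_info({name})").fetchall())
        if ncols > threshold:
            flagged[name] = ncols
    return flagged

# Demo: one wide table, one narrow one.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (%s)"
           % ", ".join(f"c{i} TEXT" for i in range(80)))
db.execute("CREATE TABLE tenants (id INTEGER PRIMARY KEY, name TEXT)")
print(wide_tables(db))  # {'customers': 80}
```

Wired into CI, this turns the next ADD COLUMN on a flagged table into a review conversation rather than a silent ratchet.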