
Comment Your Schema

Every major database engine supports comments on tables and columns. Almost nobody uses them. Here's why that's a problem — and how little effort it takes to fix.

TL;DR
Every major database engine lets you attach comments to tables and columns — descriptions that live in the schema itself and show up in every tool that reads it. Unlike external documentation, comments travel with the schema they describe: they survive dumps, restores, and migrations with no separate document to keep in sync. They cost nothing to add, require no downtime, and make every schema dump, ER diagram, and monitoring tool more useful. Almost nobody uses them.

There’s a specific kind of frustration that comes from staring at a column called status TINYINT NOT NULL in a table you’ve never seen before. Is 1 active? Pending? Enabled? Is 0 deleted or just inactive? The column type doesn’t tell you. The column name doesn’t tell you. The only way to find out is to grep through application code, ask someone, or query the data and try to reverse-engineer the semantics from whatever comes back.

This isn’t a tooling gap. It’s a habit gap. The mechanism to solve it has existed in every major database engine for decades.

What schema comments are

Schema comments are metadata strings attached directly to database objects — tables, columns, indexes, views. They’re stored in the database catalog and exposed through standard metadata queries.

In PostgreSQL:

COMMENT ON TABLE orders IS 'Customer purchase orders. One row per checkout.';
COMMENT ON COLUMN orders.status IS '1=pending, 2=processing, 3=shipped, 4=delivered, 5=cancelled';
COMMENT ON COLUMN orders.end_date IS 'NULL means order is still in progress';

In MySQL:

ALTER TABLE orders MODIFY COLUMN status TINYINT NOT NULL
  COMMENT '1=pending, 2=processing, 3=shipped, 4=delivered, 5=cancelled';

-- Or at table creation:
CREATE TABLE orders (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id BIGINT NOT NULL COMMENT 'References users.id',
    status TINYINT NOT NULL COMMENT '1=pending, 2=processing, 3=shipped, 4=delivered, 5=cancelled',
    total_cents BIGINT NOT NULL COMMENT 'Order total in cents, not dollars',
    end_date DATE DEFAULT NULL COMMENT 'NULL = order still in progress'
) COMMENT='Customer purchase orders. One row per checkout.';

SQL Server uses extended properties, Oracle uses COMMENT ON. The syntax varies. The concept is universal.

Where comments show up

This is the part that makes comments more useful than a wiki page or a README. Because they live in the catalog, they propagate automatically to every tool that reads schema metadata.

Schema dumps. pg_dump and mysqldump include comments in the output. Anyone restoring a backup or reviewing a migration gets the context without looking elsewhere.

ER diagram tools. DBeaver, DataGrip, pgAdmin, MySQL Workbench — all render column comments in schema viewers and diagrams. Hover over a column and the description is right there.

information_schema and catalog queries. Any script, tool, or automation that queries metadata picks up comments for free.

In PostgreSQL:

-- Get a table comment
SELECT obj_description('orders'::regclass, 'pg_class');

-- Get a column comment (the second argument is the column's ordinal position)
SELECT col_description('orders'::regclass, 1);

-- Or just use psql
\d+ orders

In MySQL:

SELECT COLUMN_NAME, COLUMN_COMMENT
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'your_database'
  AND TABLE_NAME = 'orders';

ORM introspection. Many ORMs and code generators that reverse-engineer models from databases will pull comments into generated code as docstrings or annotations.

Monitoring and alerting tools. When an alert fires about a table or column, the comment provides immediate context without requiring someone to look up external documentation.

CI pipelines and schema validation. Linting tools can check whether new tables and columns have comments, the same way code linters check for function docstrings.
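
As a sketch of what such a check could look like — assuming the rows have already been fetched from information_schema.COLUMNS with any MySQL driver; the query string, sample data, and warning format below are illustrative:

```python
# Sketch of a CI lint step for schema comments. The query and the sample
# rows are illustrative; a real pipeline would run the query against the
# database and feed the results to find_uncommented().

UNCOMMENTED_QUERY = """
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_COMMENT
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = %s
"""

def find_uncommented(rows):
    """Return (table, column) pairs whose comment is empty or blank."""
    return [(table, column)
            for table, column, comment in rows
            if not comment.strip()]

# Example rows as a driver would return them: (table, column, comment)
rows = [
    ("orders", "status", ""),                           # missing -> flagged
    ("orders", "total_cents", "Order total in cents"),  # documented -> ok
]

for table, column in find_uncommented(rows):
    print(f"WARNING: {table}.{column} has no comment")
```

Emitting warnings instead of failing the build keeps the check advisory at first; flipping it to a hard failure later is a one-line change.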

The point is that comments flow through the entire toolchain. You write them once, in one place, and every tool that reads the schema benefits — without configuration, without plugins, without maintaining anything separately.

What goes wrong without them

The absence of comments creates a specific category of problems that compounds over time.

Onboarding takes longer than it should. Every new engineer who encounters status TINYINT has to ask someone or investigate. Multiply that by every ambiguous column in every table across every service. It’s not a one-time cost — it’s paid every time someone new touches the schema.

Debugging becomes archaeology. When something breaks at 2am and you’re looking at a table with columns named type, flag, ref_id, and config — all with no comments — you’re not debugging. You’re reverse-engineering institutional knowledge that should have been written down.

Schema reviews lose context. A migration that adds is_processed TINYINT(1) DEFAULT 0 looks fine syntactically. But processed by what? When? Is it idempotent? A comment turns a migration review from “does this look right?” into “does this match what we agreed on?”

Documentation drifts. External documentation — wiki pages, READMEs, Confluence — goes stale the moment someone adds a column or changes a status code without updating the doc. Comments live in the schema itself. They move with ALTER TABLE. They show up in every dump. They can’t be in a different repo.

What to comment

Not everything needs a comment. A column called created_at TIMESTAMP NOT NULL is self-documenting. Focus on the columns where the schema doesn’t tell the whole story:

  • Status and type columns — what do the values mean? 1=active, 2=suspended, 3=closed
  • Nullable columns where NULL has meaning — does NULL mean “not set,” “not applicable,” or “ongoing”?
  • ID columns that reference other tables without foreign keys — owner_id BIGINT COMMENT 'References users.id'
  • Columns with non-obvious units — total_cents vs total (dollars? cents? units?), duration (seconds? milliseconds? minutes?)
  • Columns with business logic encoded in values — plan_type TINYINT COMMENT '1=free, 2=starter, 3=pro, 4=enterprise'
  • Tables themselves — what does this table represent? One row per what?

A good table comment answers: “What is one row in this table?” A good column comment answers: “What does this value mean when I see it in a query result?”
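
Applied to the orders table from earlier, those two questions might yield comments like these (the wording is illustrative):

```sql
-- "What is one row in this table?"
COMMENT ON TABLE orders IS 'Customer purchase orders. One row per checkout.';

-- "What does this value mean when I see it in a query result?"
COMMENT ON COLUMN orders.total_cents IS 'Order total in cents; divide by 100 for dollars.';
```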

Adding comments to an existing schema

This is the part that makes the cost-benefit ratio hard to argue against.

In PostgreSQL, COMMENT ON is a catalog-only operation. It takes no locks on the table. It doesn’t rewrite data. It doesn’t block reads or writes. On a table with 500 million rows, it completes in milliseconds.

-- This is instant. No lock. No downtime. No risk.
COMMENT ON COLUMN orders.status IS '1=pending, 2=processing, 3=shipped, 4=delivered, 5=cancelled';

In MySQL, ALTER TABLE ... MODIFY COLUMN with a comment is usually a metadata-only change under InnoDB online DDL — but the exact behavior depends on the server version and on what else the MODIFY touches. On MySQL 8.0+, a comment-only change can take the ALGORITHM=INSTANT path. On older versions, or when combined with a type change, it may trigger a full table rebuild.

-- MySQL 8.0+ — instant for comment-only changes
ALTER TABLE orders MODIFY COLUMN status TINYINT NOT NULL
  COMMENT '1=pending, 2=processing, 3=shipped, 4=delivered, 5=cancelled';

Zero-downtime operation
In PostgreSQL, COMMENT ON is a catalog-only update — no table lock, no rewrite, completes in milliseconds even on tables with hundreds of millions of rows. In MySQL 8.0+, comment-only changes go through the ALGORITHM=INSTANT path. This is about as safe as database changes get.

With a risk profile this close to zero, there’s no reason not to do it incrementally — comment a few columns every time you touch a table. Over time, the schema becomes self-documenting.

Generating documentation from comments

Because comments live in the catalog, tools can extract them and produce browsable documentation automatically. A few worth knowing:

SchemaSpy — Java-based, generates interactive HTML with ER diagrams. Reads table and column comments from the catalog. Run it against your database and you get a full documentation site with relationships, comments, and diagrams — no manual authoring.

tbls — A single Go binary, CI-friendly. Outputs Markdown, PlantUML, or SVG. Reads comments directly from the schema. Designed to run in pipelines — generate docs on every migration, commit them to the repo, and they stay in sync.
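
tbls can also enforce the habit, not just document it. A minimal .tbls.yml might look like this — the DSN and paths are placeholders for your setup, and the lint rule names follow the tbls documentation, so verify them against your tbls version:

```yaml
# .tbls.yml — sketch; dsn and docPath are placeholders
dsn: postgres://user:pass@localhost:5432/mydb
docPath: doc/schema

# `tbls lint` can flag missing comments in CI
lint:
  requireTableComment:
    enabled: true
  requireColumnComment:
    enabled: true
```

With this in place, `tbls doc` regenerates the Markdown docs and `tbls lint` fails when tables or columns lack comments — the same check a custom CI script would implement by hand.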

DataGrip / DBeaver — Not doc generators per se, but both render column comments inline in their schema browsers. For teams already using these, comments become immediately visible without any extra tooling.

The pattern is the same across all of them: comments in the schema become descriptions in the output. No separate documentation source to maintain. The schema is the source.

For teams that want generated docs as part of CI, tbls is the lowest-friction option — add it to your pipeline, point it at the database, and commit the Markdown output. Every migration that adds or changes a comment updates the docs automatically.

Making it stick

The challenge isn’t the mechanism — it’s the habit. Comments rot just like any other documentation if they’re not maintained. A few things help:

Comment at creation time. If the comment is part of the CREATE TABLE or migration, it happens naturally. Retrofitting is always harder than including it from the start.

Add it to your migration template. If your team uses a migration tool, add comment fields to the template. Make the absence visible rather than the default.

Lint for it. A simple CI check can flag tables or columns without comments. It doesn’t have to block merges — even a warning changes behavior over time.

Treat comments as part of the schema, not as documentation. When a column’s semantics change, the comment changes in the same migration. Same PR, same review.

The trade-offs

Comments aren’t a substitute for all documentation. They’re good at describing what a column or table is, not how a multi-table workflow operates. System-level documentation — data flow diagrams, service interaction maps, runbooks — still belongs somewhere else.

Watch out
Stale comments are worse than no comments — they actively mislead. A column that says '1=active, 2=inactive' when the code now also uses 3=suspended will send someone down the wrong path. Treat comment updates as part of the migration, not a follow-up task.

And for teams with hundreds of tables and thousands of columns, retrofitting comments is a slow process. It’s not a weekend project. It’s an incremental habit that pays off over months.

The bigger picture

Schema metadata is the cheapest form of documentation a team can produce. It lives where the data lives. It propagates through every tool in the stack without configuration. It costs nothing to add and little to maintain, as long as you treat it as part of the schema rather than something separate.

The question isn’t whether your team will need this information — it’s whether they’ll find it in the schema or spend time tracking it down somewhere else.

Start with the columns that make people ask questions. Comment those first. The rest follows.

SELECT insights FROM experience WHERE downtime = 0; -- Ruslan Tolkachev