<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Legacy on EXPLAIN ANALYZE</title>
        <link>https://explainanalyze.com/tags/legacy/</link>
        <description>Recent content in Legacy on EXPLAIN ANALYZE</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en-us</language>
        <lastBuildDate>Wed, 22 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://explainanalyze.com/tags/legacy/index.xml" rel="self" type="application/rss+xml" /><item>
            <title>Legacy Schemas Are Sediment, Not Design</title>
            <link>https://explainanalyze.com/p/legacy-schemas-are-sediment-not-design/</link>
            <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
            <guid>https://explainanalyze.com/p/legacy-schemas-are-sediment-not-design/</guid>
            <description>&lt;img src=&#34;https://explainanalyze.com/&#34; alt=&#34;Featured image of post Legacy Schemas Are Sediment, Not Design&#34; /&gt;&lt;div class=&#34;tldr-box&#34;&gt;&#xA;    &lt;strong&gt;TL;DR&lt;/strong&gt;&#xA;    &lt;div&gt;A legacy schema looks like a design but reads like a sediment — layers of decisions from different eras, where names that once described the data no longer do and conventions that look uniform aren&amp;rsquo;t. The fix isn&amp;rsquo;t renaming (prohibitively expensive once every caller depends on the current names); it&amp;rsquo;s documenting the drift so the next reader — human or LLM — can navigate what&amp;rsquo;s actually there.&lt;/div&gt;&#xA;&lt;/div&gt;&#xA;&#xA;&lt;p&gt;A new engineer joins the team and reads the schema. &lt;code&gt;tmp_orders&lt;/code&gt; looks like scaffolding — something to delete once the real migration ships. The tech lead answers: never delete it. &lt;code&gt;tmp_orders&lt;/code&gt; is the main orders table. The temp-to-permanent rename was planned for 2017, nobody shipped it, and every service in the company now writes to the table. The name is a lie the schema tells every new reader — and every LLM generating SQL against the catalog.&lt;/p&gt;&#xA;&lt;p&gt;The obvious fix is to rename the table. Nothing about the database itself prevents it — drop the &lt;code&gt;tmp_&lt;/code&gt; prefix, update every call site, ship. The reality is that every service, ORM model, report, integration, and runbook references &lt;code&gt;tmp_orders&lt;/code&gt; by name. The rename is a multi-quarter effort that crosses team boundaries, and the only justification is legibility. 
Teams rarely prioritize legibility work, so the name stays, and the schema keeps lying.&lt;/p&gt;&#xA;&lt;h2 id=&#34;whats-drifted&#34;&gt;What&amp;rsquo;s drifted&#xA;&lt;/h2&gt;&lt;p&gt;Legacy drift shows up in three visible modes and one invisible one.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Names that stopped describing the data.&lt;/strong&gt; &lt;code&gt;tmp_&lt;/code&gt; tables that are permanent. &lt;code&gt;old_&lt;/code&gt; columns that are current. &lt;code&gt;deprecated_&lt;/code&gt; fields that every write path still populates. &lt;code&gt;flag1&lt;/code&gt;, &lt;code&gt;flag2&lt;/code&gt;, &lt;code&gt;status_code&lt;/code&gt; — names whose meaning was obvious when the column was added, because the person adding it remembered why. By the time a new reader arrives, the intent is gone and the name is false advertising. &lt;a class=&#34;link&#34; href=&#34;https://explainanalyze.com/p/comment-your-schema/&#34; &gt;Comment Your Schema&lt;/a&gt; covers the documentation side of this; legacy schemas are the case where comments would help most and where they&amp;rsquo;re most often absent.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Conventions per era.&lt;/strong&gt; The 2014-era backend team used &lt;code&gt;camelCase&lt;/code&gt;. The 2019 rewrite adopted &lt;code&gt;snake_case&lt;/code&gt;. The 2022 microservice added a third table with &lt;code&gt;PascalCase&lt;/code&gt; because the Go team wrote it and nobody pushed back. Now one database has &lt;code&gt;userId&lt;/code&gt;, &lt;code&gt;user_id&lt;/code&gt;, and &lt;code&gt;UserID&lt;/code&gt; — all referring to the same entity across different tables. 
The LLM that generates &lt;code&gt;business.created_at&lt;/code&gt; when the column is actually &lt;code&gt;business.createdDate&lt;/code&gt; isn&amp;rsquo;t wrong in any sense the schema could catch; it&amp;rsquo;s inferring a convention from one table and applying it to another, which would be a reasonable thing to do in a schema that had only one convention.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Tables that were supposed to be temporary.&lt;/strong&gt; &lt;code&gt;tmp_orders&lt;/code&gt; is the canonical example, but every long-lived database has some. Staging tables that got promoted to production. Migration tables that weren&amp;rsquo;t cleaned up. &amp;ldquo;Phase 2&amp;rdquo; tables built for a transitional period, shipped in phase 1 by a team that never came back to finish. The names encode the original intent; the data encodes the current reality; the two diverge a little more with every migration that preserves the name instead of fixing it.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Invisible structural drift.&lt;/strong&gt; Charsets and collations are the version of drift that doesn&amp;rsquo;t even show up in the column list. Older tables created before the Unicode migration default to &lt;code&gt;latin1&lt;/code&gt;; newer tables use &lt;code&gt;utf8mb4&lt;/code&gt;. A join between a &lt;code&gt;VARCHAR(100)&lt;/code&gt; column in one table and a &lt;code&gt;VARCHAR(100)&lt;/code&gt; column in another (both with the same name, both with the same logical meaning) can silently produce different results depending on which side&amp;rsquo;s collation MySQL picks. In the bad cases, an implicit charset conversion kills index usage and turns the query into a table scan. &lt;code&gt;SHOW TABLE STATUS&lt;/code&gt; (table default) and &lt;code&gt;SHOW FULL COLUMNS&lt;/code&gt; (per column) reveal this; reading a plain list of column names and types doesn&amp;rsquo;t. 
Most LLMs read the column list.&lt;/p&gt;&#xA;&lt;h2 id=&#34;why-this-is-worse-for-llms-than-for-humans&#34;&gt;Why this is worse for LLMs than for humans&#xA;&lt;/h2&gt;&lt;p&gt;A new human engineer working with a legacy schema can ask. They can ping the on-call channel, look up the original migration in git, trace a column back to the PR that introduced it, or simply ask &amp;ldquo;what is &lt;code&gt;flag1&lt;/code&gt;?&amp;rdquo; and get an answer from someone who knows. The answer is often wrong or outdated, but it&amp;rsquo;s a starting point, and the engineer learns to treat the schema with appropriate suspicion.&lt;/p&gt;&#xA;&lt;p&gt;An LLM generating SQL from the catalog has no such recourse. It sees &lt;code&gt;tmp_orders&lt;/code&gt; and reasons from the name — probably &amp;ldquo;this is a staging table, prefer the non-tmp version if one exists, otherwise deprioritize.&amp;rdquo; It sees &lt;code&gt;old_price&lt;/code&gt; and treats it as historical. It sees &lt;code&gt;flag1 BOOLEAN&lt;/code&gt; and infers a generic flag. Each inference is reasonable; each is wrong in the specific case; and the schema gives no signal that this is one of the cases where reasoning from the name produces bad SQL.&lt;/p&gt;&#xA;&lt;p&gt;This is the sharper version of the &lt;a class=&#34;link&#34; href=&#34;https://explainanalyze.com/p/the-bare-id-primary-key-when-every-table-joins-to-every-other-table/&#34; &gt;generic &lt;code&gt;id&lt;/code&gt; primary key&lt;/a&gt; problem. Both are failures of the schema to describe itself. The PK case hides what&amp;rsquo;s being matched; legacy drift hides what anything &lt;em&gt;means&lt;/em&gt;. Neither failure shows up at write time — both produce queries that run, return data, and look plausible, because the rows exist and the types match. 
The wrongness is in the interpretation, which the database has no way to check.&lt;/p&gt;&#xA;&lt;h2 id=&#34;the-fix-is-documentation-not-renaming&#34;&gt;The fix is documentation, not renaming&#xA;&lt;/h2&gt;&lt;p&gt;The obvious fix — rename everything to match intent and convention — fails on cost. Every table, column, and constraint in a mature schema is referenced by services the team has forgotten about: scheduled jobs, Redshift imports, third-party integrations, BI dashboards built by a contractor in 2019, runbooks pasted into wiki pages that nobody has edited since. A rename that looks like a one-line migration touches every surface the table is exposed on, and the projects that survive the attempt usually take a year and leave the schema worse during the transition.&lt;/p&gt;&#xA;&lt;p&gt;The workable fix is to stop the drift from continuing and make the existing drift visible. Stopping new drift means picking a convention for new tables and columns and writing it down where CI can enforce it (&lt;a class=&#34;link&#34; href=&#34;https://explainanalyze.com/p/schema-conventions-dont-survive-without-automation/&#34; &gt;Schema Conventions and Why They Matter&lt;/a&gt; covers the mechanics). 
Making existing drift visible means &lt;a class=&#34;link&#34; href=&#34;https://explainanalyze.com/p/comment-your-schema/&#34; &gt;column and table comments&lt;/a&gt; on everything whose name doesn&amp;rsquo;t match its meaning, plus a per-era mapping somewhere in the repo that says &amp;ldquo;this database has three naming conventions, used in these periods, applied to these tables.&amp;rdquo; Legacy schemas are the case where &lt;code&gt;COMMENT ON&lt;/code&gt; (PostgreSQL and Oracle syntax; MySQL attaches comments through &lt;code&gt;ALTER TABLE&lt;/code&gt; instead) pays off most: the names are already wrong, the cost of fixing them is prohibitive, and the comment is the one affordable signal the next reader gets.&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;&#xA;&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1&#xA;&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2&#xA;&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3&#xA;&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4&#xA;&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td class=&#34;lntd&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;COMMENT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;ON&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;TABLE&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tmp_orders&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;IS&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span 
class=&#34;s1&#34;&gt;&amp;#39;Main orders table. The tmp_ prefix is historical — a 2017 migration was planned to rename this and was never completed. Do not drop.&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;COMMENT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;ON&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;COLUMN&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;customers&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;flag1&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;IS&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;VIP customer flag. Legacy name from the 2014 schema — never renamed because of external reporting dependencies.&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;One-line migrations, zero risk, and every reader — human and LLM — now has a chance of reading the schema correctly. 
This isn&amp;rsquo;t a fix in the sense of &amp;ldquo;problem solved.&amp;rdquo; It&amp;rsquo;s a fix in the sense of &amp;ldquo;the next reader has a chance.&amp;rdquo; The drift is structural; the documentation is how you navigate it without making it worse.&lt;/p&gt;&#xA;&lt;h2 id=&#34;when-a-clean-rewrite-is-actually-worth-it&#34;&gt;When a clean rewrite is actually worth it&#xA;&lt;/h2&gt;&lt;p&gt;Renames and migrations aren&amp;rsquo;t always wrong. Three cases where the rewrite earns its cost:&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;A misleading name is actively causing incidents.&lt;/strong&gt; If &lt;code&gt;tmp_orders&lt;/code&gt; is regularly truncated or dropped by someone who reads the name literally and acts on it, the rename cost is less than the recovery cost from the next incident. Usually the practical fix here isn&amp;rsquo;t a rename that breaks every caller at once; it&amp;rsquo;s a view or synonym, or a table rename paired with a compatibility view, that exposes &lt;code&gt;orders&lt;/code&gt; as the canonical name and leaves &lt;code&gt;tmp_orders&lt;/code&gt; as a compatibility alias for legacy callers.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;A schema migration is happening anyway.&lt;/strong&gt; If the team is replatforming the OLTP database or splitting it across services, the rewrite opens a window where renames are cheap because callers are being updated either way. Take the opportunity; don&amp;rsquo;t schedule a separate naming cleanup six months later when the window has closed.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;A database small enough that it fits one person&amp;rsquo;s head.&lt;/strong&gt; Early-stage startups, internal tools, bounded-scope services. 
At twenty tables and three developers, a Saturday afternoon of renames is cheaper than a decade of comments.&lt;/p&gt;&#xA;&lt;p&gt;In every other case, the schema is load-bearing history, and you renovate it the way you renovate a building with people still living in it: patch, document, and schedule the demolition for a window when it&amp;rsquo;s genuinely cheap.&lt;/p&gt;&#xA;&lt;h2 id=&#34;the-bigger-picture&#34;&gt;The bigger picture&#xA;&lt;/h2&gt;&lt;p&gt;Every production schema is a compressed record of the decisions the team made under pressure. Some of those decisions were good and still fit; some were good at the time and don&amp;rsquo;t fit now; some were expedient and nobody noticed. The schema can&amp;rsquo;t tell you which is which, and it was never going to. The fix isn&amp;rsquo;t to aspire to a clean schema that doesn&amp;rsquo;t accumulate history — no such schema exists past a three-year horizon — but to leave the next reader enough signal to decompress the sediment without guessing.&lt;/p&gt;&#xA;&lt;p&gt;Comment the columns that lie. Document the conventions per era. Treat LLMs generating SQL against the catalog as the same kind of reader a new engineer is, and give them the same written context. The goal isn&amp;rsquo;t a schema without legacy drift; it&amp;rsquo;s a schema whose drift is legible to the people and tools that will inherit it.&lt;/p&gt;&#xA;</description>
        </item></channel>
</rss>
