<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>JSON on EXPLAIN ANALYZE</title>
        <link>https://explainanalyze.com/tags/json/</link>
        <description>Recent content in JSON on EXPLAIN ANALYZE</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en-us</language>
        <lastBuildDate>Thu, 23 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://explainanalyze.com/tags/json/index.xml" rel="self" type="application/rss+xml" /><item>
            <title>TEXT and JSON Columns: Where the Schema Goes to Hide</title>
            <link>https://explainanalyze.com/p/text-and-json-columns-where-the-schema-goes-to-hide/</link>
            <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
            <guid>https://explainanalyze.com/p/text-and-json-columns-where-the-schema-goes-to-hide/</guid>
            <description>&lt;div class=&#34;tldr-box&#34;&gt;&#xA;    &lt;strong&gt;TL;DR&lt;/strong&gt;&#xA;    &lt;div&gt;A &lt;code&gt;TEXT&lt;/code&gt; or &lt;code&gt;JSON&lt;/code&gt; column moves the schema out of the database catalog and into application code — the data inside has a shape, but the DDL won&amp;rsquo;t tell you what it is. Readers can&amp;rsquo;t query into it without knowing the format, planners can&amp;rsquo;t reason about it, and the shape drifts across years of writes with no signal to the next reader. The fix isn&amp;rsquo;t &amp;ldquo;don&amp;rsquo;t use JSON&amp;rdquo;; it&amp;rsquo;s to promote the fields that actually get queried into real columns and treat the rest as genuinely opaque.&lt;/div&gt;&#xA;&lt;/div&gt;&#xA;&#xA;&lt;p&gt;An AI assistant is asked to &amp;ldquo;find customers who upgraded to enterprise in the last quarter.&amp;rdquo; It reads the catalog, finds &lt;code&gt;api_logs(id, endpoint VARCHAR, payload LONGTEXT, created_at DATETIME)&lt;/code&gt;, and generates the reasonable query:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;&#xA;&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1&#xA;&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2&#xA;&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3&#xA;&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4&#xA;&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td class=&#34;lntd&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span 
class=&#34;k&#34;&gt;SELECT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;JSON_EXTRACT&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;payload&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;$.action&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;AS&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;action&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;created_at&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;FROM&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;api_logs&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;WHERE&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;JSON_EXTRACT&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;payload&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;$.action&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;upgrade&amp;#39;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;  
&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;AND&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;JSON_EXTRACT&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;payload&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;$.plan&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;   &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;enterprise&amp;#39;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;AND&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;created_at&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;NOW&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nb&#34;&gt;INTERVAL&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;90&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;DAY&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Runs clean. Returns zero rows. 
The actual key was renamed from &lt;code&gt;action&lt;/code&gt; to &lt;code&gt;event.type&lt;/code&gt; two years ago when the team adopted a shared event schema — new rows match &lt;code&gt;$.event.type&lt;/code&gt;, old rows still match &lt;code&gt;$.action&lt;/code&gt;, and no one migrated the historical data because it wasn&amp;rsquo;t queryable anyway. Neither column nor catalog said any of this. The query is syntactically perfect, semantically correct for the key it guessed, and wrong because the key doesn&amp;rsquo;t exist in most of the rows.&lt;/p&gt;&#xA;&lt;p&gt;The obvious fix is &amp;ldquo;switch to JSONB, validate with a JSON schema, add a GIN index.&amp;rdquo; Each one helps at the margin and none of them close the gap. JSONB tells you the blob is valid JSON, not what keys are in it. CHECK constraints with &lt;code&gt;JSON_SCHEMA_VALID&lt;/code&gt; or &lt;code&gt;jsonb_matches_schema&lt;/code&gt; work prospectively, but the six years of rows already in the table were written against five format generations and no validator reaches back in time. A GIN index accelerates key lookups but only if you know which keys to look up. The problem isn&amp;rsquo;t the storage format — it&amp;rsquo;s that the schema emigrated to application code, and changing the column type doesn&amp;rsquo;t bring it back.&lt;/p&gt;&#xA;&lt;h2 id=&#34;what-leaves-the-catalog-when-the-column-becomes-a-blob&#34;&gt;What leaves the catalog when the column becomes a blob&#xA;&lt;/h2&gt;&lt;p&gt;DDL is the contract between the database and everything that reads it. 
A typed column says &amp;ldquo;this value is an integer between 0 and 2³¹−1, and here&amp;rsquo;s the index I&amp;rsquo;ve built over it.&amp;rdquo; A &lt;code&gt;TEXT&lt;/code&gt; or &lt;code&gt;JSON&lt;/code&gt; column says &amp;ldquo;this value is a string the application decided on, and the application can tell you what that means.&amp;rdquo; The second contract is thinner in ways that compound.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Readers can&amp;rsquo;t discover the shape from the schema.&lt;/strong&gt; &lt;code&gt;information_schema.COLUMNS&lt;/code&gt; for a JSON column returns &lt;code&gt;COLUMN_TYPE = &#39;json&#39;&lt;/code&gt; and nothing else. Every tool that reads catalog metadata — MCP servers, ERD generators, typed-client code generators, AI assistants, new engineers running &lt;code&gt;\d+&lt;/code&gt; — sees a blob. The shape lives in the serializer class, the protobuf definition, the TypeScript interface, or nowhere. Whichever of those the reader happens to find is the shape they&amp;rsquo;ll assume. See &lt;a class=&#34;link&#34; href=&#34;https://explainanalyze.com/p/comment-your-schema/&#34; &gt;Comment Your Schema&lt;/a&gt; for the lowest-effort way to leave a trail, but comments can describe the shape; they can&amp;rsquo;t make the catalog enforce it.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Generational drift is silent.&lt;/strong&gt; Year one the payload is &lt;code&gt;{action, user}&lt;/code&gt;. A migration adds nested metadata: &lt;code&gt;{action, user, metadata: {source}}&lt;/code&gt;. A rewrite flattens and renames: &lt;code&gt;{event: {type, user_id}, source}&lt;/code&gt;. A new service standardizes with a version field: &lt;code&gt;{version: 3, event: {...}}&lt;/code&gt;. All four versions are sitting in the same column with nothing to distinguish them at read time except the keys they happen to have. A JSON_EXTRACT path written against today&amp;rsquo;s producer hits the newest generation and silently misses the older ones. 
The failure mode is exactly the one described in &lt;a class=&#34;link&#34; href=&#34;https://explainanalyze.com/p/legacy-schemas-are-sediment-not-design/&#34; &gt;Legacy Schemas Are Sediment&lt;/a&gt;: the schema&amp;rsquo;s history is compressed into the data, and the data can&amp;rsquo;t decompress itself.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Writes are untyped.&lt;/strong&gt; Without CHECK constraints or a JSON-schema validator, the writer is the only guardrail. A service deployed last Tuesday that emits &lt;code&gt;amount&lt;/code&gt; as the string &lt;code&gt;&amp;quot;9900&amp;quot;&lt;/code&gt; instead of the integer &lt;code&gt;9900&lt;/code&gt; silently poisons the column — downstream queries comparing &lt;code&gt;amount &amp;gt; 1000&lt;/code&gt; work on the correctly typed rows and misbehave on the poisoned batch, because JSON-extract returns a string and the comparison is lexicographic. The same class of mismatch a typed column would reject on INSERT.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;The planner is working blind.&lt;/strong&gt; Row-count estimates on &lt;code&gt;JSON_EXTRACT(payload, &#39;$.event.type&#39;) = &#39;upgrade&#39;&lt;/code&gt; have no histogram to consult; the planner falls back to a default selectivity estimate that&amp;rsquo;s usually wrong. Plans for queries filtered on JSON fields are routinely pessimistic or optimistic by an order of magnitude, and a plain &lt;code&gt;ANALYZE&lt;/code&gt; won&amp;rsquo;t fix that because no statistics are collected for the interior of the blob.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Indexes are per-key, not per-column.&lt;/strong&gt; A functional index on &lt;code&gt;JSON_EXTRACT(payload, &#39;$.event.type&#39;)&lt;/code&gt; accelerates one path. The next query filters on &lt;code&gt;$.source&lt;/code&gt; and scans the table. 
Generated columns are the cleaner version of this — &lt;code&gt;payload_event_type VARCHAR(50) GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(payload, &#39;$.event.type&#39;))) STORED&lt;/code&gt; — but each one is a schema change with a backfill, and you have to know in advance which keys matter. GIN indexes on JSONB cover arbitrary keys but are large, slow to update, and still don&amp;rsquo;t tell the reader what keys exist.&lt;/p&gt;&#xA;&lt;div class=&#34;warning-box&#34;&gt;&#xA;    &lt;strong&gt;Untyped writes &amp;#43; untyped reads = silent schema drift&lt;/strong&gt;&#xA;    &lt;div&gt;A TEXT or JSON column accepts anything the writer emits and returns exactly that on read. Two services writing to the same column with slightly different shapes don&amp;rsquo;t conflict at the database level — they just produce a column whose contents depend on which service wrote the row. The divergence is invisible until a query tries to read uniformly across both.&lt;/div&gt;&#xA;&lt;/div&gt;&#xA;&#xA;&lt;h2 id=&#34;plausible-paths-empty-results&#34;&gt;Plausible paths, empty results&#xA;&lt;/h2&gt;&lt;p&gt;Schema-reading LLMs generate JSON_EXTRACT paths the same way they generate column names in a typed schema — by pattern-matching the column name and the question. Asked about &amp;ldquo;upgrade actions,&amp;rdquo; the model guesses &lt;code&gt;$.action = &#39;upgrade&#39;&lt;/code&gt; because the English-to-JSON-path mapping is obvious. It has no way to know that the key was renamed, that three generations coexist, or that the canonical name is now buried under two layers of nesting. The catalog gives it a column type of &lt;code&gt;json&lt;/code&gt; and nothing else, and the model&amp;rsquo;s best guess is reasonable and wrong.&lt;/p&gt;&#xA;&lt;p&gt;The failure pattern is familiar from other schema-hiding designs. 
&lt;a class=&#34;link&#34; href=&#34;https://explainanalyze.com/p/polymorphic-references-are-not-foreign-keys/&#34; &gt;Polymorphic references&lt;/a&gt; hide which table a foreign-key-shaped column points at; &lt;a class=&#34;link&#34; href=&#34;https://explainanalyze.com/p/the-bare-id-primary-key-when-every-table-joins-to-every-other-table/&#34; &gt;bare &lt;code&gt;id&lt;/code&gt; primary keys&lt;/a&gt; hide which identifier is being compared; TEXT/JSON columns hide what&amp;rsquo;s in the column at all. All three are cases where the LLM generates a plausible query against a schema that isn&amp;rsquo;t telling it enough, and the query returns plausibly-shaped but semantically empty results.&lt;/p&gt;&#xA;&lt;h2 id=&#34;the-fix-and-where-it-stops-being-free&#34;&gt;The fix, and where it stops being free&#xA;&lt;/h2&gt;&lt;p&gt;The lever isn&amp;rsquo;t &amp;ldquo;avoid JSON&amp;rdquo; — which is both impractical and sometimes wrong — it&amp;rsquo;s to be honest about what&amp;rsquo;s inside and pick the right storage per field.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Promote fields that get queried.&lt;/strong&gt; If the application filters on &lt;code&gt;event.type&lt;/code&gt; more than occasionally, that&amp;rsquo;s a real column. 
Generated columns are the low-friction middle path: derive a typed, indexable column from the JSON, keep the raw payload as the audit trail.&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;&#xA;&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1&#xA;&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2&#xA;&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3&#xA;&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td class=&#34;lntd&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;ALTER&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;TABLE&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;api_logs&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;ADD&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;COLUMN&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;event_type&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nb&#34;&gt;VARCHAR&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;50&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;GENERATED&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ALWAYS&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span 
class=&#34;k&#34;&gt;AS&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;JSON_UNQUOTE&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;JSON_EXTRACT&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;payload&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;$.event.type&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)))&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;STORED&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;ADD&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;INDEX&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;idx_event_type&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;event_type&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;The trade-off: every promoted field is a migration, and generated columns don&amp;rsquo;t retroactively rewrite rows written with a different shape — you still need the &lt;code&gt;COALESCE(JSON_EXTRACT(payload, &#39;$.event.type&#39;), JSON_EXTRACT(payload, &#39;$.action&#39;))&lt;/code&gt; cleanup for the old generations, and you&amp;rsquo;re doing 
that exactly once as part of the promotion rather than in every query.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Enforce new writes with a JSON schema.&lt;/strong&gt; The &lt;code&gt;pg_jsonschema&lt;/code&gt; extension for PostgreSQL and &lt;code&gt;JSON_SCHEMA_VALID&lt;/code&gt; in MySQL 8.0.17 and later let a CHECK constraint reject writes that don&amp;rsquo;t match a named schema. Doesn&amp;rsquo;t fix existing rows; does stop the next silent format change from landing. If the team doesn&amp;rsquo;t already have a shared event schema, a CHECK constraint is the forcing function that produces one.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Version the payload explicitly.&lt;/strong&gt; A top-level wrapper such as &lt;code&gt;{&amp;quot;version&amp;quot;: 3, &amp;quot;payload&amp;quot;: {...}}&lt;/code&gt; lets every reader dispatch on version instead of inferring it from which keys happen to be present. Doesn&amp;rsquo;t help rows written before versioning started, but bounds the drift going forward and turns &amp;ldquo;which generation is this row?&amp;rdquo; from archaeology into a lookup.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Document what stays inside.&lt;/strong&gt; Comments on the column — &amp;ldquo;see github.com/org/events for the schema; versions 1–3 coexist in rows older than 2024-Q2&amp;rdquo; — won&amp;rsquo;t replace types, but they give the reader a place to look. 
&lt;a class=&#34;link&#34; href=&#34;https://explainanalyze.com/p/comment-your-schema/&#34; &gt;Comments on the schema&lt;/a&gt; are cheap, in-place, and propagate through every tool that reads the catalog; for genuinely-opaque columns this is the best available signal.&lt;/p&gt;&#xA;&lt;h2 id=&#34;when-json-is-actually-the-right-answer&#34;&gt;When JSON is actually the right answer&#xA;&lt;/h2&gt;&lt;p&gt;The pattern earns its keep in specific shapes where the alternative — typed columns — is worse.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Truly variable shape per row.&lt;/strong&gt; User-supplied settings blobs, custom-field configurations, extension points where the keys are genuinely per-tenant or per-user. Modeling each variant as a column produces a wide table full of NULLs; see &lt;a class=&#34;link&#34; href=&#34;https://explainanalyze.com/p/god-tables-150-columns-and-the-quiet-cost-of-just-add-a-column/&#34; &gt;God Tables&lt;/a&gt; for the cost of that direction. The column is honest about being schemaless because the data is schemaless.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Audit payloads nobody queries.&lt;/strong&gt; Raw API request/response bodies retained for compliance, debug traces, incident forensics. Written once, read by humans one row at a time, never aggregated. The lack of a queryable schema is fine because no query needs one. A sensible default here is to keep the payload compressed and add a small set of typed columns (&lt;code&gt;endpoint&lt;/code&gt;, &lt;code&gt;status_code&lt;/code&gt;, &lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;created_at&lt;/code&gt;) for the predicates the operational queries actually use.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Short-lived staging.&lt;/strong&gt; Job queues, idempotency cache payloads, outbox entries — where the producer and consumer are deployed together, the payload is read once, and the row is deleted on completion. 
Drift can&amp;rsquo;t accumulate in rows that don&amp;rsquo;t stay around.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Document stores on purpose.&lt;/strong&gt; PostgreSQL JSONB with a stable schema, validated on write, with functional indexes on the paths that matter. This is a real design; it&amp;rsquo;s not the unspoken default that most TEXT columns represent. If the team is reaching for JSONB and treating it as a document store, it should look like one — with validation, indexes, and documentation — not like a TEXT column that happens to parse.&lt;/p&gt;&#xA;&lt;h2 id=&#34;the-bigger-picture&#34;&gt;The bigger picture&#xA;&lt;/h2&gt;&lt;p&gt;A TEXT or JSON column is a specific architectural choice: move part of the schema out of the catalog, in exchange for cheaper writes and looser contracts between producer and consumer. When the trade is deliberate — genuinely variable data, write-once audit, short-lived buffer — it&amp;rsquo;s the correct shape. When it&amp;rsquo;s the path of least resistance because typed columns would require a migration, the cost is deferred to every future reader who has to reconstruct the format from commit history.&lt;/p&gt;&#xA;&lt;p&gt;Databases are good at enforcing the contracts they know about. The column types are how they know. Every field that matters to a query deserves to be in the part of the schema the database can see; everything else is honestly opaque and should look it. The default drift — &amp;ldquo;stick it in the payload, we&amp;rsquo;ll parse it later&amp;rdquo; — produces columns whose contents nobody fully knows, including the team that wrote them, and the cost is paid in the form of queries that return plausible answers to questions the data can&amp;rsquo;t actually answer.&lt;/p&gt;&#xA;</description>
        </item></channel>
</rss>
