How we extract a real design system from a Bubble export
A .bubbleexport is a single JSON blob — 5-25 MB of element definitions, page trees, workflows, and a startling amount of styling metadata. When we built the analysis pipeline, the chunker's first job was to strip that styling out: width/height, padding, border-radius, hex codes, font names. Pure noise from the AI's point of view — it just wants to know what the app does.
Six months in, we realised we were throwing away the most valuable thing in the file. Because that "noise" is literally the customer's design system, in fully structured form. CSS-variable tokens like --color_primary_default, typed responsive breakpoints, a finite type scale, the dominant body font. We were extracting it implicitly via AI vision later. We could just read it directly.
What lives in the export
On a 6 MB real-world export we tested against, we found:
- 99 distinct colors: 9 of them as CSS-variable design tokens (color_primary_default, color_surface_default, etc.), the rest as literals embedded across 477 styling references.
- 7 font families: Roboto dominant (220×), then Inter (48×), DM Sans (5×), plus four near-orphan picks that read as "designer churn" in a previous iteration.
- 20 distinct font sizes from 9 to 124 px, with 16 px the clear body-text default (212×).
- A clean 4 / 5 / 8 / 10 / 12 / 16 / 20 / 24 / 32 / 48 px spacing scale — classic 4 px base.
- 13 breakpoints, with 900 px the workhorse (used in 13 different element rules).
- An element census: 2,266 Group, 1,768 Text, 345 Button, 291 Icon, 259 Image, 195 RepeatingGroup, etc.
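Counts like the ones above come down to tallying how often each value appears in the tree. The snippet below is a minimal sketch of that idea, not the code in our extractor: the tally and ranked helpers and the nesting of the toy input are invented for illustration, and only the font_family and font_size key names come from the export itself.

```typescript
// Hypothetical helpers: count how often each value of `key` occurs anywhere in a parsed JSON tree.
type Histogram = Map<string, number>;

function tally(node: unknown, key: string, counts: Histogram = new Map()): Histogram {
  if (Array.isArray(node)) {
    for (const child of node) tally(child, key, counts);
  } else if (node !== null && typeof node === "object") {
    for (const [k, value] of Object.entries(node)) {
      if (k === key && (typeof value === "string" || typeof value === "number")) {
        const label = String(value);
        counts.set(label, (counts.get(label) ?? 0) + 1);
      } else {
        // Not the key we want (or not a scalar): keep descending.
        tally(value, key, counts);
      }
    }
  }
  return counts;
}

// Sort entries by descending count so the dominant value comes first.
function ranked(counts: Histogram): Array<[string, number]> {
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}

// Toy input; the "elements" and "children" keys are made up for this example.
const sample = {
  elements: [
    { font_family: "Roboto", font_size: 16 },
    { font_family: "Roboto", font_size: 16, children: [{ font_family: "Inter", font_size: 24 }] },
  ],
};

const fonts = ranked(tally(sample, "font_family"));
const sizes = ranked(tally(sample, "font_size"));
```

On the toy input, fonts[0] is the pair ("Roboto", 2) and sizes[0] is ("16", 2), which is the same "dominant value" reading we apply to the real histograms.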
How the extractor works
Pure TypeScript. No AI. The whole thing is a single walker over the parsed JSON tree, plus a few normalisation helpers. The shape is in lib/ai/design-system.ts:
    const STYLING_KEYS = new Set([
      "bgcolor", "color", "font_color", "text_color", "border_color",
    ])

    class Collector {
      walk(node: unknown): void {
        // recurse arrays + objects
        // collect: colors (via parseColor),
        //          fonts (font_family),
        //          sizes (font_size 0-400),
        //          weights (100-900 / normal / bold),
        //          spacing (column_gap, row_gap, margin_*),
        //          breakpoints (Message.less_than args),
        //          element types (UI_ELEMENT_TYPES whitelist)
      }

      finalise(): DesignSystem { … }
    }

The interesting bit is colour normalisation. Bubble writes colours in three shapes that all need to land in the same histogram:
- rgba(255, 255, 255, 1): literal RGB.
- rgba(var(--color_primary_default_rgb), 0.16): design-token reference.
- #7263D5 / #7263D5CC: hex with optional alpha.
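To make "land in the same histogram" concrete, here is a rough sketch of a parser for those three shapes. It is an illustration under assumptions rather than the real parseColor: the ParsedColor shape, the regexes, and the choice to strip a trailing _rgb from the variable name are all ours for this example.

```typescript
// Assumed output shape; the actual parseColor may return something different.
type ParsedColor =
  | { kind: "literal"; hex: string; alpha: number }
  | { kind: "token"; token: string; alpha: number };

const RGBA_TOKEN = /^rgba\(\s*var\(--([\w-]+?)(?:_rgb)?\)\s*,\s*([\d.]+)\s*\)$/i;
const RGBA_LITERAL = /^rgba?\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*(?:,\s*([\d.]+)\s*)?\)$/i;
const HEX = /^#([0-9a-f]{6})([0-9a-f]{2})?$/i;

// Clamp a channel to 255 and render it as two lowercase hex digits.
const toHexByte = (n: number): string => Math.min(255, n).toString(16).padStart(2, "0");

function parseColor(raw: string): ParsedColor | null {
  const value = raw.trim();

  // Token reference: keep the token name, drop the assumed _rgb suffix.
  const token = RGBA_TOKEN.exec(value);
  if (token) return { kind: "token", token: token[1], alpha: Number(token[2]) };

  // Literal rgb()/rgba(): convert channels to #rrggbb so it matches the hex form.
  const rgba = RGBA_LITERAL.exec(value);
  if (rgba) {
    const hex = "#" + [rgba[1], rgba[2], rgba[3]].map((c) => toHexByte(Number(c))).join("");
    return { kind: "literal", hex, alpha: rgba[4] === undefined ? 1 : Number(rgba[4]) };
  }

  // Hex with optional two-digit alpha.
  const hex = HEX.exec(value);
  if (hex) {
    const alpha = hex[2] ? parseInt(hex[2], 16) / 255 : 1;
    return { kind: "literal", hex: "#" + hex[1].toLowerCase(), alpha };
  }

  return null; // not a colour we recognise
}
```

With this normalisation, rgba(114, 99, 213, 1) and #7263D5 both end up keyed as #7263d5, so they count as one colour rather than two.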
The token form is the most interesting because it lets us infer the customer's role taxonomy: which colour is "primary", which is "text", which is "destructive". A regex pulls the token name out of the var(--…) call, then a simple prefix match (color_primary_*, color_text_*, etc.) maps to a role enum.
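The regex-plus-prefix step is small enough to sketch in full. Treat the following as an approximation of the idea: the ColorRole union stands in for our role enum, and the color_surface_ and color_destructive_ prefixes, the "unknown" fallback, and the _rgb suffix handling are assumptions made for the example.

```typescript
// Stand-in for the role enum; the exact set of roles is an assumption.
type ColorRole = "primary" | "text" | "surface" | "destructive" | "unknown";

// Ordered prefix table; the first matching prefix wins.
const ROLE_PREFIXES: ReadonlyArray<readonly [string, ColorRole]> = [
  ["color_primary_", "primary"],
  ["color_text_", "text"],
  ["color_surface_", "surface"],
  ["color_destructive_", "destructive"],
];

// Captures the name inside var(--…), dropping an optional _rgb suffix.
const TOKEN_REF = /var\(--([\w-]+?)(?:_rgb)?\)/;

function tokenName(value: string): string | null {
  const match = TOKEN_REF.exec(value);
  return match ? match[1] : null;
}

function inferRole(token: string): ColorRole {
  for (const [prefix, role] of ROLE_PREFIXES) {
    if (token.startsWith(prefix)) return role;
  }
  return "unknown"; // tokens outside the known taxonomy
}

const name = tokenName("rgba(var(--color_primary_default_rgb), 0.16)");
const role = name ? inferRole(name) : "unknown";
```

Here name resolves to color_primary_default and role to "primary". Keeping the prefixes in a table rather than a chain of if statements means a new role is a one-line change.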
Why this matters for the rebuild
With the snapshot in hand we can do two things the customer instantly notices:
1. Render a real "your design system" exhibit in the report. Not stock screenshots — their actual palette swatches, font families with live preview, spacing chips, breakpoint chips, element-type histogram. The signal is "we read your file, we know your stack."
2. Emit a tailwind.config.js seeded with their values. Drops straight into a fresh Next.js project so the rebuild starts visually aligned with what their designer already shipped. No "please send us your style guide" awkwardness on the kickoff call.
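For the second item, the generation step can be as simple as serialising the snapshot into a config string. The sketch below shows one way that could look; the DesignSnapshot fields, the toTailwindConfig name, the lt-900 style screen names, the 4px style spacing keys, and the content glob are all hypothetical choices for this example, not the format we actually emit.

```typescript
// Hypothetical slice of the snapshot; field names are assumptions.
interface DesignSnapshot {
  colors: Record<string, string>; // role or token name -> hex
  fontFamilies: string[];         // most-used first
  spacingPx: number[];
  breakpointsPx: number[];        // "less than N px" thresholds
}

function toTailwindConfig(snapshot: DesignSnapshot): string {
  // Name spacing keys "4px", "8px", ... so Tailwind's default numeric scale is left alone.
  const spacing: Record<string, string> = {};
  for (const px of snapshot.spacingPx) spacing[`${px}px`] = `${px}px`;

  // Express "less than N" as max-width screens, widest first so narrower rules override.
  // max-width is inclusive, so "less than 900" becomes 899px.
  const screens: Record<string, { max: string }> = {};
  for (const px of [...snapshot.breakpointsPx].sort((a, b) => b - a)) {
    screens[`lt-${px}`] = { max: `${px - 1}px` };
  }

  const config = {
    content: ["./app/**/*.{ts,tsx}"], // assumed project layout
    theme: {
      extend: {
        colors: snapshot.colors,
        fontFamily: { sans: [...snapshot.fontFamilies, "sans-serif"] },
        spacing,
        screens,
      },
    },
  };

  return `/** @type {import('tailwindcss').Config} */\nmodule.exports = ${JSON.stringify(config, null, 2)};\n`;
}

const configSource = toTailwindConfig({
  colors: { primary: "#7263d5" },
  fontFamilies: ["Roboto", "Inter"],
  spacingPx: [4, 8, 16],
  breakpointsPx: [600, 900],
});
```

Everything goes under theme.extend so Tailwind's defaults stay available alongside the customer's values; whether to replace the defaults outright is a per-project call.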
And — because the extractor is pure TS, not AI — it's deterministic, free, and adds zero tokens to the existing Anthropic spend. The full extraction on a 6 MB export takes ~80 ms inside the Inngest worker and outputs a 6 KB JSON snapshot we store on the analysis row.
What’s next
The same JSON has a per-page element tree we're not using yet. Next iteration: feed those trees plus real screenshots of the customer's published Bubble app to Opus 4.7 with vision attached, and generate a near-mirror reconstruction of one screen in real code. Then deploy it to Vercel from inside the same Inngest pipeline. The report stops being an estimate and starts being "here's your app on production code, click around."
(That's already live, actually. Try the free analysis.)