From Templates to Component Vocabulary: Letting the AI Decide UI Structure
A practical guide to building schema-driven generative UI for domain-specific applications
Our first approach used rigid templates. The AI filled in predefined sections. It worked, but the template decided what mattered, not the AI. A critical deal slippage got the same visual weight as a routine observation.
We rebuilt it using a component vocabulary approach. Instead of templates, we give the AI a set of typed components and let it compose them based on what the data demands.
Context: we build a financial forecasting tool that reconciles data from Salesforce, manager judgments, and commission systems. The AI synthesizes this into guidance for sales managers.
This post covers what we built, what broke, and what we learned.
The Pattern
The AI receives a vocabulary of components and outputs a composition:
{
  "type": "Accordion",
  "props": { "defaultOpen": ["critical-section"] },
  "children": [
    {
      "type": "AccordionSection",
      "props": { "title": "3 Deals Need Action", "variant": "danger" },
      "children": [
        { "type": "DealInsight", "props": { "status": "blocked" } }
      ]
    }
  ]
}
The frontend renders it through a recursive component registry:
function ComponentRenderer({ component, data }) {
  // Look up the concrete implementation for this component type
  const Component = Registry[component.type];
  return (
    <Component {...component.props}>
      {component.children?.map((child, index) => (
        // Recurse into nested compositions; the key avoids React's list warning
        <ComponentRenderer key={index} component={child} data={data} />
      ))}
    </Component>
  );
}
The AI decides structure, emphasis, and ordering. We control styling and behavior.
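The registry itself is just a map from type names to implementations; anything the AI invents that isn't a key in this map simply can't render. A minimal sketch in TypeScript (the import paths and exact component list are illustrative, not our real module layout):

import type { ComponentType } from "react";
import { Accordion, AccordionSection, Grid, Collapsible } from "./layout";
import { DealInsight, VerifiedMetric } from "./domain";

// Only type names listed here can be rendered; prop shapes are validated separately.
const Registry: Record<string, ComponentType<any>> = {
  Accordion,
  AccordionSection,
  Grid,
  Collapsible,
  DealInsight,
  VerifiedMetric,
};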
What We Built
Layout components: Accordion, Grid, Collapsible. The AI chooses structure and what's expanded by default.
Emphasis via variants: Cards and text have variants (danger, warning, success). The AI picks based on severity.
Domain components: DealInsight shows opportunity details with status assessment. VerifiedMetric displays numbers that the backend validates against the database.
Embedded charts: The AI specifies chart type and embeds data directly in the schema.
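For concreteness, the variant and domain components above might be typed roughly like this. This is a sketch; the field names and status values beyond "blocked" are assumptions, not our exact schema:

type Variant = "danger" | "warning" | "success";

interface AccordionSectionProps {
  title: string;
  variant?: Variant;   // the AI picks emphasis based on severity
}

interface DealInsightProps {
  status: "blocked" | "at_risk" | "on_track";   // "blocked" appears in the example above; the rest are assumed
}

interface VerifiedMetricProps {
  label: string;
  value: number;       // the backend re-checks this against the database before display
}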
What Broke
1. Schema replacement destroyed response quality
We had templates stored in the database. A well-intentioned refactor replaced them at runtime with minimal file-based schemas. Result: responses were 50% smaller with critical fields missing.
Fix: Database templates became the single source of truth. No runtime replacement.
2. The AI invented syntax the renderer couldn't parse
The AI generated data paths like "quarterly_data[*].opportunities[*].amount": wildcard notation that looked reasonable but that lodash.get() doesn't support.
Fix: Explicit documentation in the prompt with WRONG vs CORRECT examples:
WRONG: "quarterly_data[*].field"
CORRECT: "quarterly_data" (renderer iterates automatically)
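Documentation helps, but the renderer can also enforce the rule instead of letting lodash.get() silently return undefined. A sketch, assuming data paths arrive as plain strings in the schema:

import get from "lodash/get";

// Reject wildcard syntax up front rather than rendering blank values.
function resolveDataPath(data: unknown, path: string): unknown {
  if (path.includes("[*]")) {
    throw new Error(
      `Unsupported wildcard in data path "${path}": pass the array path and let the renderer iterate`
    );
  }
  return get(data, path);
}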
3. Pre-defined risks killed insight quality
Early templates had variables like {{TOP_RISK_1}}, {{TOP_RISK_2}}. The AI filled slots instead of analyzing.
Fix: Removed pre-defined slots. Added instructions: "Analyze the data and identify risks. There may be zero, one, or many."
4. Debugging was blind
When responses were wrong, we had no way to see what prompt produced them.
Fix: Store the full prompt with every response. Essential for investigating discrepancies.
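Roughly what we persist per generation now (field names here are illustrative):

// Sketch of a stored generation record; column names are assumptions.
interface GenerationRecord {
  id: string;
  prompt_version: string;   // e.g. "v1.9"
  full_prompt: string;      // the exact text sent to the model
  raw_response: string;     // unparsed model output
  parsed_schema: unknown;   // the composition after validation, if parsing succeeded
  created_at: string;       // ISO timestamp
}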
Trade-offs
Debugging is harder. With templates, you inspect a fixed structure. With AI composition, every response differs. We added JSON schema validation and extensive logging.
Testing is harder. You can't assert exact output. We test component rendering in isolation and validate schema structure, not specific compositions.
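What "validate schema structure, not specific compositions" looks like in practice, as a sketch assuming Vitest as the test runner and a hypothetical validateComposition helper:

import { describe, it, expect } from "vitest";
import { validateComposition } from "../src/schema"; // hypothetical helper

describe("composition schema", () => {
  it("accepts any composition built from known component types", () => {
    const composition = {
      type: "Accordion",
      props: {},
      children: [{ type: "DealInsight", props: { status: "blocked" } }],
    };
    expect(() => validateComposition(composition)).not.toThrow();
  });

  it("rejects component types outside the vocabulary", () => {
    expect(() => validateComposition({ type: "ScriptTag", props: {} })).toThrow();
  });
});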
Prompt engineering becomes critical. Template changes are simple text edits. Vocabulary changes require careful prompt updates, examples, and testing across scenarios.
Model quality affects UI quality. Different models produce different composition styles. Model upgrades can change output structure.
Bad editorial choices happen. Sometimes the AI buries important information. We don't have user override yet; it's on the roadmap.
Practical Notes
Prompt templates live in the database, versioned with is_active flags. We create new versions rather than editing in place. Rollback is one flag flip.
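The template row looks something like this (a sketch; column names other than is_active are assumptions):

interface PromptTemplate {
  id: string;
  name: string;         // which prompt this is
  version: string;      // e.g. "v1.9"
  body: string;         // the prompt text itself
  is_active: boolean;   // one active version per name; rollback = flip this flag
  created_at: string;
}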
Schema validation happens twice: Pydantic on the backend, TypeScript on the frontend. Unknown component types fall back to a JSON viewer.
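A sketch of the unknown-type fallback on the frontend, reusing the Registry and ComponentRenderer from earlier; JsonViewer stands in for whatever raw-JSON view you use:

type SchemaNode = { type: string; props?: Record<string, unknown>; children?: SchemaNode[] };

function SafeRenderer({ component, data }: { component: SchemaNode; data: unknown }) {
  const Component = Registry[component.type];
  if (!Component) {
    // Unknown component type: show the raw schema instead of failing the whole response
    return <JsonViewer value={component} />;
  }
  return <ComponentRenderer component={component} data={data} />;
}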
Security comes from the vocabulary constraint. The AI can't inject scripts or styles because those aren't in the component set. We validate prop types and enforce depth limits.
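The depth limit is a simple recursive check; the specific limit here is a placeholder, not a framework constant:

type Node = { children?: Node[] };

const MAX_DEPTH = 8;

// Walk the composition and refuse anything nested deeper than the limit.
function assertDepth(node: Node, depth = 0): void {
  if (depth > MAX_DEPTH) {
    throw new Error(`Composition exceeds maximum nesting depth of ${MAX_DEPTH}`);
  }
  for (const child of node.children ?? []) {
    assertDepth(child, depth + 1);
  }
}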
Store everything. Full prompts, raw responses, parsed schemas. When something breaks, you need full context.
Performance: Generation takes 3-8 seconds depending on data complexity. Typical prompts run 3,000-5,000 tokens. We've generated several hundred responses over two months of use.
What We'd Do Differently
- Start with fewer components. We built 15+ component types before knowing which ones mattered. We could have started with 5 and added more based on actual needs.
- Build transformation layers first. Our existing chart components expected different prop formats than the AI generated. The mismatch cost days. Abstract the interface early.
- Add usage tracking from day one. We still don't have systematic data on whether managers prefer AI-composed layouts. We should have instrumented preference tracking from the start.
- Version prompts more aggressively. We went from v1.0 to v1.9 in weeks. Each version should have been a tracked experiment.
The pattern is straightforward: give the AI a vocabulary, let it compose, render faithfully. The complexity is in the details: prompt engineering, transformation layers, debugging infrastructure.
Build the infrastructure for iteration. You'll need it.