When I set out to add an AI assistant to my portfolio's mind map interface, I wanted something more than a chatbot. The goal was to create a system where the AI could actually control the UI—navigating to specific projects, expanding categories, and resetting the view based on natural language queries. This required treating the LLM as a reasoning engine that outputs structured commands, not just conversational text.
The challenge was ensuring the AI's responses were both reliable and safe. I needed a way to validate that the model's output matched a strict schema, sanitize any node IDs against the actual graph structure, and gracefully handle failures without breaking the user experience.
The Architecture
The system follows a three-layer architecture: a lightweight graph index generator that creates a compressed representation of the portfolio structure, an API route that injects this context into the LLM prompt and validates responses, and a frontend that executes only allow-listed actions.
```mermaid
flowchart TB
    subgraph Frontend[Frontend Layer]
        ChatOverlay[ChatOverlay Component]
        MindMap[MindMap Canvas]
        Store[Zustand Store]
    end
    subgraph API[API Layer]
        Route["/api/chat Route"]
        Schema[Zod Schema Validator]
        GraphIndex[Graph Index Generator]
    end
    subgraph External[External Services]
        Together[Together AI API]
    end
    ChatOverlay -->|POST request| Route
    Route --> GraphIndex
    GraphIndex -->|Inject context| Route
    Route -->|JSON mode request| Together
    Together -->|Structured response| Route
    Route --> Schema
    Schema -->|Validated| ChatOverlay
    ChatOverlay -->|Execute action| Store
    Store --> MindMap
```

Building the Graph Index
The first piece was creating a lightweight index of the portfolio structure. Rather than sending full MDX content or heavy metadata, I generate a minimal representation that includes just enough information for the LLM to understand the graph topology.
The generateGraphIndex function walks through projects and blogs, creating nodes with IDs, labels, types, and parent relationships. Each detail node gets a truncated summary (120 characters max) and tags, which helps the model match user queries to specific items.
```typescript
export function generateGraphIndex(): GraphNode[] {
  const nodes: GraphNode[] = [
    { id: "root", label: siteProfile.name, type: "root" },
    { id: "projects", label: "Projects", type: "category", parent: "root" },
    // ... more nodes
  ];
  projects.forEach((project) => {
    nodes.push({
      id: `detail-projects-${project.slug}`,
      label: project.title,
      type: "detail",
      parent: "projects",
      summary: normalizeSummary(project.summary),
      tags: project.tags,
    });
  });
  return nodes;
}
```

This index is serialized to JSON and injected directly into the system prompt. At roughly 500 bytes for a typical portfolio, it's small enough to keep token costs low while providing the LLM with complete structural awareness.
Defining the Response Schema
With the context in place, I needed a strict contract for what the AI could return. Using Zod, I defined a schema that enforces three fields: a reply string for the user-facing message, an action enum limited to navigate, expand, reset, or none, and an optional targetId that must reference a valid node from the graph.
```typescript
export const AIResponseSchema = z.object({
  reply: z.string().describe("Response text shown to user"),
  action: z.enum(ALLOWED_ACTIONS).default("none"),
  targetId: z.string().optional().describe("Node ID to act on"),
});
```

The schema is paired with a fallback response for when parsing fails or validation errors occur. This ensures the UI always receives a valid structure, even if the LLM returns malformed JSON or an invalid action.
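To make that fallback concrete, here's a minimal sketch of a parse-with-fallback helper. The constant name and message are assumptions, and the real route validates with `AIResponseSchema.safeParse` rather than the hand-rolled shape check shown here:

```typescript
// Hypothetical fallback object; the real constant name and wording may differ.
const FALLBACK_RESPONSE = {
  reply: "Sorry, I couldn't process that. Could you rephrase?",
  action: "none" as const,
};

// Parse the raw model output, falling back to a safe response on any failure.
// The production route runs AIResponseSchema.safeParse here instead of the
// minimal shape check below.
function parseAIResponse(raw: string): { reply: string; action: string; targetId?: string } {
  try {
    const parsed = JSON.parse(raw);
    if (typeof parsed.reply !== "string" || typeof parsed.action !== "string") {
      return FALLBACK_RESPONSE;
    }
    return parsed;
  } catch {
    return FALLBACK_RESPONSE;
  }
}
```

Whatever the model emits, the caller always receives an object satisfying the schema's shape.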
Crafting the System Prompt
The system prompt does most of the work. It establishes the AI's role, injects the graph structure, and provides clear rules about response format and action semantics. The key constraint is that the model must output only valid JSON: no markdown, no explanatory text, just the structured response.
I use Together AI's response_format: { type: "json_object" } parameter to encourage JSON output, but the prompt itself reinforces this requirement. The prompt also includes examples of when to use each action type, helping the model make appropriate decisions.
```typescript
export function buildSystemPrompt(): string {
  const graphIndex = generateGraphIndex();
  return `You are the AI assistant for Kyle Liao's portfolio.
You help users navigate and explore projects and blog posts.

## Graph Structure
${JSON.stringify(graphIndex)}

## Response Format
You MUST respond with valid JSON matching this schema:
{
  "reply": "Your response to the user (1-3 sentences)",
  "action": "navigate" | "expand" | "reset" | "none",
  "targetId": "node ID from graph (required if action is navigate/expand)"
}

## Action Rules
- "navigate": Move camera to focus on a specific node. Use for "show me X", "go to X".
- "expand": Expand a category/subcategory to show children. Use for "explore X", "what's in X".
- "reset": Return to initial view. Use for "go home", "start over".
- "none": Just reply, no UI change. Use for general questions.

## Constraints
- ONLY use node IDs from the graph structure above.
- If unsure which node, use action: "none".
- Keep replies concise (1-3 sentences).
- Output ONLY valid JSON, no markdown or extra text.`;
}
```

Validating and Sanitizing Responses
The API route handles the request flow: building the prompt, calling Together AI with JSON mode enabled, parsing the response, and validating it against the Zod schema. If parsing fails or validation errors occur, the system falls back to a safe response that displays a message but performs no UI action.
The critical security step is sanitization. Even if the LLM returns a valid schema, I verify that any targetId actually exists in the graph. If the model hallucinates a node ID or references something that doesn't exist, the action is downgraded to none and the targetId is stripped.
```typescript
if (parsed.targetId && parsed.action !== "none" && parsed.action !== "reset") {
  const validIds = getValidNodeIds();
  if (!validIds.has(parsed.targetId)) {
    parsed = { ...parsed, action: "none", targetId: undefined };
  }
}
```

This whitelist approach ensures that only known, safe node IDs can trigger UI actions. The frontend never executes commands based on untrusted data.
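The set of valid IDs can be derived directly from the same graph index that's injected into the prompt, so the prompt and the sanitizer can never disagree. A plausible sketch (the helper name and caching behavior are assumptions; the real `getValidNodeIds` likely memoizes instead of rebuilding per request):

```typescript
// Build the allow-list of node IDs from the graph index nodes. Because both
// the prompt context and this set come from the same source, any ID the model
// legitimately saw will pass the check.
type IndexNode = { id: string };

function getValidNodeIdsFrom(nodes: IndexNode[]): Set<string> {
  return new Set(nodes.map((node) => node.id));
}
```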
Executing Actions in the Frontend
The frontend receives the validated response and displays the reply in the chat interface. If the action is not none, it sets a pendingAction in the Zustand store, which triggers a useEffect in the MindMap component.
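As a framework-free stand-in for that Zustand slice, the hand-off can be sketched like this (method names beyond `setPendingAction` are assumptions; the real store holds far more state):

```typescript
// Minimal stand-in for the store slice that carries AI actions from the chat
// overlay to the mind map: one nullable pendingAction plus change listeners.
type AIAction = { action: "navigate" | "expand" | "reset"; targetId?: string };

function createAIActionSlice() {
  let pendingAction: AIAction | null = null;
  const listeners = new Set<() => void>();
  return {
    getPendingAction: () => pendingAction,
    setPendingAction(next: AIAction | null) {
      pendingAction = next;
      listeners.forEach((notify) => notify()); // what triggers the MindMap effect
    },
    subscribe(listener: () => void) {
      listeners.add(listener);
      return () => listeners.delete(listener);
    },
  };
}
```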
The execution logic handles three scenarios. For reset, it calls the existing reset function. For navigate, it finds the target node and centers the camera on it. For expand, it determines whether to expand a category or subcategory and calls the appropriate layout function.
The tricky part was handling nodes that don't exist yet in the current layout. If a user asks to navigate to a detail node that hasn't been expanded, the system first expands the parent category, waits for the layout to update, then focuses on the target node once it appears.
```typescript
useEffect(() => {
  if (!pendingAction) return;
  const { action, targetId } = pendingAction;
  if (action === "navigate" && targetId) {
    const targetNode = nodes.find((node) => node.id === targetId);
    if (targetNode) {
      setCenter(targetNode.position.x + 150, targetNode.position.y + 75, {
        zoom: 1,
        duration: 1000,
      });
    } else {
      // Node doesn't exist yet - expand parent first
      setPendingFocusId(targetId);
      if (targetId.startsWith("detail-projects-")) {
        expandCategory("projects");
      }
      // ... handle blog nodes similarly
    }
  }
  setPendingAction(null);
}, [pendingAction, nodes, expandCategory, setCenter]);
```

A separate effect watches for the pending focus ID and executes the camera movement once the node appears in the graph.
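The core of that second effect can be expressed as a pure function, which also makes the "wait for layout" behavior easy to test. This is an illustrative helper, not the actual code; the `+150`/`+75` offsets mirror the node-centering math in the navigate handler:

```typescript
// Pure version of the pending-focus check: once the target node exists in the
// laid-out graph, return a camera center; until then return null and keep waiting.
type LayoutNode = { id: string; position: { x: number; y: number } };

function resolvePendingFocus(
  nodes: LayoutNode[],
  pendingFocusId: string | null
): { x: number; y: number; zoom: number } | null {
  if (!pendingFocusId) return null;
  const target = nodes.find((node) => node.id === pendingFocusId);
  if (!target) return null; // layout hasn't produced the node yet
  return { x: target.position.x + 150, y: target.position.y + 75, zoom: 1 };
}
```

The effect simply calls this on every `nodes` change and fires `setCenter` the first time it returns a non-null target.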
The Request Flow
Here's how a typical interaction works from start to finish:
```mermaid
sequenceDiagram
    participant User
    participant ChatOverlay
    participant API as /api/chat
    participant Together as Together AI
    participant Zod as Zod Validator
    participant MindMap
    User->>ChatOverlay: "Show me the Risk of Rain 2 project"
    ChatOverlay->>API: POST { prompt }
    Note over API: Build system prompt with<br/>graph index injection
    API->>Together: Chat completion (JSON mode)
    Together-->>API: { reply, action: "navigate", targetId: "detail-projects-ror2-item-browser" }
    API->>Zod: safeParse(response)
    Zod-->>API: ✓ Validated
    API->>API: Sanitize targetId against valid IDs
    API-->>ChatOverlay: { reply, action: "navigate", targetId: "detail-projects-ror2-item-browser" }
    ChatOverlay->>ChatOverlay: Display reply message
    ChatOverlay->>MindMap: setPendingAction
    MindMap->>MindMap: Expand projects category
    MindMap->>MindMap: Wait for node to appear
    MindMap->>MindMap: Center camera on target node
```

Deep Dive Mode
For the "Ask AI" button on detail nodes, I wanted a richer analysis that includes the full project or blog metadata. The buildDeepDivePrompt function creates a specialized prompt that includes the title, summary, and tags, asking the model to explain technical challenges and key decisions.
This prompt is sent as the user message, while the system prompt still contains the graph structure. The model can reference the full context while providing a detailed analysis of the specific item.
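A sketch of what such a prompt builder might look like. The wording and field set here are assumptions (the real `buildDeepDivePrompt` will differ); only title, summary, and tags are taken from the description above:

```typescript
// Illustrative deep-dive prompt builder: fold the detail node's metadata into
// a user message that asks for technical analysis rather than navigation.
interface DetailMeta {
  title: string;
  summary: string;
  tags: string[];
}

function buildDeepDivePromptSketch(meta: DetailMeta): string {
  return [
    `Give a short technical deep dive on "${meta.title}".`,
    `Summary: ${meta.summary}`,
    `Tags: ${meta.tags.join(", ")}`,
    "Focus on the technical challenges and key decisions.",
  ].join("\n");
}
```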
Lessons Learned
The biggest challenge was ensuring reliability. LLMs are probabilistic, and even with JSON mode enabled, responses can occasionally be malformed or reference non-existent nodes. The validation and sanitization layers are essential—they transform the model from a potential source of errors into a reliable reasoning engine.
The lightweight graph index approach worked well. By keeping context small (around 500 bytes), I can use faster, cheaper models while still providing complete structural awareness. The model rarely makes mistakes about which nodes exist, and when it does, the sanitization layer catches it.
Another insight was the importance of graceful degradation. When validation fails, the system doesn't break—it just shows a message and performs no action. Users get feedback, but the UI remains stable.
Future Improvements
There are a few directions I'd like to explore. Multi-step operations could be useful—for example, "show me Unity projects" might need to expand the projects category, then filter to Unity-related items. This would require a plan array in the response schema that describes a sequence of actions.
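One way the multi-step idea could look, reusing the existing sanitization check. The `plan` field is hypothetical; the key point is that each step passes the same ID allow-list before anything executes:

```typescript
// Hypothetical multi-step response shape: a "plan" array of the same
// allow-listed actions, sanitized step by step before execution.
type PlanStep = { action: "navigate" | "expand" | "reset"; targetId?: string };

function sanitizePlan(plan: PlanStep[], validIds: Set<string>): PlanStep[] {
  // Drop any step whose targetId isn't a known node; reset needs no target.
  return plan.filter(
    (step) =>
      step.action === "reset" || (step.targetId !== undefined && validIds.has(step.targetId))
  );
}
```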
I'm also considering adding retry logic for malformed JSON responses. While Together AI's JSON mode is quite reliable, a single retry with a "fix this JSON" prompt could improve success rates without adding much complexity.
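The retry idea is small enough to sketch. This version is synchronous for clarity (the real model call is async), and `callModel` is a hypothetical stand-in for the provider API:

```typescript
// One-retry sketch: if the first completion isn't valid JSON, ask the model
// once to repair its own output, then give up and let the caller fall back.
function parseWithOneRetry(
  callModel: (prompt: string) => string,
  userPrompt: string
): unknown | null {
  const first = callModel(userPrompt);
  try {
    return JSON.parse(first);
  } catch {
    const repaired = callModel(`Return ONLY valid JSON. Fix this output:\n${first}`);
    try {
      return JSON.parse(repaired);
    } catch {
      return null; // caller substitutes the safe fallback response
    }
  }
}
```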
The current implementation uses Together AI exclusively, but the architecture is provider-agnostic. The prompt structure and validation logic would work with OpenAI, Anthropic, or any other provider that supports structured outputs.
Conclusion
Building this system taught me that LLMs can be reliable UI controllers when you treat them as reasoning engines with strict output contracts. The key is combining context injection, schema validation, and action sanitization to create a safe, predictable interface between natural language and UI state.
The result is an assistant that feels intelligent but operates within well-defined boundaries. Users can ask questions naturally, and the system translates those queries into precise UI commands while maintaining safety and reliability.