Agentic Design Is Control-Surface Design
Most AI interface work still treats the agent as a smarter text box. Agentic design starts from a different premise: once software can act over time, call tools, touch files, spend money, or change production state, the design problem becomes a control-surface problem.
Agentic design is the discipline of making autonomous software visible, interruptible, inspectable, reversible, and worthy of trust. The product is not the chat transcript. The product is the surface that lets a human understand what the agent is doing, decide what the agent may do next, and verify what the agent already did.
That frame matters because agents do not fail like ordinary forms, dashboards, or copilots. A form fails at submit time. A dashboard fails by showing stale data. A copilot fails by suggesting bad text. An agent fails through motion: it takes the wrong branch, chooses the wrong tool, misses the right evidence, loses context, overuses permissions, stops too early, or succeeds locally while weakening the whole product.
Design has to move from prompt polish to operational control.
TL;DR
Agentic design is not “UX for AI” in the abstract. It is the design of control surfaces for systems that act. Microsoft framed human-AI interaction as a distinct interface-design problem years before today’s coding agents,[1] and Google PAIR keeps the same human-centered thread in its AI design guidance.[2] Modern agent products make the need sharper: OpenAI describes Codex as a cloud agent that works in an isolated environment,[5] while Claude Code exposes hooks that can intercept tool calls before execution.[4]
The practical takeaway: agent products need surfaces for status, permissions, traces, memory, evidence, rollback, and supervision. Chat can remain an input. It cannot remain the whole interface.
Key Takeaways
For product designers:

- Design agent state before designing prompt input. The user needs to know whether the agent is planning, acting, blocked, waiting, verifying, or done.
- Treat permission review as a primary workflow. A risky tool call should not look like a casual chat interruption.

For agent builders:

- Log enough execution detail to power a trace surface. Tool names alone are not enough; the surface needs arguments, outputs, exit states, file paths, and side effects.
- Make interruption and recovery first-class. A user should be able to pause, inspect, redirect, roll back, or fork an agent without reading a full transcript.

For teams adopting agents:

- Do not measure interface quality by how fluent the chat feels. Measure whether the operator can answer: what happened, why, with what permission, and with what evidence?
- Keep taste in the loop. A correct agent action can still damage coherence, dignity, or product quality.
The User Changed
The user of an agent product is not only a prompter. The user becomes an operator.
A prompter asks for an answer. An operator supervises a process. A prompter cares whether the text sounds right. An operator cares whether the system touched the right files, used the right sources, preserved the right constraints, and stopped at the right time.
That difference changes the interface. Prompt boxes optimize for expression. Control surfaces optimize for state, risk, timing, and proof.
Traditional software can hide process because the user directly triggers most state changes. A button says “Send.” The user clicks. The app sends. Agent software inserts a decision-making runtime between intent and action. The user asks for an outcome, and the system chooses a path. The interface must reveal enough of that path for the user to stay responsible for the result.
Microsoft’s human-AI interaction guidelines point in that direction. The guidelines cover the behavior of AI systems across interaction time: setting expectations, matching social context, showing status, supporting correction, and handling failures.[1] The old lesson applies cleanly to agents, but agents raise the stakes because AI behavior no longer ends at a recommendation. The behavior can become a tool call.
Agentic Design Starts With State
Good agentic design makes state visible before it asks for trust.
An agent has more states than “thinking” and “done”:
| Agent State | What The User Needs |
|---|---|
| Planning | Intended path, assumptions, likely tools |
| Searching | Query terms, sources, misses, next query |
| Acting | Tool call, arguments, target, expected side effect |
| Blocked | Missing permission, missing credential, unclear requirement |
| Verifying | Test command, evidence source, acceptance criterion |
| Recovering | Failed step, retry path, changed assumption |
| Done | Artifact, evidence, unresolved gap |
Most chat products collapse these states into one animated spinner. A spinner says the system has not stopped. It does not say whether the agent is reading, writing, waiting, retrying, or stuck.
Agentic state needs a richer vocabulary. The surface should show the current phase, the last meaningful action, the next intended action, and the reason the agent has not finished. A good status surface reduces user anxiety because it replaces mystery with inspectable motion.
The hard design question is density. A serious agent can generate thousands of events during a long run. Showing every event creates noise. Hiding every event creates blind trust. The control surface has to summarize by default and expand on demand.
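One way to ground that requirement is a small status model that summarizes by default and expands on demand. The sketch below is illustrative; the type and field names (`AgentPhase`, `StatusSnapshot`) are hypothetical, not drawn from any particular runtime.

```typescript
// Illustrative status-surface model. All names are hypothetical.

type AgentPhase =
  | "planning" | "searching" | "acting" | "blocked"
  | "verifying" | "recovering" | "done";

interface StatusSnapshot {
  phase: AgentPhase;
  lastAction: string;         // e.g. "ran test suite (exit 1)"
  nextAction: string | null;  // e.g. "retry failing test with verbose output"
  blocker: string | null;     // why the agent has not finished, if stalled
  eventCount: number;         // raw events compressed into this snapshot
}

// Summarize by default: show one snapshot per phase transition,
// and hydrate the full event log only when the operator expands it.
function phaseTransitions(run: StatusSnapshot[]): StatusSnapshot[] {
  return run.filter((s, i) => i === 0 || s.phase !== run[i - 1].phase);
}
```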
Permission Is A Design Material
Permission is not a settings page. Permission is one of the central materials of agentic design.
Agents act through authority the user grants. File writes, shell commands, browser actions, API calls, deploy steps, payment operations, and customer-impacting actions all carry different risk. The interface has to make that risk legible at decision time.
Claude Code’s hook reference shows the primitive form of this idea: a `PreToolUse` hook can inspect a Bash command and return a decision that denies a destructive operation before the tool call runs.[4] That mechanism proves the design shape. A control surface can sort pending operations by risk, show the full command or tool payload, explain the reason for the call, and let the user approve, deny, defer, or rewrite the request.
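A minimal sketch of such a hook, written as a Node script, might look like the following. The stdin/stdout JSON contract mirrors the hooks reference,[4] but field names should be verified against the current docs, and the destructive-pattern list is purely illustrative.

```typescript
#!/usr/bin/env node
// Hedged sketch of a Claude Code PreToolUse hook as a Node script.
// Verify the JSON field names against the current hooks reference.

import { stdin, stdout } from "node:process";

// Commands treated as destructive. Illustrative, not exhaustive.
const DESTRUCTIVE = [/\brm\s+-rf\b/, /\bdrop\s+table\b/i, /\bgit\s+push\s+--force\b/];

let raw = "";
stdin.setEncoding("utf8");
stdin.on("data", (chunk) => (raw += chunk));
stdin.on("end", () => {
  const event = JSON.parse(raw); // includes tool_name and tool_input
  const command: string = event.tool_input?.command ?? "";

  if (event.tool_name === "Bash" && DESTRUCTIVE.some((p) => p.test(command))) {
    // A "deny" decision stops the tool call before it runs.
    stdout.write(
      JSON.stringify({
        hookSpecificOutput: {
          hookEventName: "PreToolUse",
          permissionDecision: "deny",
          permissionDecisionReason: `Blocked destructive command: ${command}`,
        },
      }),
    );
  }
  // Writing nothing means "no opinion": the normal permission flow applies.
});
```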
The key shift: permission review should become a queue, not an interruption.
Interruptions work for one or two decisions. They fail when the agent performs 40 operations across a long task. A permission queue lets the user batch low-risk approvals, pause high-risk actions, and review the whole risk profile in one place. The user stops being yanked between reading the agent’s prose and evaluating commands.
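A queue needs a data shape before it needs visual design. Here is one hypothetical shape; the tier names and fields are design suggestions, not an existing standard.

```typescript
// Hypothetical risk-tiered permission queue entry.

type RiskTier = "read-only" | "reversible-write" | "external" | "irreversible";

interface PendingOperation {
  id: string;
  tool: string;          // e.g. "Bash", "file_write", "http_request"
  payload: unknown;      // full arguments, shown verbatim to the operator
  agentReason: string;   // the agent's stated reason for the call
  tier: RiskTier;
}

// Batch low-risk approvals; surface everything else for review.
function partitionQueue(queue: PendingOperation[]) {
  return {
    batchable: queue.filter((op) => op.tier === "read-only"),
    needsReview: queue.filter((op) => op.tier !== "read-only"),
  };
}
```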
Risk presentation also needs taste. Red borders, warning icons, and modal friction can help. They can also train the user to approve alerts blindly when everything looks urgent. The interface should reserve visual alarm for irreversible or externally visible actions. Read-only search should not wear the same costume as a production database migration.
Trace Is The New Information Architecture
Agentic design needs trace architecture.
A trace is the ordered record of what the agent did: prompts, tool calls, arguments, files read, files changed, commands run, sources opened, test outputs, permission decisions, retries, and final evidence. A chat transcript can contain parts of that record, but a transcript is not an information architecture. It is a scroll.
The trace surface should answer four questions quickly:
| Question | Trace Surface Requirement |
|---|---|
| What happened? | Timeline with filters by event type |
| Why did it happen? | Agent-stated reason attached to each action |
| What changed? | Diffs, artifacts, side effects, and touched paths |
| What supports the result? | Evidence links, command outputs, citations, and unresolved gaps |
That surface connects directly to the evidence gate. A final answer that says “tests passed” should point to the test command and exit status. A public article that cites a paper should point to the exact source and claim alignment. A migration report that claims parity should point to the specific user path that still works.
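As a concrete sketch, a trace event record that can answer those four questions might look like this. Field names are illustrative, not drawn from any shipping product.

```typescript
// Illustrative trace event record.

type TraceEventKind =
  | "prompt" | "tool_call" | "file_change" | "command"
  | "permission_decision" | "retry" | "evidence";

interface TraceEvent {
  seq: number;                 // ordered position in the run
  kind: TraceEventKind;
  reason: string;              // agent-stated reason for the action
  args?: unknown;              // full arguments or command text
  output?: string;             // stdout, diff, or response excerpt
  exitState?: number | string; // e.g. exit code of a test command
  touchedPaths?: string[];     // files and resources affected
  supportsClaim?: string;      // claim this event is evidence for, if any
}

// "Tests passed" should resolve to the command and its exit state.
function evidenceFor(claim: string, trace: TraceEvent[]): TraceEvent[] {
  return trace.filter((e) => e.supportsClaim === claim);
}
```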
The recent execution-trace research points the same way. I argued in *Agent Execution Traces Are the Runtime Contract* that the final answer is the weakest unit to trust. The trace is stronger because it preserves the path from intent to action to evidence.
Memory Needs A Browser
Agentic design also needs memory design.
Agents carry context across time. Some context sits in the active window. Some sits in compacted summaries. Some sits in files, notes, vector stores, databases, or project instructions. Some disappears. The user rarely sees the boundary.
That invisibility creates a design failure. When an agent contradicts an earlier decision, the user cannot tell whether the agent disagreed, forgot, summarized poorly, or never loaded the relevant memory. Chat makes memory feel continuous even when the runtime has changed what the model can see.
A memory browser should expose three layers:
| Memory Layer | User Question |
|---|---|
| Active context | What can the agent use right now? |
| Stored memory | What can the agent retrieve if needed? |
| Compacted or stale memory | What did the system compress, omit, or mark uncertain? |
That browser does not need to reveal private chain-of-thought. It needs to reveal operational memory: instructions, constraints, source paths, decisions, artifacts, and summaries the system will use to guide future action.
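A minimal model of those layers, sketched below with hypothetical names, makes the design target concrete: the browser’s core job is telling the operator where a given constraint lives and whether the agent will actually see it on the next step.

```typescript
// Hypothetical model of the three memory layers.

interface MemoryItem {
  key: string;
  summary: string;
  source: string;       // file path, note id, vector store, etc.
  uncertain?: boolean;  // marked lossy or stale by the runtime
}

interface MemoryView {
  active: MemoryItem[];     // usable right now
  stored: MemoryItem[];     // retrievable on demand
  compacted: MemoryItem[];  // compressed, omitted, or stale
}

// Where does a given constraint live, if anywhere?
function locate(view: MemoryView, key: string): keyof MemoryView | null {
  for (const layer of ["active", "stored", "compacted"] as const) {
    if (view[layer].some((item) => item.key === key)) return layer;
  }
  return null;
}
```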
Search belongs in the same design family. The grep/vector result from the previous article showed that search quality depends on runtime, delivery path, and the model’s ability to close the tool loop, not only on the retriever.[6] If search lives in the runtime, search visibility belongs in the interface. The user needs to know what the agent searched, what it missed, what it opened, and why the next query changed.
Supervision Is Not Micromanagement
Agent products often frame human oversight as friction. Strong agentic design treats supervision as the product.
NIST describes the AI Risk Management Framework as a way to incorporate trustworthiness considerations into the design, development, use, and evaluation of AI systems.[3] That wording matters. Trustworthiness does not enter only at model training time. It enters at design time, use time, and evaluation time.
For agents, supervision means the user can (see the sketch after this list):
- see what the agent is doing;
- interrupt before irreversible action;
- inspect the evidence path;
- recover from a failed branch;
- compare alternate branches;
- approve or reject the final artifact;
- understand what remains unverified.
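One possible shape for those controls, expressed as a hypothetical interface; no specific product exposes this API.

```typescript
// Hypothetical supervision surface. Method names map to the list above.

interface SupervisionSurface {
  watch(runId: string): AsyncIterable<string>;            // see current activity
  pause(runId: string): Promise<void>;                    // interrupt before irreversible action
  trace(runId: string): Promise<string[]>;                // inspect the evidence path
  rollback(runId: string, toSeq: number): Promise<void>;  // recover from a failed branch
  fork(runId: string, fromSeq: number): Promise<string>;  // spawn an alternate branch to compare
  resolve(runId: string, verdict: "approve" | "reject"): Promise<void>; // final artifact decision
  unverified(runId: string): Promise<string[]>;           // what remains unproven
}
```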
Micromanagement asks the user to approve every keystroke. Supervision gives the user the right control at the right altitude. A senior engineer does not need to watch every file read. That engineer does need to see a proposed database migration, a failed test retry, a changed public claim, or a command that touches production state.
Good supervision surfaces preserve flow by moving low-risk details out of the main lane and pulling high-risk moments into focus. The design challenge is not “more visibility.” The design challenge is calibrated visibility.
The Taste Layer Still Matters
Agentic design can satisfy every operational requirement and still feel wrong.
A permission queue can expose the right facts while making the user feel punished. A trace timeline can contain every event while making comprehension impossible. A memory browser can show every stored item while destroying the user’s confidence through clutter. A status meter can tell the truth while making the system feel broken.
Taste decides how the surface carries risk, confidence, uncertainty, and proof. Taste is a technical system: constraints, evaluation criteria, pattern recognition, and coherence. Agentic design needs all four.
Constraints decide what the agent may do without review. Evaluation criteria decide what the final artifact must prove. Pattern recognition catches the workflow that looks successful but feels brittle. Coherence asks whether the agent’s work improved the whole product or only completed the local task.
That last question matters more as agents get cheaper. AI makes output abundant. Abundance raises the value of refusal, editing, coherence, and taste. The best agentic interface will not maximize actions. It will help the operator decide which actions deserve to happen.
A Minimal Agentic Design Checklist
Start with seven surfaces:
| Surface | Minimum Requirement |
|---|---|
| Status | Current phase, last action, next action, blocker |
| Permission | Risk-tiered queue with full tool payload |
| Trace | Filterable timeline with arguments, outputs, and side effects |
| Evidence | Claims mapped to source, command, test, or unresolved gap |
| Memory | Active context, stored context, compacted summaries |
| Recovery | Pause, resume, retry, rollback, fork, and cancel |
| Supervision | Cross-agent view of blocked, risky, and completed work |
None of those surfaces require a science-fiction interface. The first version can be plain tables, expandable rows, and boring filters. Fancy animation matters less than honest state. The control surface should tell the truth quickly.
The design question for every agent feature becomes simple:
> What does the human need to see, decide, interrupt, or verify before the agent’s next action becomes real?
If the interface cannot answer that question, the product still relies on trust theater.
Quick Summary
Agentic design is control-surface design. Chat remains useful as an input primitive, but autonomous work needs visible state, permission queues, traces, memory browsers, evidence surfaces, recovery controls, and supervision views. Microsoft, Google, and NIST all point toward human-centered AI design and trustworthiness as product responsibilities, not only model properties.[1][2][3] Agent tools make the point concrete: the runtime already has hooks, containers, traces, files, commands, and side effects.[4][5] The interface has to make those parts legible.
The winning agent product will not be the one with the most charming chat. The winning product will be the one that gives operators the clearest, sharpest, most trustworthy surface for autonomous work.
FAQ
Is agentic design different from AI UX?
Yes. AI UX covers any experience that uses machine learning or generative AI. Agentic design covers systems that act over time. The difference is agency: tool calls, permissions, state changes, memory, side effects, and recovery. Those properties require control surfaces, not only helpful copy or prompt input.
Does every agent product need all seven surfaces?
No. The surface area should match risk. A low-stakes writing assistant may need status, evidence, and revision history. A coding or operations agent needs permission, trace, recovery, memory, and supervision. A customer-impacting agent needs even stronger audit and approval controls.
Why not keep everything in chat?
Chat is sequential and append-only. Agent supervision needs random access, filtering, comparison, batch review, and state inspection. Collapsible chat blocks can improve readability, but they cannot replace a permission queue, trace timeline, memory browser, or recovery surface.
What is the first control surface to build?
Build the trace first. Without the trace, every other surface becomes guesswork. The trace supplies the data for evidence, permissions, recovery, audit, and supervision. A product can start with a plain event table and improve the design over time.
References
1. Saleema Amershi et al., “Guidelines for Human-AI Interaction,” Microsoft Research, CHI 2019. Primary source for the 18 human-AI interaction guidelines, the validation process with 49 design practitioners, and the framing of AI behavior as an interface-design problem.
2. Google People + AI Research, “People + AI Guidebook” and “People + AI Research,” Google Design. Source for the human-centered AI design framing and tactical guidebook orientation.
3. National Institute of Standards and Technology, “AI Risk Management Framework,” NIST, January 26, 2023, with later generative AI profile updates. Source for incorporating trustworthiness into the design, development, use, and evaluation of AI products, services, and systems.
4. Anthropic, “Hooks reference,” Claude Code Docs. Source for the hook lifecycle, `PreToolUse`, matcher behavior, and permission decisions that can deny tool calls before execution.
5. OpenAI, “Introducing Codex,” OpenAI, May 2025. Source for Codex’s cloud execution model, isolated container description, and background software-engineering task framing.
6. Blake Crosley, “Agent Search Is a Runtime Problem,” blakecrosley.com, May 15, 2026. Source for the author analysis connecting search quality to runtime, result delivery, and tool-loop behavior.