Bhunductor: When Your AI Agent Outgrows the Terminal
The Problem
Claude Code is one of the best developer tools available today. You point it at a codebase, tell it what you want, and it reads files, writes code, runs tests, and iterates -- all autonomously. But it runs in a terminal. And that's where the friction starts.
The Terminal Problem
If you've used Claude Code for any real work, you know the workflow:
- Open a terminal tab for Claude CLI
- Open another tab (or your editor) to see what Claude is changing
- Open a third for running your own commands
- Try to keep track of which branch you're on
- Lose the session when your terminal closes
- Start over
The terminal is great for single-purpose tools. But Claude Code is not a single-purpose tool -- it's an agent that reads, writes, executes, and asks for permission. That's a lot of state to manage in a stream of text.
The specific pain points:
No visual structure. Tool calls, thinking blocks, text responses, and permission prompts all render as the same monospaced text. When Claude is on a 15-tool-call loop editing files, running tests, and iterating -- it's hard to follow. You're scrolling through walls of JSON-like output trying to find where the actual answer starts.
No persistence. Close the tab, lose the session. Yes, you can --resume, but there's no overview of past sessions. No way to name them, search them, or switch between them. If you had three separate conversations about three separate bugs, good luck finding the second one next week.
No branch awareness. Claude operates in whatever directory you point it at. If you're working on a feature branch, switch to main to fix a hotfix, then switch back -- Claude doesn't track any of this. Your sessions are detached from your git workflow.
Context-switching tax. To verify what Claude changed, you leave the terminal, open your editor, check the diff, come back. To run a quick git log, you need another terminal. The more tools you use, the more windows you manage.
Why Not Claude Desktop / Conductor?
Claude Desktop is optimized for general conversation -- it's not built around codebases. There's no concept of "open this repository" or "work on this branch." It's a chat window, not a development environment.
As for Conductor (the Anthropic internal tool), it's not publicly available, and even from what's been shared, it takes a different approach -- closer to a notebook/workflow interface. We wanted something closer to an IDE shell: a workspace anchored to a git repo where Claude is a first-class collaborator, not a separate tool you alt-tab to.
The question was simple: what would it look like if Claude Code had a proper UI?
Not a web wrapper. Not a chat widget bolted onto an editor. A native desktop app where the repository is the center of the experience, branches are workspaces, and Claude sessions are persistent, resumable, and visual.
The Goal
Build a native desktop app that:
- Wraps Claude Code with a real UI -- structured rendering for tool calls, thinking blocks, text responses, and permission prompts
- Persists sessions across app restarts -- named, searchable, resumable, with full message history
- Uses git worktrees as workspaces -- each branch is a separate directory, no stashing, no checkout conflicts
- Includes an integrated terminal -- a real PTY shell in the same window, not a fake one
- Runs entirely local -- no server, no cloud, your code never leaves your machine
The Architecture
Choosing the Stack
The constraints dictated the stack:
- Must integrate with Claude CLI -- we need access to the same tool execution, MCP support, and streaming that the CLI provides
- Must manage git worktrees -- native filesystem access, child_process for git commands
- Must run locally -- no server, no cloud, your code stays on your machine
- Must feel native -- not a browser tab disguised as an app
That points to Electron. Yes, it's heavy. But it gives us Node.js for subprocess management, native filesystem access, and a real window with proper OS integration. On macOS, we get native traffic light controls, menu bar integration, and the app lives in your dock like any other tool.
The full stack:
React 18 for the UI -- no router needed (it's just two pages: Home and Dashboard), state managed with Zustand stores.
better-sqlite3 for persistence -- synchronous, no ORM, just raw SQL. It's fast enough for this use case and eliminates async complexity in the data layer.
node-pty + xterm.js for the terminal -- a real PTY shell, not a fake one. Same TERM=xterm-256color as your regular terminal.
Monaco Editor for file viewing -- the same engine behind VS Code, embedded as a read-only viewer for browsing repo files.
The IPC Layer
Electron's biggest footgun is the IPC bridge between main and renderer processes. Do it wrong and you get security holes or spaghetti event handlers.
We chose a strict pattern:
- All channels defined in one file -- shared/constants.js exports every channel name. ~49 channels, namespaced as {domain}:{action}.
- Allowlisted in preload -- the context bridge only exposes channels that exist in the constants file. The renderer cannot invoke arbitrary IPC calls.
- Consistent response format -- every handler returns { success: true, ...data } or { success: false, error: string }. No exceptions.
claude:session-start → Create a new session
claude:send-message → Send user message to Claude
claude:permission-respond → User approves/denies a tool
claude:turn-complete → Turn finished (cost, usage)
worktree:create → Create a new git worktree
worktree:list → List all worktrees
file:tree-get → Get file tree for a directory
terminal:create → Spawn a PTY shell
The renderer never touches Node.js APIs directly. Everything goes through window.electron.invoke() and window.electron.on().
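The consistent response format is easy to enforce mechanically. A hypothetical sketch (not the app's actual code; the helper name is illustrative) of a wrapper that applies the envelope to every handler, so no handler can forget the contract:

```javascript
// Hypothetical sketch: enforce the { success: true, ...data } /
// { success: false, error } envelope on every IPC handler.
function wrapHandler(fn) {
  return async (event, ...args) => {
    try {
      const data = await fn(...args);
      return { success: true, ...data };
    } catch (err) {
      // Errors never escape as exceptions -- they become a uniform payload
      return { success: false, error: err.message };
    }
  };
}

// In the main process this would be registered per channel, e.g.:
//   ipcMain.handle('worktree:list', wrapHandler(listWorktrees));
```

The win is that the renderer can branch on a single `success` field for every channel instead of wrapping each invoke in try/catch.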
The Claude Integration: A Story in Two Acts
Act 1: The Subprocess Approach (What We Built First)
The first version spawned Claude CLI as a long-lived child process:
Each session created one CLI process that lived for the entire conversation. User messages were written to stdin as JSON. Claude's responses came back on stdout as NDJSON -- one JSON object per line, streaming in real-time.
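Consuming that stdout stream means handling chunks that split a JSON object mid-line. A minimal line-buffering NDJSON parser might look like this (a sketch, not the app's code):

```javascript
// Sketch of an NDJSON chunk parser: stdout 'data' events can cut a JSON
// object in half, so we buffer until a newline completes a line.
function createNdjsonParser(onEvent) {
  let buffer = '';
  return (chunk) => {
    buffer += chunk.toString();
    let idx;
    while ((idx = buffer.indexOf('\n')) !== -1) {
      const line = buffer.slice(0, idx).trim();
      buffer = buffer.slice(idx + 1);
      if (!line) continue;
      try {
        onEvent(JSON.parse(line));
      } catch {
        // Ignore malformed lines rather than crashing the stream
      }
    }
  };
}

// Wiring it up would look like:
//   proc.stdout.on('data', createNdjsonParser(handleClaudeEvent));
```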
This worked. Mostly. But it introduced a cascade of problems:
Problem 1: Permission Flow Required an HTTP Server
Claude CLI uses MCP (Model Context Protocol) for tool permissions. In the terminal, this happens interactively. In our app, we needed to intercept permission requests and show them in the UI.
The solution was... elaborate:
That's five process boundaries for a single permission check. We had to:
- Spawn a separate Node.js MCP subprocess per session
- Run a Fastify HTTP server in the main process on a dynamic port
- Pass the port via environment variables to the MCP subprocess
- Write temporary MCP config files to /tmp/ for each session
- Clean them up on exit
It worked, but it was fragile. Ports could conflict. Temp files could linger. The HTTP server was another thing that could crash.
Problem 2: stdin Backpressure
Node.js child_process stdin is a writable stream with backpressure. If you write faster than the process can consume, the stream buffers and eventually applies backpressure. We had to implement a queue:
sendMessage(message) {
const payload = JSON.stringify(messageObj) + '\n';
if (this.stdinDraining) {
this.stdinQueue.push(payload);
return;
}
const ok = this.process.stdin.write(payload);
if (!ok) {
this.stdinDraining = true;
this.process.stdin.once('drain', () => {
this.stdinDraining = false;
this.flushStdinQueue();
});
}
}
Not hard, but another edge case to maintain.
Problem 3: Replay Detection Was a Heuristic
When resuming a session, the CLI replays the entire conversation history before accepting new input. We needed to detect when replay ended and new content began.
There's no explicit "replay complete" event. Our solution: a 5-second timeout after the last replayed message.
const REPLAY_SAFETY_TIMEOUT_MS = 5000;
// In the result handler during replay:
this.replayTimer = setTimeout(() => this.finalizeReplay(), REPLAY_SAFETY_TIMEOUT_MS);
If no new events arrived within 5 seconds of the last replayed result event, we assumed replay was done. This was fragile -- slow networks or large histories could trip it.
Problem 4: Process Lifecycle Management
Stopping a session meant killing a process tree (CLI + MCP subprocess). We used process.kill(-pid, 'SIGTERM') for the process group, with a SIGKILL fallback after 3 seconds:
stop() {
  const pid = this.process.pid;
  try {
    process.kill(-pid, 'SIGTERM'); // Signal the whole process group
  } catch {
    try { this.process.kill('SIGTERM'); } catch { }
  }
  setTimeout(() => {
    try { process.kill(-pid, 'SIGKILL'); } catch { }
  }, 3000);
}
}And the MCP config cleanup:
if (this.mcpConfigPath) {
try { fs.unlinkSync(this.mcpConfigPath); } catch { }
}
It all worked, but every session was managing: a CLI process, an MCP subprocess, an HTTP server connection, temp files, stdin backpressure, and a replay heuristic.
Act 2: The SDK Migration (What We Have Now)
Then the Claude Agent SDK shipped.
The SDK wraps the CLI internally and exposes a clean async generator interface. Instead of a long-lived managed process, each user message becomes a single query() call:
const conversation = sdk.query({ prompt: userMessage, options });
for await (const message of conversation) {
handleMessage(message);
}
The SDK handles the full agentic loop internally -- tool calls, tool results, multi-turn reasoning -- and yields events as they happen. When the loop completes, the generator finishes.
This eliminated:
| Before (Subprocess) | After (SDK) |
|---------------------|-------------|
| Long-lived CLI process | No process to manage |
| stdin/stdout pipes + backpressure | query() call + async generator |
| MCP subprocess per session | In-process MCP server |
| HTTP permission server (Fastify) | canUseTool callback |
| Temp MCP config files | SDK handles config internally |
| 5-second replay timeout heuristic | isResuming flag + first stream_event |
| SIGTERM/SIGKILL process cleanup | AbortController.abort() |
The permission flow went from five process boundaries to one callback:
canUseTool: async (toolName, input, { signal }) => {
// Auto-approve hidden tools
if (isHiddenTool(toolName)) {
return { behavior: 'allow', updatedInput: input };
}
// Promise-based: IPC to renderer, await user response
return awaitUserPermission(sessionId, toolName, input, signal);
}
Session resume went from a timeout heuristic to a flag:
// SDK replays old messages when resume: is set.
// Skip them - renderer already has messages from the database.
if (this.isResuming) {
if (message.type === 'stream_event') {
this.isResuming = false; // First stream_event = new content
this._handleStreamEvent(message);
}
return; // Skip replayed messages
}
The ESM Catch
One real gotcha: the Claude Agent SDK is ESM-only. Electron's main process is CommonJS. You can't require() an ESM module.
The workaround is a lazy dynamic import:
let _sdk = null;
async function loadSDK() {
if (!_sdk) {
_sdk = await import('@anthropic-ai/claude-agent-sdk');
}
return _sdk;
}
This runs once, on the first query() call; the module is cached after that. Simple, but the kind of thing that burns an hour if you don't know about it.
Keeping the Legacy Code
We kept the entire subprocess implementation behind a feature flag:
const USE_SDK = true;
if (this.useSDK) {
session = this._createSDKSession(sessionId, workingDir, options);
} else {
session = this._spawnProcess(sessionId, workingDir, false, options);
}
The renderer doesn't know or care which path is active. Both emit the same IPC events. This was important -- we could ship the SDK path while having a rollback if something broke in production.
The Streaming Pipeline
Whether using the SDK or the legacy subprocess, the stream events follow the same structure: the SDK wraps the CLI's NDJSON format in its own message types.
The Deduplication Problem
The Claude API (and by extension the CLI/SDK) emits tool use data in two places:
- During streaming -- content_block_start announces the tool, content_block_delta streams the JSON input incrementally, content_block_stop finalizes it
- In the complete message -- the assistant event contains the full message with all content blocks, including the same tool_use blocks
If you naively render both, every tool call shows up twice in the UI.
The fix is a forwardedToolUseIds Set:
// During streaming (content_block_stop):
this.forwardedToolUseIds.add(toolBlock.id);
this.callbacks.onToolUse(toolData);
// In assistant message (fallback):
if (!this.hasStreamedContent) {
for (const block of contentArray) {
if (block.type === 'tool_use') {
if (this.forwardedToolUseIds.has(block.id)) continue; // Already sent
this.callbacks.onToolUse(toolData);
}
}
}
The hasStreamedContent flag is the first line of defense (if we got any streaming events, skip the assistant fallback entirely). The Set is the second (if somehow both paths fire, deduplicate by ID).
Tool Input Reconstruction
Tool inputs arrive as incremental JSON fragments via input_json_delta:
{"partial_json": "{\"file_path"}
{"partial_json": "\": \"/src/ind"}
{"partial_json": "ex.js\"}"}
We buffer these into a string and parse at content_block_stop:
case 'content_block_delta':
if (event.delta?.type === 'input_json_delta') {
this.currentContentBlock.inputJson += event.delta.partial_json;
}
break;
case 'content_block_stop':
let parsedInput = null;
try { parsedInput = JSON.parse(this.currentContentBlock.inputJson); } catch {}
this.callbacks.onToolUse({ toolInput: parsedInput, status: 'running' });
break;
The try/catch is intentional -- if the JSON is malformed (e.g., the stream was interrupted), we still emit the tool event with null input rather than crashing.
The MCP Integration: How Sessions Get Named
One feature we wanted from day one: Claude should name its own sessions.
Instead of extracting titles from the first message with heuristics, we give Claude a tool called rename_session and a system prompt instruction to call it after the first user message.
The In-Process MCP Server
In the SDK world, this is clean. We define a tool using the SDK's tool() helper and register it as an in-process MCP server:
const renameTool = tool(
'rename_session',
'Rename the current chat session with a descriptive title...',
{ title: z.string().max(80) },
async (args) => {
onRename(args.title);
return { content: [{ type: 'text', text: `Session renamed to: "${args.title}"` }] };
}
);
return createSdkMcpServer({
name: 'bhunductor',
version: '1.0.0',
tools: [renameTool]
});
No subprocess. No HTTP. The tool runs in the same process as the session, and the callback directly updates the database and notifies the renderer.
Hiding the Tool from the UI
Users shouldn't see rename_session in their chat. So it's filtered at two levels:
- Auto-approved in the canUseTool callback -- never shows a permission prompt
- Hidden from the UI via a HIDDEN_TOOLS set -- the onToolUse callback checks the tool name and silently drops it before sending to the renderer
if (data.toolName && isHiddenTool(data.toolName)) {
this.hiddenToolUseIds.add(data.toolUseId);
return; // Don't send to renderer
}
The System Prompt
The instruction that makes Claude call the tool:
IMPORTANT: You MUST call the rename_session tool immediately after the user's
first message, before providing any other response. Generate a concise,
descriptive title (max 80 characters) that summarizes what the user is asking
about or trying to accomplish.
This is appended to the system prompt via the SDK's systemPrompt.append option. Claude is remarkably good at following this -- sessions get names like "Debug React Auth Flow" or "Add Pagination to User List" instead of "New Session 4."
The Data Layer
SQLite -- Boring and Correct
We use better-sqlite3 for everything. All operations are synchronous. There's no ORM.
const db = getDatabase();
db.prepare(`INSERT INTO claude_sessions (...) VALUES (?, ?, ?, ...)`).run(values);
The schema evolved through 8 migrations, auto-applied on startup:
const migrations = [
{ name: 'add_active_worktree_id_column', sql: `ALTER TABLE folders ADD COLUMN...` },
{ name: 'add_claude_session_id', sql: `ALTER TABLE claude_sessions ADD COLUMN...` },
{ name: 'add_messages_to_sessions', sql: `ALTER TABLE claude_sessions ADD COLUMN messages TEXT` },
{ name: 'add_archived_flag', sql: `ALTER TABLE claude_sessions ADD COLUMN archived INTEGER DEFAULT 0` },
// ... etc
];
Each migration runs inside a transaction. A migrations table tracks what's been applied.
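Stripped of the database plumbing, the gate is just set membership: run only the migrations whose names aren't recorded yet. A sketch of that logic (the real code also wraps each run in a transaction and records the name afterward):

```javascript
// Sketch of the migration gate: given the names already recorded in the
// migrations table, return only the migrations that still need to run.
function pendingMigrations(migrations, appliedNames) {
  const applied = new Set(appliedNames);
  return migrations.filter((m) => !applied.has(m.name));
}

// The runner would then execute each pending migration's SQL inside a
// transaction and INSERT its name into the migrations table on success.
```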
Why synchronous SQLite? Because the main process is already async-heavy with IPC, SDK calls, and process management. Having the data layer be synchronous eliminates an entire class of race conditions. better-sqlite3 is fast enough that the sync calls don't block meaningfully.
Two Session IDs
Every session has two IDs:
- sessionId -- UUID v4, generated by our app, used for internal tracking
- claude_session_id -- assigned by the SDK/CLI on first interaction, used for resume:
On session creation, claude_session_id defaults to our internal sessionId. Once the SDK sends a system event with the real session ID, we update the database. On resume, we pass claude_session_id to the SDK's resume: option.
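A sketch of that handoff (field names are illustrative, modeled on the description above): the session starts with claude_session_id equal to our own UUID, the SDK's system event overwrites it with the real ID, and resume always uses whatever was last recorded.

```javascript
// Sketch of the two-ID handoff (field names illustrative).
function onSdkSystemEvent(session, message) {
  if (message.type === 'system' && message.session_id) {
    // The SDK has told us its real session ID -- record it for resume
    session.claude_session_id = message.session_id;
  }
  return session;
}

function resumeIdFor(session) {
  // The SDK's ID once known, our own UUID before that
  return session.claude_session_id;
}
```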
Message Persistence
Messages are stored as a JSON blob in the messages column:
UPDATE claude_sessions SET messages = ? WHERE id = ?
Not normalized. Not queryable. Just a JSON array of the entire conversation.
This was a deliberate tradeoff. We don't need to query individual messages -- we only ever load the full conversation for a session. The JSON blob approach means persistence is a single UPDATE per save, and loading is a single SELECT + JSON.parse().
Messages are saved on every turn completion, and batch-saved for all active sessions on app quit via saveAllActiveSessions().
The Renderer: Making Streaming Feel Good
The Module-Level Cache Trick
The biggest UX problem with a tabbed chat interface: switching tabs unmounts the component, losing state.
React's solution would be useContext or lifting state up. But for message data that can be megabytes of conversation history, we didn't want it in React's render cycle at all.
The solution: a Map at module scope, outside of React.
// sessionStore.js
const messageCache = new Map(); // Survives mount/unmount
const useSessionStore = create((set, get) => ({
_messageCacheVersion: 0, // Trigger re-renders
getMessages: (sessionId) => messageCache.get(sessionId) || [],
updateMessages: (sessionId, updater) => {
const current = messageCache.get(sessionId) || [];
messageCache.set(sessionId, updater(current));
set(state => ({ _messageCacheVersion: state._messageCacheVersion + 1 }));
}
}));
The Map holds the actual data. The Zustand store holds a version counter. When messages change, the counter increments, which triggers a React re-render. But the data itself never enters React state -- it's read directly from the Map via getMessages().
This means:
- Tab switch → component unmounts → data stays in the Map
- Switch back → component mounts → reads from Map instantly (no loading, no IPC call)
- 10 open sessions with thousands of messages → React doesn't re-render on any of them when one session gets a new message
Streaming State: Separate from Messages
Active streaming text is stored separately from committed messages:
streamingState: {
[sessionId]: {
isStreaming: boolean,
streamingMessage: string // Text buffer during streaming
}
}
The streaming component subscribes to this with a targeted Zustand selector:
const ActiveStreamBlock = memo(({ sessionId }) => {
const text = useSessionStore(s => s.streamingState[sessionId]?.streamingMessage ?? '');
return <MarkdownRenderer content={text} />;
});
When the turn completes, commitStreamingText() moves the buffered text into the messages array and clears the streaming state. This separation means:
- The streaming block re-renders on every chunk (expected -- text is changing)
- The message list does NOT re-render during streaming (nothing in it changed)
- Only on commit does the full message list update
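The commit step itself is small. A sketch of the move over a plain state object (in the app the committed messages live in the module-level Map, but the shape of the operation is the same):

```javascript
// Sketch of commitStreamingText: append the buffered streaming text to the
// committed message list, then reset the per-session streaming state.
function commitStreamingText(state, sessionId) {
  const stream = state.streamingState[sessionId];
  if (!stream || !stream.streamingMessage) return state;
  const messages = state.messages[sessionId] || [];
  return {
    ...state,
    messages: {
      ...state.messages,
      [sessionId]: [...messages, { type: 'text', content: stream.streamingMessage }],
    },
    streamingState: {
      ...state.streamingState,
      [sessionId]: { isStreaming: false, streamingMessage: '' },
    },
  };
}
```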
Tool Call Grouping
When Claude is on a multi-tool loop (read file → edit file → run test → read output), the raw event stream produces a flat list of tool_use messages. Rendering them individually wastes vertical space.
Instead, consecutive tool calls are grouped:
const renderItems = useMemo(() => {
const items = [];
let toolGroup = [];
const flush = () => {
if (toolGroup.length) {
items.push({ type: 'tool_group', tools: [...toolGroup] });
toolGroup = [];
}
};
for (const msg of messages) {
if (msg.type === 'tool_use') {
toolGroup.push(msg);
} else {
flush();
items.push({ type: 'single', msg });
}
}
flush();
return items;
}, [messages]);
A group of 5 tool calls renders as a single collapsible block instead of 5 separate cards. Text responses break the group, so the visual rhythm follows the conversation naturally.
The Design
The entire UI is monospace. Every font -- headings, body, code, sidebar, terminal -- uses JetBrains Mono (with fallbacks to SF Mono, Fira Code, etc.).
This was intentional. Bhunductor is a developer tool. Code is the primary content. A monospace-everywhere approach means code blocks don't feel like a different context -- they blend into the surrounding text. The tool output blocks, the file browser, the terminal -- everything aligns on the same grid.
The color system:
--surface-base: #08080a; /* Near-black background */
--surface-raised: #121215; /* Cards, sidebar */
--surface-overlay: #1a1a1f; /* Modals, dropdowns */
--conductor: #36B5AB; /* Teal accent - the "conductor" */
--ink: #e4e4e7; /* Primary text */
--ink-secondary: #a1a1aa; /* Secondary text */
--ink-muted: #52525b; /* Tertiary text */
Two colors define the personality: graphite (the dark surfaces) and teal (the --conductor accent). The teal is used sparingly -- active states, links, streaming indicators. Everything else is grayscale.
4000+ lines of CSS. No Tailwind, no CSS-in-JS. Just plain CSS with custom properties. It's verbose, but every element has a deliberate style -- nothing defaults to a framework's opinion.
Git Worktrees: Branch-First Workflow
Most git UIs treat branches as metadata. You see a dropdown, pick a branch, and the underlying directory is switched with git checkout. This means uncommitted changes either block the switch or get stashed.
Bhunductor uses git worktrees instead. Each branch is a separate directory on disk.
Switching branches in Bhunductor doesn't touch the filesystem of the branch you're leaving. It changes which directory the services point to. Claude sessions, terminal instances, and file browser all bind to the active worktree's path.
This means:
- No stashing -- your uncommitted work is exactly where you left it
- Parallel work -- Claude can be running on feat-auth while you browse files on main
- Clean isolation -- each branch has its own node_modules, build artifacts, etc.
The tradeoff is disk space. Each worktree is a full checkout (minus .git, which is shared). For most repos, that's fine. For monorepos with 50GB of dependencies, you'd feel it.
What's Next
Bhunductor is functional but far from done. The core loop works -- open repo, manage branches, chat with Claude, browse files, use terminal. But there's a lot we want to add:
Slash Commands
Claude CLI supports slash commands (/compact, /clear, /model, etc.). Right now, these aren't exposed in the UI. We want to surface them as first-class actions -- either via a command palette or inline in the chat input.
Hook Support
Claude CLI has a hook system (pre/post tool execution, on errors, etc.). Bhunductor should surface these as configurable triggers -- maybe a settings panel where you can define hooks per repository.
Cross-Platform
Right now it's macOS only (titleBarStyle: hiddenInset for native window controls, node-pty compiled for Darwin). Windows and Linux support means rethinking the title bar, testing native module compilation, and probably a lot of small platform-specific fixes.
Better File Integration
The file viewer is read-only (Monaco in view mode). We'd like to add inline diffing -- showing exactly what Claude changed in each edit, not just the tool call block. Think: a split diff view that opens when you click on a "file edited" tool call.
Session Search
With hundreds of sessions over time, finding the one where you fixed that auth bug three weeks ago should be instant. Full-text search over session messages, filterable by branch, date, and model.
What I Learned
The SDK migration was the best decision. It eliminated three separate processes, an HTTP server, temp file management, and a timeout-based heuristic -- all replaced by an async generator and a callback. If you're building on top of Claude CLI, start with the SDK. Don't make our mistake of shelling out to the process first.
Module-level state works. React's state model doesn't fit every case. For large, stable data that needs to survive component lifecycle (message caches, streaming buffers), a simple Map outside of React with a version counter for re-renders is simpler and faster than any state management solution.
Synchronous SQLite is underrated. In an Electron app where the main process is already juggling async IPC and SDK calls, having the data layer be synchronous removes an entire class of bugs. better-sqlite3 is fast enough that you never notice.
The terminal clinches it. We debated whether the built-in terminal was worth the native module overhead (node-pty). It is. The moment you can see Claude's output and type your own commands in the same window, the context-switching tax drops to zero.
Dark monospace is a design choice, not a default. When your primary content is code and tool output, leaning into monospace everywhere creates visual coherence. It's not pretty in a dribbble-portfolio way, but it's functional in a "I use this 8 hours a day" way.
Bhunductor is open source and under active development. The full source is available on GitHub.