
Show HN: Iterm-Mcp – AI Terminal/REPL Control for iTerm2

43 points | deathmonger5000 | 1 year ago | github.com

Hi HN! Ever wish you could just point your AI assistant at your terminal and say 'what's wrong with this output?' That's why I built iterm-mcp. It lets MCP clients like Claude Desktop directly interact with your iTerm2 terminal - reading logs, running commands, using REPLs, and helping debug issues. Want to explore data or debug using a REPL? The AI can start the REPL, run commands, and help interpret the results.

This is an MCP server that integrates with Claude Desktop, LibreChat, and other Model Context Protocol compatible clients.

https://github.com/ferrislucas/iterm-mcp

Note: Independent project, not officially affiliated with iTerm2

## Features

*Efficient Token Use:* iterm-mcp gives the model the ability to inspect only the output that the model is interested in. The model typically only wants to see the last few lines of output even for long running commands.

*Natural Integration:* You share iTerm with the model. You can ask questions about what's on the screen, or delegate a task to the model and watch as it performs each step.

*Full Terminal Control and REPL support:* The model can start and interact with REPLs as well as send control characters like ctrl-c, ctrl-z, etc.

*Easy on the Dependencies:* iterm-mcp is built with minimal dependencies and is runnable via npx. It's designed to be easy to add to Claude Desktop and other MCP clients. It should just work.

## Real-World Example: Debugging Sidekiq Jobs

I needed to debug a Sidekiq job with complex arguments. The arguments were partially obfuscated in the logs. I asked Claude: "open rails console, show me arguments for the latest XYZ job". The model:

1. Launched Rails console
2. Retrieved job details
3. Displayed the arguments that I was looking for

## Architectural Journey

This project had a couple of interesting constraints around command execution:

### 1. Token Efficiency Challenge

I wanted to constrain tokens as much as possible. I didn't want to send the entire output of a long running command to the model, but there's not a great way to know which parts of the output are important to what the model is doing. Sampling could be used here, but it's not well supported yet.

*Solution:* I arrived at a pull-based solution for this. The command from the model is sent to the terminal, and the model is told how many lines of output were generated. The model can then retrieve as many lines of the buffer as it thinks are relevant.
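As a rough illustration of the pull-based idea, here is a minimal Python sketch. It is not iterm-mcp's actual code: `TerminalBuffer`, `write_command`, and `read_tail` are hypothetical names, and the buffer is a plain in-memory list standing in for the terminal's scrollback.

```python
from dataclasses import dataclass, field


@dataclass
class TerminalBuffer:
    """Hypothetical stand-in for the terminal's scrollback buffer."""

    lines: list[str] = field(default_factory=list)

    def write_command(self, command: str, output: str) -> int:
        """Record a command and its output; return only the number of new output lines."""
        new_lines = output.splitlines()
        self.lines.append(f"$ {command}")
        self.lines.extend(new_lines)
        return len(new_lines)

    def read_tail(self, count: int) -> list[str]:
        """Return the last `count` lines; the caller decides how many it needs."""
        if count <= 0:
            return []
        return self.lines[-count:]


# Simulate a noisy command that prints 500 lines.
buffer = TerminalBuffer()
noisy_output = "\n".join(f"step {i}" for i in range(1, 501))
produced = buffer.write_command("brew install ffmpeg", noisy_output)

# The model is told only the line count, then pulls what it wants to see.
tail = buffer.read_tail(2)
```

The write step costs almost no tokens because it returns a single number; the read step costs only as many lines as the model asks for.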

### 2. Long-Running Process Support

I wanted to support long-running processes. It turns out that `brew install ffmpeg` takes a while, and it's not always clear when the job is done. In early proofs of concept, the model would assume the command had completed successfully and begin sending additional commands to the terminal before the first command had finished.

*Solution:* iTerm provides a way to ask if the terminal is waiting for user input, but I found that it tended to show false positives in certain situations. For example, a long running command would result in iTerm reporting that the terminal was waiting for input when in fact the command was still running. I found that inspecting the processes associated with the terminal and waiting until the most interesting of those processes settles to a low resource usage is a fair indicator of long running commands being ready for input.
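The "wait until resource usage settles" heuristic could look something like the Python sketch below. It is only an assumption about the general shape, not the project's implementation: `wait_until_settled`, the 1% threshold, and the "three quiet samples in a row" rule are all made up for illustration, and `sample_cpu` is a placeholder for whatever reads the CPU usage of the process being watched.

```python
import time
from typing import Callable


def wait_until_settled(
    sample_cpu: Callable[[], float],
    threshold: float = 1.0,
    quiet_samples: int = 3,
    interval: float = 0.5,
    max_samples: int = 100,
) -> bool:
    """Poll CPU usage until it stays below `threshold` for `quiet_samples` readings in a row.

    Returns True once the process looks idle, False if `max_samples` runs out first.
    """
    quiet = 0
    for _ in range(max_samples):
        if sample_cpu() < threshold:
            quiet += 1
            if quiet >= quiet_samples:
                return True
        else:
            # Any busy reading resets the streak.
            quiet = 0
        time.sleep(interval)
    return False


# Fake readings: busy, a brief dip, busy again, then idle.
readings = iter([85.0, 60.2, 0.4, 12.0, 0.3, 0.1, 0.0])
settled = wait_until_settled(lambda: next(readings), interval=0.0)
```

Requiring several quiet readings in a row is what keeps a momentary dip (the `0.4` above) from being mistaken for completion.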

## Requirements

* iTerm2 must be running

* Node version 18 or greater

## Safety Considerations

* The user is responsible for using the tool safely.

* No built-in restrictions: iterm-mcp makes no attempt to evaluate the safety of commands that are executed.

* Models can behave in unexpected ways. The user is expected to monitor activity and abort when appropriate.

* For multi-step tasks, you may need to interrupt the model if it goes off track. Start with smaller, focused tasks until you're familiar with how the model behaves.

21 comments


pcwelder|1 year ago

Good work.

I wonder if there's really a need for separate write to terminal and read output functions? I was hoping that write command itself would execute and return the output of the command, saving back and forth latency.

> and it's not always clear when the job is done

I've authored a similar mcp [1] (but without terminal ui)

The way I solved it is by setting a special PS1 prompt. So as soon as I get that prompt I know the task is done. I wonder if a similar thing can be done in your mcp?

[1] https://github.com/rusiaaman/wcgw
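For readers unfamiliar with the sentinel-prompt trick, a minimal Python sketch of the idea follows. It is not taken from wcgw: the `__CMD_DONE__> ` string and `split_at_prompt` are hypothetical, and it assumes the shell was started with something like `PS1='__CMD_DONE__> '`.

```python
SENTINEL = "__CMD_DONE__> "  # hypothetical prompt value; any unlikely string works


def split_at_prompt(stream_text: str, sentinel: str = SENTINEL) -> tuple[str, bool]:
    """Return (output, done); `done` is True once the sentinel prompt has appeared."""
    index = stream_text.find(sentinel)
    if index == -1:
        # Prompt not printed yet, so the command is still running.
        return stream_text, False
    return stream_text[:index], True


# Output captured while the command is still running, and after it finishes.
partial = "Downloading...\n"
full = "Downloading...\nInstalled.\n" + SENTINEL

still_running = split_at_prompt(partial)
finished = split_at_prompt(full)
```

This only works while the shell itself prints the prompt; inside a REPL the sentinel never appears.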

deathmonger5000|1 year ago

Hi, thanks for the comment! wcgw looks really cool - nice job with it!

> I wonder if there's really a need for separate write to terminal and read output functions? I was hoping that write command itself would execute and return the output of the command, saving back and forth latency.

I traded back and forth latency for lower token use. I didn't want to return gobs of output from `brew install ffmpeg` when the model really only needs to see the last line of output in order to know what to do next.

> The way I solved it is by setting a special PS1 prompt. So as soon as I get that prompt I know the task is done. I wonder if a similar thing can be done in your mcp?

What you suggested with changing the prompt is a good idea, but it breaks down in certain scenarios - particularly if the user is using a REPL. Part of my goal for this is to not have to modify the shell prompt or introduce visual indicators for the AI because I don't want the user to have to work around the AI. I want the AI to help as requested as if it's sitting at your keyboard. I don't want to introduce any friction or really any unwanted change to the user's workflow at all.

It's important to me that this work with REPLs and other interactive CLI utilities. If that weren't a design concern then I'd definitely explore the approach that you suggested.

wrs|1 year ago

I’m about as likely to use this as buy a self-driving car at this point, but thought I’d point out that iTerm does know when the command has finished, thanks to its shell integration scripts, so does that help? Or if that’s not exposed, maybe just set the shell prompt to some distinctive value.

deathmonger5000|1 year ago

Hi, thanks for your comment! I haven't explored that approach. Can you say more? Will using the approach that you suggested support interactive CLI utilities like a REPL? Those are use cases that I definitely want to support with this project.

toprerules|1 year ago

Why would you favor this approach over, say, a command line tool that can pipe input into and out of a configurable AI backend, fork subprocesses to perform agent based tasks, etc.? The amount of tokens is always bounded by what the user chooses to pipe into the tool. The Unix model is a battle-hardened, time-tested approach. This tool seems like it locks you into iTerm2.

deathmonger5000|1 year ago

> Why would you favor this approach over say, a command line tool that can pipe input into and out of a configurable AI backend, fork subprocesses [...]

I think what you're describing is something that's built to perform agent based tasks. iterm-mcp isn't intended to be that. It's intended to be a bridge from something like Claude Desktop to iTerm. The REPL use case is a key thing to understand here.

What you're describing is great if you want to delegate "install python on my system" for example, but it doesn't support the REPL use case where you want to work with the REPL through something like Claude Desktop.

The other key use case iterm-mcp addresses is asking questions about what's sitting in the terminal right now. For example, you ran `brew install ffmpeg` and something didn't work: you can ask Claude using iterm-mcp.

> This tool seems like it locks you into iTerm2.

This tool is intended for use with iTerm2. It's not that it "locks you into iTerm2" - iterm-mcp is something that you would choose to use if you already use iTerm2.

scottyeager|1 year ago

I'm working on something a bit like what you described. So far it has a command suggestion mode and also a generic question mode. Inbound pipe support is a TODO, but command substitution already works fine for many use cases like passing file contents to the LLM. I'm pretty wary of adding any automatic execution of AI generated commands, though using some isolation scheme like containers is an interesting possibility.

https://github.com/scottyeager/Pal

larusso|1 year ago

A shiver runs down my spine reading „full terminal control“. For me that is a definite no-go. We fight hard to get remote code execution abilities off our system and here we freely invite it. „Hey gpt let me sudo first so you can execute that“

nerdjon|1 year ago

That and the idea of a poorly sanitized log entry or something slipping to the AI and then you have a big problem. Just seems like a security issue waiting to happen.

There was a system not long ago on here for AI automatically running system recovery as triage. I just can’t imagine giving AI any rights to actually run commands without oversight.

I guess good luck explaining why you deleted a database or whatever while diagnosing an app when it decides that the best course of action is to delete and start over or some other really stupid solution.