Claude now has access to a server-side container environment

[+] simonw|6 months ago|reply

I just published an extensive review of the new feature, which is actually Claude Code Interpreter (the official name, bafflingly, is Upgraded file creation and analysis - that's what you turn on in the features page at least).

I reverse-engineered it a bit, figured out its container specs, used it to render a PDF join diagram for a SQLite database and then re-ran a much more complex "recreate this chart from this screenshot and XLSX file" example that I previously ran against ChatGPT Code Interpreter last night.

Here's my review: https://simonwillison.net/2025/Sep/9/claude-code-interpreter...

[+] brumar|6 months ago|reply

These days, I spend time training people to use these kinds of tools. I am glad it's named the way it is. It's much more comfortable to explain to a tech person that it's "badly named" and should have been called "Code Interpreter" than to explain to a non-tech person that the "Code Interpreter" feature is a cool new way to generate documents. Most people are not that comfortable with technology, so avoiding big words is a nice-to-have.

[+] dang|6 months ago|reply

I've nicked a sentence from your article to use as the title above. Hope that's clearer!

[+] cjonas|6 months ago|reply

Given their relationship with AWS, I wonder if this feature just runs the agent core code interpreter behind the scenes.

[+] mdaniel|6 months ago|reply

> Version Control

> github.com

pour one out for the GitLab hosted projects, or its less popular friends hosted on bitbucket, codeberg, forgejo, sourceforge, sourcehut, et al. So dumb.

[+] simonw|6 months ago|reply

This feature is a little confusing.

It looks to me like a variant of the Code Interpreter pattern, where Claude has a (presumably sandboxed) server-side container environment in which it can run Python. When you ask it to make a spreadsheet it runs this:

  pip install openpyxl pandas --break-system-packages

And then generates and runs a Python script.

What's weird is that when you enable it in https://claude.ai/settings/features it automatically disables the old Analysis tool - which used JavaScript running in your browser. For some reason you can have one of those enabled but not both.

The new feature is being described exclusively as a system for creating files though! I'm trying to figure out if that gets used for code analysis too now, in place of the analysis tool.

[+] simonw|6 months ago|reply

It works for me on the https://claude.ai web app but doesn't appear to work in the Claude iOS app.

I tried "Tell me everything you can about your shell and Python environments" and got some interesting results after it ran a bunch of commands.

Linux runsc 4.4.0 #1 SMP Sun Jan 10 15:06:54 PST 2016 x86_64 x86_64 x86_64 GNU/Linux

Ubuntu 24.04.2 LTS

Python 3.12.3

/usr/bin/node is v18.19.1

Disk Space: 4.9GB total, with 4.6GB available

Memory: 9.0GB RAM
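A minimal sketch of how a report like the one above could be gathered from inside a container, using only the Python standard library. The helper name `describe_environment` is hypothetical, and this is not the set of commands Claude actually ran; it only illustrates the kind of probing involved.

```python
import platform
import shutil


def describe_environment(path="/"):
    """Collect basic facts about the current machine (hypothetical helper)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "kernel": " ".join(platform.uname()),      # roughly what `uname -a` prints
        "python": platform.python_version(),
        "node": shutil.which("node"),              # None if node is not on PATH
        "disk_total_gb": round(usage.total / 1e9, 1),
        "disk_free_gb": round(usage.free / 1e9, 1),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Memory is left out because the standard library has no portable call for it; on Linux one would typically read `/proc/meminfo`.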

Attempts at making HTTP requests all seem to fail with a 403 error, suggesting some kind of universal proxy.

But telling it to "Run pip install sqlite-utils" worked, so apparently they have allow-listed some domains such as PyPI.

I poked around more and found these environment variables:

  HTTPS_PROXY=http://21.0.0.167:15001
  HTTP_PROXY=http://21.0.0.167:15001

On further poking, some of the allowed domains include github.com and pypi.org and registry.npmjs.org - the proxy is running Envoy.

Anthropic have their own self-issued certificate to intercept HTTPS.
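For anyone wanting to repeat the proxy check, here is a small sketch that reads the two variables quoted above and splits them into scheme, host and port. The function name `proxy_endpoints` is hypothetical; it only inspects environment variables and makes no network requests, so it says nothing about which domains are actually allow-listed.

```python
import os
from urllib.parse import urlsplit


def proxy_endpoints(environ=os.environ):
    """Return {variable name: (scheme, host, port)} for any proxy variables set."""
    found = {}
    for name in ("HTTP_PROXY", "HTTPS_PROXY"):
        value = environ.get(name)
        if value:
            parts = urlsplit(value)
            found[name] = (parts.scheme, parts.hostname, parts.port)
    return found


# Values copied from the comment above, passed in explicitly for illustration.
sample = {
    "HTTPS_PROXY": "http://21.0.0.167:15001",
    "HTTP_PROXY": "http://21.0.0.167:15001",
}
print(proxy_endpoints(sample))
```

Both variables resolve to the same host and port, which is consistent with a single proxy handling all outbound traffic.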

[+] brookst|6 months ago|reply

Odds are the new container and old JavaScript are using the same tool names/parameters. Or, perhaps, they found the tools similar enough that the model got confused having them both explained.

[+] amilios|6 months ago|reply

Anyone else having serious reliability issues with artifact editing? I find that the artifacts quite often get "stuck", where the LLM is trying to edit the artifact but the state of the artifact does not change. Seems like the LLM is somehow failing in editing the artifact silently, while thinking that it is actually doing the edits. The way to resolve this is to ask Claude to make a new artifact, which then has all the changes Claude thought it was making. But you have to do this relatively often.

[+] dajtxx|6 months ago|reply

I saw this yesterday. I was asking it to update an SQL query and it was saying, 'I did this' and then that wasn't in the query. I even saw it put something in the query and then remove it, and then say 'here it is'.

Maybe it's because I use the free tier web interface, but I can't get any AI to do much for me. Beyond a handful of lines (and fewer yesterday) it just doesn't seem that great. Or it gives me pages of JavaScript to show a date picker, before I RTFM and find it's a single input tag to do that, because its training data was lots of old and/or bad code that didn't do it that way.

[+] jononor|6 months ago|reply

Yes every 10 edits or so. Super annoying. It is limiting how often I bother using the tool

[+] tkgally|6 months ago|reply

I have had the same problem with artifacts, and I had similar problems several months ago with Claude Desktop. I stopped using those features mostly and use Claude Code instead. I don't like CC's terminal interface, but it has been more reliable for me.

[+] sunaookami|6 months ago|reply

It edits it for me but it tries to edit it "in place" where it messes up the version history and it looks very broken and often times is broken afterwards. Don't know why they broke their best feature while ChatGPT Canvas just works.

[+] efromvt|6 months ago|reply

This has been super annoying! I just tell it to make sure the artifact is updated and it usually fixes it, but it's annoying to have to notice/keep an eye on it.

[+] j45|6 months ago|reply

Quite regularly.

I instruct artifacts to not be used and then explicitly provide instruction to proceed with creation when ready.

[+] wolfgangbabad|6 months ago|reply

My experience is similar. At first Claude was super smart and got even very complicated things right. Now even super simple tasks are almost impossible to finish right, even if I really chop things into small steps. Also it's much slower, even on a Pro account, than a few weeks ago.

[+] strictnein|6 months ago|reply

I'm on the $200 / month account and it's also slower than a few weeks ago. And struggling more and more.

I used to think of it as a decent sr dev working alongside me. Now it feels like an untrained intern that takes 4-5 shots to get things right. Hallucinated tables, columns, and HTML templates are its new favorite thing. And calling things "done" that aren't even half done and don't work in the slightest.

[+] insane_dreamer|6 months ago|reply

On Max and also find it slower recently.

Also yesterday tried to use it to debug some AWS issue and it tried to send me down so many wrong paths, and suggested changes that were either plain wrong or had unintended consequences, that if I didn't actually know my stuff and had followed blindly, the results would have been pretty bad or at least a huge time waster. When I called it out it would quickly reverse course ("You're right of course!") and it did provide some helpful snippets but I was unimpressed.

What I find it excellent at is for throw-away scripts to do small jobs or automate little things--stuff I could do but would take me a lot longer (especially in bash).

[+] ranguna|6 months ago|reply

It's still pretty good on my side. I'm just paying for the pro version.

[+] spike021|6 months ago|reply

For the past two to three weeks I've noticed Claude just consistently lagging or potentially even being throttled for pretty minor coding or CLI tasks. It'll basically stop showing any progress for at least a couple minutes. Sometimes exiting the query and re-trying gets it to work but other times it keeps happening. I pay for Pro so I don't think it's just API rate limiting.

Would appreciate if that could be fixed but of course new features are more interesting for them to prioritize.

[+] yyhhsj0521|6 months ago|reply

I use Claude Code at work via AWS Bedrock, and also personally subscribe to the $20/month Claude. Anecdotally, Sonnet hasn't slowed down at all. ChatGPT 5 through an enterprise plan, on the other hand, has noticeably slowed down or sometimes just doesn't return anything.

[+] Daisywh|6 months ago|reply

I've run into similar issues too. Even small scripts or commands sometimes get throttled. It does not feel like a resource limit. It feels more like the system is just overly sensitive.

[+] gregoryl|6 months ago|reply

Same. My usage is via an internal corp gateway (Instacart), Sonnet 4. Used to be lightning fast, now getting regular slowdowns or outright failures. Not seeing it with the various GPT models.

[+] jimmydoe|6 months ago|reply

More people are working after labor day. Fridays and weekends are better, Wednesdays are the worst.

[+] radicalriddler|6 months ago|reply

I see this quite a lot via Copilot using Claude. It'll just get stuck on a token for a while.

[+] leptons|6 months ago|reply

Can you still code without it?

[+] butterisgood|6 months ago|reply

It does this in emacs with efrit. https://github.com/steveyegge/efrit

It can actually drive emacs itself, creating buffers, being told not to edit the buffers and simply respond in the chat etc.

I actually _like_ working with efrit vs other LLM integrations in editors.

In fact I kind of need to have my anthropic console up to watch my usage... whoops!

[+] mkw2000|6 months ago|reply

To everyone who has been feeling like their MAX subscription is a waste of money, give GLM 4.5 a try, i use it with claude code daily on the $3 plan and it has been great

[+] atonse|6 months ago|reply

I pay $100 a month and wouldn’t hesitate for a millisecond if I needed to pay the $200/mo plan if I hit rate limits.

It’s hard to overstate how much of a productivity shift Claude code has been for shipping major features in our app. And ours is an elixir app. It’s even better with React/NextJS.

I literally won’t be hitting any “I need to hire another programmer to handle this workload” limits any time soon.

[+] ewoodrich|6 months ago|reply

It looks like the $3 plan is only a promo price for the 1st month and it's actually $6/mo, or am I missing something?

https://z.ai/payment?productIds=product-6caada

[+] nkzd|6 months ago|reply

Hi, I believe my current Claude subscription is going to waste. Can I ask which $3 plan you are referring to?

[+] spott|6 months ago|reply

How are you using it with Claude code?

[+] devinprater|6 months ago|reply

Maybe one day Claude can rewrite its interface to be more accessible to blind people like me.

[+] crazygringo|6 months ago|reply

What is inaccessible about it? It's kind of hard to discuss without any particulars.

[+] ctoth|6 months ago|reply

Curious what a11y issues you see with Claude? I use it a remarkable amount and haven't found any showstoppers. Web interface and Claude Code.

[+] SAI_Peregrinus|6 months ago|reply

Anthropic are looking to make money. They need to make absolutely absurd amounts of money to afford the R&D expenses they've already incurred. Features get prioritized based on how much money they might make. Unless forced to by regulation (or maybe social pressure on the executives, but that really only comes from their same class instead of the general public these days) smaller groups of customers get served last. There aren't that many blind people, so there's not very much profit incentive to serve blind people. Unless they're actually violating the ADA or another law or regulation, and can't bribe the regulators for less than the cost of fines or fixing the issue, I'd not expect any improvement.

[+] divan|6 months ago|reply

Oh, nice! One of my biggest issues with mainstream LLMs/apps was that working on a long text (article, script, documentation, etc.) is limited to a copy-pasting dance. Which is especially frustrating in comparison to the AI coding assistants that can work on code directly in the file system, using the internet and MCPs at the same time.

I just tried this new feature to work on a text document in a project, and it's a big difference. Now I really want to have this feature (for text at least) in ChatGPT to be able to work on documents through voice and without looking at the screen.

[+] dpflan|6 months ago|reply

Does Microsoft Copilot do this already? Isn't it integrated into Windows and MSFT Office products? Has it been working out for Copilot? Is it helpful? Adoption rates of AI are interesting to say the least.

[+] Balgair|6 months ago|reply

I don't have access to this yet, so can someone who does tell if:

Can it take a .PDF with a single table of, say, food items and prices, plus a .docx in the same folder with a table of, say, prices and calories, and then, in one shot, produce a .xlsx with the items and calories and save that to the same directory? It really doesn't matter what the lists are of, just keep it very simple: A=B, B=C, therefore A=C stuff.

Because, strangely enough, that's pretty much my definition of AGI.
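To make the A=B, B=C, therefore A=C step concrete, here is a rough sketch of just the join, assuming the two tables have already been extracted into lists of rows. Everything in it is hypothetical: the function name `join_on_price`, the sample rows and the calorie figures are invented for illustration, prices are assumed to be unique, and the result is written as CSV text rather than .xlsx so the example needs only the standard library. Extracting tables from a PDF or .docx is the hard part and is not shown.

```python
import csv
import io


def join_on_price(items_prices, prices_calories):
    """Map each item to its calories by matching on the shared price column."""
    calories_by_price = dict(prices_calories)  # assumes each price appears once
    return [
        (item, calories_by_price[price])
        for item, price in items_prices
        if price in calories_by_price  # items with no matching price are skipped
    ]


# Invented sample data standing in for the two extracted tables.
items_prices = [("apple", "0.50"), ("bagel", "1.25")]
prices_calories = [("0.50", 95), ("1.25", 250)]

rows = join_on_price(items_prices, prices_calories)

# Write the joined table as CSV text.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["item", "calories"])
writer.writerows(rows)
print(buffer.getvalue())
```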

[+] lordnacho|6 months ago|reply

This will either result in a lot of people being able to sleep more, or an absolute avalanche of crap is about to be released upon society.

A lot of the people I graduated with spent their 20s making powerpoint and excel. There would be people with a master's in engineering getting phone calls at 1am, with an instruction to change the fonts on slide 75, or to slightly modify some calculation. Most of the real decision making was, funnily enough, not based on these documents. But it still meant people were working 100 hour weeks.

I could see this resulting in the same work being done in a few minutes. But I could also see it resulting in the MDs asking for 10x the number of slide decks.

[+] AlecSchueler|6 months ago|reply

They had this though? I'm similarly excited/relieved by this announcement.

At the start of summer you could still ask for any kind of file as an artifact and they would produce it and you could download it.

Then they changed it so that artifacts were only ever pages that you could share or view in the app.

Yes this is going to transform how I use Claude... BACK to the way I used it in June!

As a user this post is frustrating as hell to read because I've missed this feature so much, but at the same time thanks for giving it back I guess?

[+] NamlchakKhandro|6 months ago|reply

A linux desktop version of claude would be great, given that it's basically just a Tauri app, it should be pretty trivial...

You could even ask claude code with scopecraft/cmd to plan it all out and implement this.

For anthropic, the excuse that there's not enough time to implement this is a pretty glaring admission about the state and success of AI assisted development.

[+] ttul|6 months ago|reply

I tested this feature out today, applying the same prompt and CSV data to both Claude Opus 4.1 and GPT-5-Thinking. They both chugged away writing Pandas code and produced similar output. It's nice to have another option for data analysis to act as a second opinion on GPT, if nothing else.

[+] ricksunny|6 months ago|reply

‘…now has access to a server-side container environment’

Headline demonstrates why SWEs don’t have to worry about vibe coders eating their lunch. Vibe-coders don’t know what a container is, nor why it would be good for it to be in the context of an environment (what’s an environment?), or be server-side for that matter. Now if there were a course that taught all this kind of architectural tradecraft that isn’t taught in university CS courses (but bootcamps..?), then adding vibe-coding alongside might pose a concern, at least till the debugging technical debt comes due. But by then the vibecoder has validated a new market on the back of their v0, so thank them for the fresh revenue streams.

[+] hoppp|6 months ago|reply

I've got a feeling that Claude is basically a super app that will eventually do everything

all SaaS projects building on it to resell functionality will go away because there will be no point to pay the added costs.

[+] all_usernames|6 months ago|reply

The security concerns here are really significant. In the section[1] on security, they write "we recommend you monitor Claude while using this feature." This borders on irresponsible IMO. Monitor what exactly? How should we monitor? What logs and metrics are exposed for security monitoring? How would a user recognize suspicious patterns...?

[+] SubiculumCode|6 months ago|reply

I noticed the other day that chatgpt started preferring to provide me with a download link for code rather than putting it up in canvas. It also started offering me diffs, but as I just write fairly basic data munging scripts for neuroimaging analyses, I don't like to dive too deep into the coding tool boxes/chains...copy paste is easy...although, I would like versioning without making copies of my script for backup

[+] beydogan|6 months ago|reply

Instead of building random features, they have to fix their quality first.

I'm on 100$ Max plan, I would even buy 2x 200$ plan if Opus would stop randomly being dumb. Especially after 7am ET time.

[+] j45|6 months ago|reply

Opus' ability should be the feature being optimized and stabilized, fewer features are needed.

[+] zelphirkalt|6 months ago|reply

Maybe they are switching you to a cheaper to run model after 7am ET time.

345 comments