> I am managing projects in languages I am not fluent in—TypeScript, Rust and Go—and seem to be doing pretty well.
This framing reminds me of the classic problem in media literacy: people know when a journalistic source is poor when they’re a subject matter expert, but tend to assume that the same source is at least passably good when less familiar with the subject.
I’ve had the same experience as the author when doing web development with LLMs: it seems to be doing a pretty good job, at least compared to the mess I would make. But I’m not actually qualified to make that determination, and I think a nontrivial amount of AI value is derived from engineers thinking that they are qualified as such.
Yup — this doesn't match my experience using Rust with Claude. I've spent 2.5 years writing Rust professionally, and I'm pretty good at it. Claude will hallucinate things about Rust code because it’s a statistical model, not a static analysis tool. When it’s able to create code that compiles, the code is invariably inefficient and ugly.
But if you want it to generate chunks of usable and eloquent Python from scratch, it’s pretty decent.
After decades of writing software, I feel like I have a pretty good sense for "this can't possibly be idiomatic" in a new language. If I sniff something is off, I start Googling for reference code, large projects in that language, etc.
You can also just ask the LLM: are you sure this is idiomatic?
I think the concept of "readability" is good, it's a program within Google where your code gets reviewed by an expert in that language (but not necessarily your application / domain); once you're up to a level of writing idiomatic code and fully understanding the language etc, you get readability yourself.
When reviewing LLM code, you should have this readability in the given language yourself - or the code should not be important.
It's been my experience that strongly opinionated frameworks are better for vibe coding regardless of the type system.
For example, if you are using Rails, vibe coding is great: there is an MCP server, there are published prompts, and there is basically only one way to do things in Rails. You know how files should be named, where they go, what format they should take, etc.
Try the same thing in Go and you end up with a very different result, despite the fact that Go has stronger typing. Both Claude and Gemini have struggled with one-shotting simple apps in Go but succeed with Rails.
By comparison, a completely unopinionated framework like FastAPI, which got a popularity boost in the early AI surge, is a mess to work with if you are vibe coding. Most popular frameworks deliberately prescribe no single way of doing things and leave it up to the developer. Opinionated frameworks went out of fashion after Rails, but it turns out they are significantly better suited for AI-assisted development.
My experience has been the opposite with Rails, because of the open-ended patterns with Hotwire. Sure, Rails itself is opinionated, but Hotwire provides multiple ways to do the same thing, which confuses LLMs. For example, I recently tried building a form that allows creating related objects inline using modals. Claude 4 Sonnet got quite confused by that request, no matter how much help I provided. It managed in the end, but the solution left a lot to be desired. It can build the same feature using React on its own with basic instructions.
Same thing with other libraries like HTMX. Using TypeScript with React, and opinionated tools like TanStack Query, helps LLMs be way more productive, because they can fix errors quickly by looking at type annotations and lean on common patterns to build out user interactions.
This is pretty anecdotal, but it feels like most of the published Rails source code you find online (and, by extension, that an LLM has found) is from large, stable, and well-documented codebases.
Have you considered that instead, whatever an LLM has the most examples of is what it's best at? Perhaps there's more well-structured Rails code than Go code in the training data?
Weird, I thought Go was one of the go-to examples on HN of languages that LLMs work well with, precisely because it's opinionated and has many standard libs. Not that I've tried much; my own attempts at vibe coding felt disappointing. But doesn't this contradict the zeitgeist?
Well yeah, it's like how a 5 year old can talk about what they want in their sandwich but will probably struggle to describe the flavours and textures they enjoy in detail.
This isn't a fully formed thought, but could this be mitigated by giving LLMs your opinions? I am using copilot in more of a pair programming manner and for everything I want to make I give a lot of my opinions in the prompt. My changes are never too large though, a hundred lines of diff at most.
While I agree with the main thesis here, I find this extremely worrying:
> I am amazed every time how my 3-5k line diffs created in a few hours don’t end up breaking anything, and instead even increase stability.
In my personal opinion, there's no way you're going to get a high quality code base while adding 3,000 - 5,000 lines of code from LLMs on a regular basis. Those are huge diffs.
Yes. From experience, for a relatively complex system, 1k+ line PRs from mid-level devs without tests are almost guaranteed to have bugs; often nasty ones which can take many hours to identify and fix.
I remember when I started coding (decades ago), it would take me days to debug certain issues. Part of the problem was that it was difficult to find information online at the time, but another part of the problem was that my code was over-engineered. I could churn out thousands of lines of code quickly but I was only trying to produce code which appeared to work, not code which actually worked in all situations. I would be shocked when some of my code turned out to break once in a while but now I understand that this is a natural consequence of over-complicating the solution and churning out those lines as fast as I could without thinking enough.
Good code is succinct; it looks like it was discovered in its minimal form. Bad code looks like it was invented and the author tried to make it extra fancy. After 20 years coding, I can tell bad code within seconds.
Good code is just easy to read; first of all, you already know what each function is going to do before you even started reading it, just by its name. Then when you read it, there's nothing unexpected, it's not doing anything unnecessary.
Anecdotally, the worst and most common failure mode of an agent is when it starts spinning its wheels on some error, iterating wildly and failing to fix it, eventually landing on a bullshit "solution", if any.
In my experience, in Typescript, these “spin out” situations are almost always type-related and often involve a lot of really horrible “any” casts.
Right, I've noticed agents are very trigger happy with 'any'.
I have had a good time with Rust. It's not nearly as easy to skirt the type system in Rust, and I suspect the culture is also more disciplined when it comes to 'unwrap' and proper error management. I find I don't have to explicitly say "stop using unwrap" nearly as often as I have to say "stop using any".
Setting up linting with noExplicitAny is essential. But that won’t stop them from disabling it when they can’t figure something out. They’re sneaky little bastards.
This claim needs to be backed up by evals. I could just as well argue the opposite, that LLMs are best at coding Python because there are two orders of magnitude more Python in their training sets than C++ or Rust.
In any case, you can easily get most of the benefits of typed languages by adding a rule that requires the LLM to always output Python code with type annotations and validate its output by running ruff and ty.
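As a sketch of what such a rule produces (the names here are invented for illustration), the agent would be required to emit fully annotated code like this, then run `ruff check` and the type checker over it before declaring the task done:

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    """A minimal record type; the annotations give the checker something to verify."""
    amount_cents: int
    paid: bool = False


def total_outstanding(invoices: list[Invoice]) -> int:
    """Sum the amounts of unpaid invoices, in cents."""
    return sum(inv.amount_cents for inv in invoices if not inv.paid)
```

Because every parameter and return type is declared, a checker like ty or mypy can reject a hallucinated call such as `total_outstanding(invoices, "USD")` at check time rather than at runtime.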
> In any case, you can easily get most of the benefits of typed languages by adding a rule that requires the LLM to always output Python code with type annotations and validate its output by running ruff and ty.
My personal experience is that by doing exactly that, productivity, code readability, and correctness go through the roof, at a slight increase in cost due to having to iterate more.
And since that is an actual language-independent comparison, it leads me to believe that yes, static typing does in fact help substantially, and that the current differences between vibe coding languages are, just like you say, due to the relative quantity of training data.
I agree that the training sets for LLMs have much more data for Python than for Rust. But C++ has existed longer than Python, I believe, so I doubt there is two orders of magnitude more Python code than C++.
My experience with Github Copilot and Python has been that it _does_ generate better code completions for Python. It's sometimes shockingly good at predicting what you want to do in the next 30-50 lines of code based on a few well named variables. But that shockingly good code is also filled with hallucinated classes, methods, parameter ordering, etc. which completely negate its usefulness.
ty still misses things caught by mypy. It also doesn't have the same level of support for Pydantic yet. I use it (because it's so damn fast), but along with mypy, not a replacement yet.
Yes, mypy is slow, but who cares if it's the agent waiting on it to complete.
The logic above can support exactly the opposite conclusion: LLMs can do dynamically typed languages better, since they do not need to solve type errors, which saves context tokens.
Practically, it was reported that LLM-backed coding agents just worked around type errors by using `any` in a gradually typed language like TypeScript. I also personally observed such usage multiple times.
I also tried using LLM agents with more strongly typed languages like Rust. When complex type errors occurred, the agents struggled to fix them and eventually just used `todo!()`.
The experience above may be caused by insufficient training data. But it illustrates the importance of evals over ideological speculation.
In my experience you can get around it by having a linter rule disallowing it and using a local claude file instructing it to fix the linting issues every time it does something.
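For example (assuming a typescript-eslint setup; the rule name is real, but the exact config shape depends on your tooling), the linter side is a single entry:

```json
{
  "rules": {
    "@typescript-eslint/no-explicit-any": "error"
  }
}
```

paired with a line in the local Claude file along the lines of "run the linter after every change and fix all reported issues; never disable rules".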
>The logic above can support exactly the opposite conclusion: LLMs can do dynamically typed languages better, since they do not need to solve type errors, which saves context tokens.
If the goal is just to output code that does not show any linter errors, then yes, choose a dynamically typed language.
But for code that works at runtime? Types are a huge helper for humans and LLMs alike.
It's not so much typing that is valuable for vibecoding, but being able to give the agent hooks into tooling that provides negative feedback for errors. The easiest is typing, sure, because it's built into the compiler. But you can also add in static analysis linters and automated testing, including - notably - testing for performance.
Of course, you have to tell the agent to set up static analysis linters first, and tell the agent to write tests. But then it'll obey them.
The reason why large enterprises could hire armies of juniors in the past, safely, was because they set up all manner of guardrails that juniors could bounce off of. Why would you "hire" a "junior" agent without the same guardrails in place?
Exactly this. The ability of LLMs to write code is going to strongly depend on the availability and quantity of training data. But agentic coding is more than just LLMs, it is also the various abilities that give feedback to the LLM to refine the resulting code...and that is something that strongly typed and statically typed languages do so much better than their weak/dynamic counterparts.
I've noticed a fairly similar pattern. I particularly like vibecoding with golang. Go is extremely verbose, which makes it almost the opposite of Perl: writing Go is a bad experience, but reading Go is delightful. The verbosity makes it so you're always able to jump in and understand the context, often from just a single file.
Pre-LLMs, this was an up-front cost of writing golang, which often made the cost/benefit tradeoff not worth it. With LLMs, the cost of writing verbose code not only goes down; the verbosity forces the LLM to be strict about what it's writing and keeps it on track. The cost/benefit tradeoff has shifted greatly in Go's favor as a result.
No shade on Go but you kinda just said that the language has always looked like AI generated code and this works in its favor now because you don’t actually have to write it anymore. Funny, but not sure I’d consider that in Go’s favor.
My experience with Python and Scala so far is different. With Python, the LLMs do a pretty good job. The code always compiles; sometimes there are some logical or architectural errors, but that's it.
With Scala, I have to give the LLM a super simple job, e.g. creating some mock data for a unit test, and even then it frequently screws up; every now and then it gives me code that doesn't even compile. So much for Scala's strong type system...
I've been asking it to spit out Python all day long and it just flies. Ask the LLMs; most of them will tell you Python is their top, if not preferred, language.
I can vibecode in Rust but I don't like the result. There are too many lines of code and they are too long and contain too many symbols and extra stuff.
Just compare SeaORM with Ruby + Sequel, where you just inherit from the Sequel::Model class and Sequel reads the table schema without you having to tell it to. It gives you objects with one method for each column, and each value has the correct type.
I was happy with Ruby's performance 15 years ago, and a modern Ruby version and CPU are now about 7-20x faster, on a single thread.
AI is still helpful to learn but it doesn't need to do the coding when using Ruby.
I think the same criteria apply with or without AI for choosing a language. Is it a single-person project? Does it really require highly optimized machine code? etc.
The real win isn't static vs dynamic typing. It's immediate, structured feedback for LLM iteration. cargo check gives the LLM a perfectly formatted error it can fix in the next iteration. Python's runtime errors are often contextless ('NoneType has no attribute X') and only surface after execution. Even with mypy --strict, you need discipline to check it constantly. The compiler makes good feedback unavoidable.
The closest we got to vibe coding pre-LLMs was using a language with a very good strong type system in a good IDE and hitting Ctrl-Space to autocomplete your way to a working program.
I wonder if LLMs can use the type information more like a human with an IDE.
e.g. it generates "(blah blah...); foo." and at that point it is constrained to only generate tokens corresponding to public members of foo's type.
Just like current-gen LLMs can reliably generate JSON that satisfies a schema, the next gen will be guaranteed to natively generate syntactically and type-correct code.
> I wonder if LLMs can use the type information more like a human with an IDE.
Just throw more GPUs at the problem and generate N responses in parallel and discard the ones that fail to match the required type signature. It’s like running a linter or type check step, but specific to that one line.
You already can use LLM engines that force generation according to an arbitrary CFG definition. I am not aware of any systems that apply that to generating actual programming language code.
My experience with Haskell has been the same. The GHC provides stellar feedback, so the LLM is almost always able to bang the code into working order, but wow is that code bloated.
My experience suggests the opposite of what this article claims. Claude Code is ridiculously good with vanilla JavaScript, provided that your code is well written. I tried it with a TypeScript code base and it wasn't anywhere near as good.
With JS, Claude has very high success rate. Only issue I had with it was that one time it forgot to update one part of the code which was in a different file but as soon as I told it, it updated it perfectly.
With TypeScript my experience was that it struggles to find things. Writing tests was a major pain because it kept trying to grep the build output because it had to mock out one of the functions in the test and it just couldn't figure it out.
Also, the typed code it produces is more complex: it solves the same problem with more, different files, and it struggles to get the right context. And TS is more verbose (this is objectively true and measurable), so it requires more tokens and literally costs more.
Writing Rust, the LLM almost never gets function signatures and return types wrong.
That just leaves the business logic to sort out. I can only imagine that IDEs will eventually pair directly with the compiler for instant feedback to fix generations.
But Rust also has traits, lifetimes, async, and other type flavors that multiply complexity and cause issues. It's also a language in progress… I'm about to add "don't use once_cell, it's part of std now" to my system prompt. So it's not all sunshine, and I'm deeply curious how a purely vibe-coded Rust app would turn out.
Folks here may be interested in checking out Isograph. In [this conference talk](https://www.youtube.com/watch?v=sf8ac2NtwPY), I vibe code an Isograph app, and make non-trivial refactors to it using Cursor. This is only feasible because the interface between components is very simple, and all the hard stuff (generating a query for exactly the needed data, wiring things up, etc.) is done deterministically, by a compiler.
It's not quite the same principle OP articulates, which is that a compiler provides safety and that certainty lets you move fast when vibe coding. Instead, what I'm claiming is that you can move fast by allowing the LLM to focus on fewer things. (Though, incidentally, the compiler does give you that safety net as well.)
I'm really shocked at how slow people are to realize this, because it's blindingly obvious. I guess that just shows how much the early-adopter crowd is dominated by Python and JavaScript.
(BTW the answer is Go, not Rust, because the other thing that makes a language well suited for AI development is fast compile times.)
My experience with agent-assisted programming in Rust is that the agent typically runs `cargo check` instead of `cargo build` for this exact reason -- it's much faster and catches the relevant compilation errors.
(I don't have an opinion on one being better than the other for LLM-driven development; I've heard that Go benefits from having a lot more public data available, which makes sense to me and seems like a very strong advantage.)
I have the same impressions. Typing helps a lot, and (I think) in a few ways: one, as a safeguard; two, as a constraint (so, say, the AI is less likely to create a clunky variable that can be a string, a list, or a few other things); three, as a prompt toward writing solid code in general.
I add one more step: set up strong linting (ESLint with all recommended rules switched on, Ruff for Python) and ask the agent to run it after each edit.
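On the Ruff side, that can be as small as a pyproject.toml fragment (the rule sets chosen here are just an illustrative baseline; tighten or loosen to taste):

```toml
[tool.ruff.lint]
# pycodestyle/pyflakes errors, bugbear, pyupgrade, simplify, missing annotations
select = ["E", "F", "B", "UP", "SIM", "ANN"]
```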
Usually I also prompt to type things well, and avoid optional types unless strictly necessary (LLMs love to shrink responsibility that way).
I've been wondering about this for some time. My initial assumption was that LLMs would ultimately be the death of typed languages: type systems exist to help programmers avoid obvious mistakes, and near-perfect LLMs would almost never make obvious mistakes. So in a world of near-perfect LLMs, a type system is just pointless overhead.
In this current world of quite imperfect LLMs, I agree with the OP, though. I also wonder whether, even if LLMs improve, we will be able to use type systems not exactly for their original purpose but more as a way of establishing that the generated code is really doing what we want it to, something similar to formal verification.
Although, to be fair, this is far from vibecoding. Your setup, at a glance, says a lot about how you use the tools, and it's clear you care a lot about the end result.
You have a PRD file, your tasks are logged, each task defines both why's and how's, your first tasks are about env setup, quality of dev flow, exploration and so on. (as a nice tidbit, the model(s) seem to have caught on to this, and I see some "WHY:" as inline comments throughout the code, with references to the PRD. This feels nice)
It's a really cool example of "HOW" one should approach LLM-assisted coding, and shows that methods and means matter more than your knowledge in langx or langy. You seem to have used systems meant to help you in both speed of dev and ease of testing that what you got is what you need. Kudos!
I might start using your repo as a good example of good LLM-assisted dev flows.
That seems a little bit dangerous; why not do it in a language you know? Plus, this is not launching rockets to the moon; it's a sentence splitter with a fancy state machine (probably very useful in your niche, not a critique). The difficulty was in putting in the effort to build a complicated state machine; the rest was frankly not very LLM-needing, and now you can't maintain your own stuff without Nvidia burning uranium.
Did the LLM help at all in designing the core, the state machine itself?
Here’s a study that found that for small problems Gemini is almost equally good at Python and Rust. Looking at the scores of all the languages tested, it seems that the popularity of the language is the most important factor:
Such extraordinary claims require extraordinary evidence, not "vibes".
> It seems that typed, compiled, etc. languages are better suited for vibecoding, because of the safety guarantees.
There are no "safety guarantees" with typed, compiled languages such as C, C++, and
the like. Even with Go, Rust and others, if you don't know the language well enough, you won't find the "logic bugs" and race conditions in your own code that the LLM creates; even with the claims of "safety guarantees".
Additionally, the author is slightly confusing the meaning of "safety guarantees" which refers to memory safety. What they really mean is "reasoning with the language's types" which is easier to do with Rust, Go, etc and harder with Python (without types) and Javascript.
Again we will see more of LLM written code like this example: [0]
> I am managing projects in languages I am not fluent in—TypeScript, Rust and Go—and seem to be doing pretty well.
> It seems that typed, compiled, etc. languages are better suited for vibecoding, because of the safety guarantees. This is unsurprising in hindsight, but it was counterintuitive because by default I “vibed” projects into existence in Python since forever
[...]
> For example, I refactored large chunks of our TypeScript frontend code at TextCortex. Claude Code runs tsc after finishing each task and ensures that the code compiles before committing. This let me move much faster compared to how I would have done it in Python, which does not provide compile-time guarantees.
While Python doesn't have a required compilation step, it has both a standard type system and type checkers for it (mypy, etc.) that are ubiquitous in the community and could be run at the same point in the process.
I would say it's not just Rust, TypeScript, and Go that the author has a weak foundation in.
I'm not sure I agree with the author's conclusion. Python was never a great language for large codebases; it thrived because people with little development knowledge could get going pretty easily. A large part of its current appeal is the profusion of great specialized libraries, which in other languages you would have to write yourself.
I suspect vibe coding will not be a good fit for writing these libraries, because they require knowledge and precision that the typical vibe-coding user probably doesn't have, or a willingness to spend time on the topic, which is also typically not what drives people to vibe coding.
So my conclusion would be that vibe coding drives the industry to solidify around already well-established ecosystems, since fewer of the people producing code will have the time, knowledge, and/or will to build that ecosystem in newer languages. Whether that drive is strong enough to be noticeable is another question.
Then again, LLMs are well-suited to translate stuff, a relatively grunt work kind of task, so porting libs to your ecosystem of choice is a lot more feasible now.
Perhaps there is a future where individuals can translate large numbers of libraries, and instead of manually porting future improvements of the original versions to the copies, just rerun the translation as needed.
I am comfortable with both Python and Go. I prefer Go for performance; however, the earlier issue was verbosity.
It is easier to write things using a Python dict than to create a struct in Go or use the weird `map[string]interface{}` and then deal with the resulting typecast code.
After I started using GitHub Copilot (before the Agents), that pain went away. It would auto-create the field names, just by looking at the intent or a couple of fields. It was just a matter of TAB, TAB, TAB... and of course I had to read and verify - the typing headache was done with.
I could refactor the code easily. The autocomplete is very productive. Type conversion was just a TAB. The loops are just a TAB.
With Agents, things have become even better - but also riskier, because I can't keep up with the code review now - it's overwhelming.
I have found this to be true as well. I exclusively use Python and R at work, and when I tried CC several times for small side projects, it always seemed to run into problems and end up in a loop trying to fix its own errors. CC seems much better at vibe coding with TypeScript: I went from no knowledge of Node.js development to deploying a reasonable web app on Vercel in a few days. Asking CC to run tsc after changes helps it fix errors, because feedback from the type system is faster than in Python. Granted, this was only a personal side project and may not hold for much larger production systems, but I was pleasantly surprised how easy it was in TypeScript compared to Python.
It may be a Claude-specific thing. I asked Claude to do various tasks in machine learning, like implementing gradient boosting, without specifying the language, thinking it would use Python since it is the most common option and has utilities like NumPy to make the job much easier. But Claude mostly chose JavaScript and somehow managed to do it in JS.
The argument against Python is weak because Python can be written with types. Moreover, the types can be checked for correctness by various type checkers.
The issue is those who don't use type checkers religiously with Python - they give Python a bad name.
LLMs also write good C, if well directed. My feeling is that this is not really about C or something inherent to Python (where I get less than stellar results), but about the many low-quality Python codebases out there. Basically, my hypothesis is that, within the training set, some languages have better examples and some have worse. I found that prompt engineering goes a long way toward better Python: especially stressing not to pull in unneeded dependencies, to write simply, to avoid trivial asserts that are not really useful, and so forth.
My experience with LLMs in Rails has been... pretty bad. It isn't good at tracking 'context' (not in the technical token sense) and constantly gets lost in the sauce and doing weird stuff.
Given Rails' maturity, I would have expected otherwise - there is tons of Ruby/Rails code to train on, but... yeah.
OTOH, doing some side-project stuff in TS, and the difference is a little mindblowing. I can see the hype behind vibecoding WAY more.
Interesting... my experience has been that LLMs are generally better at more common languages (not surprising: more data exists in those languages!). So my admittedly amateur vibe-coding experiences have been best in Python and pretty vanilla web-development setups. When I push LLMs to, say, fit advanced statistical models in R, they fall apart pretty badly. Yet they can crush a PyTorch or scikit-learn task no problem.
This. This is the most important thing to consider: the corpus the model was trained on. Remember that LLMs are inferring code. They don't "know" anything at all about its axiomatic workings; they just know what "looks right" and what "looks wrong". Agentic workflows and RL are about to make this philosophy obsolete on a grand scale, but the signs still don't look good for improving how much they can "hold in their head" when inferring which token to spit out next from the vector embedding, though.
I have not found this to be the case at all. Type mismatches have been very common in Java, C++ and Objective-C inference output. I think there is complexity in what contributes to LLM suitability to programming tasks, and the nature and history of APIs relevant to the ask are a big part of that. Seems that the OP really loves their types, like many here, and this article is just more evangelism.
The language using the fewest punctuation tokens is going to be the safest from most categories of hallucination, and give each context window the greatest usable space for vector manipulation headed into self-attention before the model suffers from "vector-clouded judgment" due to overcrowded latent space.
I've found most LLMs I've tried generate better code in typed, procedural languages than they do in something like Clojure.
From the perspective of a primarily backend dev who knows just enough React/ts to be dangerous, Claude is generating pretty decent frontend code, letting me spend more time on the Rust backend of my current side project.
> generate better code in typed, procedural languages
Better in what sense? I've been using Anthropic models to write in different Lisps - Fennel, Clojure, Emacs Lisp - and they do a decent job. I can't always blindly copy-and-paste generated code, but I wouldn't do that with any PL.
All existing programming languages are designed for human beings. Is it the right time to design something specifically for vibe coding? For example, ease of reading and understanding is probably much more important than all the syntactic sugar that reduces typing. Having ten ways to accomplish the same task is not useful for LLMs.
This is the complete opposite of how LLMs are trained. LLMs are most effectively prompted (for instruct/chat finetunes anyway, i.e. chatbots) through the same kind of language patterns (natural or formal/programming) that they learn from. Trying to write formal prompts to them is exactly as misguided as speaking to your friends and family in C.
I've been wondering if Java would have a resurgence due to strong typing even into the error types, and widespread runtime availability. But so far, seems no.
Ease of understanding: JavaScript. That was literally its design goal; JS might have a whole lot of bad parts, but it's flexible and easy to understand.
I have had the exact opposite experience with Claude and low-level C. Claude is very good at writing the classic C functions you need on a daily basis. I often wonder how much defensive coding it puts into them. For my part, I now let any code I write be read at least once by Claude.
I’ve not had good success with vibing rust. It requires lots and lots of handholding and editing. Perhaps it’s because the model is always trying to do things from scratch. It does a poor job of finding crates and understanding the docs and implementation.
Typed, yes, but maybe with the exception of the likes of Swift, where Claude reveals just how complex and ambiguous the language can be.
The lack of documentation and the overly complex proposal documents also appear to overload the LLM's context and confuse it.
Typed languages are also better suited to IDE assistance and static analysis
So if for whatever reason it is better for vibe coding, then legacy code aside, why would anyone not use a technology that makes it a bit easier for them to understand what the AI is actually churning out on their behalf?
I can see this making sense purely from a toolchain perspective. If we are entering the age of treating code like cattle, then it would make sense that overly verbose and strict languages may benefit from it.
I programmed my services in Python without any conventions and I suffered a lot. Now I do it in typed languages with strong compulsory conventions, and things are far more manageable.
Nim might hit the sweet-spot here: typed, compiled, and Python-like.
Python has static typing unless you don't add any types. The vast majority of reputable Python codebases nowadays use static typing rigorously. If you don't, you should. To enforce it when coding with an agent, you can either tell the agent to run the type checker after every edit (e.g. via a hook in Claude Code), or, if you're using an agent that has access to the LSP diagnostics, tell it to look at them and demand that they are clean after every edit (easy with Cursor, and achievable in Claude Code, I believe, via MCP).
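As a sketch, a Claude Code hook along these lines can run the type checker after every file edit. Treat the matcher and command here as assumptions; the exact settings schema may differ by version:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "mypy --strict ." }
        ]
      }
    ]
  }
}
```

A non-zero exit from the command surfaces the type errors back to the agent, which then has to fix them before moving on.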
Nit upfront: Python is typed, just not statically typed.
"a python program doesn't break more than a Rust or Go program"
but it is tho, You literally can just give LLM to check LSP to analyze early it for you without write test to begin, Their LSP and Compiler is just that smart
I'm not aware of any rigorous study on it, but my personal anecdote is that I don't even bother with Claude Code or similar unless the language is Haskell, the deployment is Nix, the config is Dhall, and I did property tests. Once you set it up like that, you just pour money in until it's too much money or it's stuck, and that's how far LLMs can go now.
Everything said is true without AI as well, at least for me. I don't hate Python, and I like it for very small scripts, but for large programs the lack of static types makes it much too brittle IMO. Static typing gives you the confidence that not every single line needs testing, which reduces friction during the lifecycle of the code.
I wouldn't worry too much, no-one seems to be able to agree what it means anyway.
The strict original definition of vibe coding is it is LLM writing code with the programmer never caring about the code, only caring about the code's runtime output. It is easily the worst way to use LLMs for code, and I think even coining the term was a highly irresponsible and society-damaging move by Karpathy, making me lose much respect for him. This coined definition was taken literally by managers to fire workers.
It’s fine to not know what it is, but what is the rationale for commenting that you don’t know? Why not just look it up? Or don’t, as you’re too afraid to ask.
Vibecoding is bad coding. Always. Even if I take the headline as correct, so what? It's still crap code that will collapse into an unmaintainable mess sooner rather than later.
woodruffw|7 months ago
This framing reminds me of the classic problem in media literacy: people know when a journalistic source is poor when they’re a subject matter expert, but tend to assume that the same source is at least passably good when less familiar with the subject.
I’ve had the same experience as the author when doing web development with LLMs: it seems to be doing a pretty good job, at least compared to the mess I would make. But I’m not actually qualified to make that determination, and I think a nontrivial amount of AI value is derived from engineers thinking that they are qualified as such.
muglug|7 months ago
But if you want it to generate chunks of usable and eloquent Python from scratch, it’s pretty decent.
And, FWIW, I’m not fluent in Python.
js2|7 months ago
You can also just ask the LLM: are you sure this is idiomatic?
Of course it may lie to you...
giantrobot|7 months ago
[0] https://en.m.wikipedia.org/wiki/Gell-Mann_amnesia_effect
Cthulhu_|7 months ago
When reviewing LLM code, you should have this readability in the given language yourself - or the code should not be important.
bravesoul2|7 months ago
timuckun|7 months ago
For example, if you are using Rails, vibe coding is great because there is an MCP, there are published prompts, and there is basically only one way to do things in Rails. You know how files are to be named, where they go, what format they should take, etc.
Try the same thing in Go and you end up with a very different result, despite the fact that Go has stronger typing. Both Claude and Gemini have struggled with one-shotting simple apps in Go but succeed with Rails.
siva7|7 months ago
brokegrammer|7 months ago
Same thing with other libraries like HTMX. Using TypeScript with React and opinionated tools like TanStack Query helps LLMs be way more productive, because they can fix errors quickly by looking at type annotations and using common patterns to build out user interactions.
topato|7 months ago
EGreg|7 months ago
the more constraints you have, the more freedom you have to "vibe" code
and if someone actually built AI for writing tests, catching bugs and iterating 24/7 then you'd have something even cooler
mhluongo|7 months ago
delifue|7 months ago
debugnik|7 months ago
globular-toast|7 months ago
hkt|7 months ago
newswasboring|7 months ago
liveoneggs|7 months ago
Reubend|7 months ago
> I am amazed every time how my 3-5k line diffs created in a few hours don’t end up breaking anything, and instead even increase stability.
In my personal opinion, there's no way you're going to get a high quality code base while adding 3,000 - 5,000 lines of code from LLMs on a regular basis. Those are huge diffs.
jongjong|7 months ago
I remember when I started coding (decades ago), it would take me days to debug certain issues. Part of the problem was that it was difficult to find information online at the time, but another part of the problem was that my code was over-engineered. I could churn out thousands of lines of code quickly but I was only trying to produce code which appeared to work, not code which actually worked in all situations. I would be shocked when some of my code turned out to break once in a while but now I understand that this is a natural consequence of over-complicating the solution and churning out those lines as fast as I could without thinking enough.
Good code is succinct; it looks like it was discovered in its minimal form. Bad code looks like it was invented and the author tried to make it extra fancy. After 20 years coding, I can tell bad code within seconds.
Good code is just easy to read; first of all, you already know what each function is going to do before you even started reading it, just by its name. Then when you read it, there's nothing unexpected, it's not doing anything unnecessary.
pablitokun|7 months ago
elcritch|7 months ago
Not too different from how a college CS student who hasn't learned git yet would do it, come to think of it.
Still pretty bad if the author isn't taking the time to at least cull the changes. Though I guess it could just be file renames?
noahbp|7 months ago
lukev|7 months ago
Anecdotally, the worst and most common failure mode of an agent is when an agent starts spinning its wheels and unproductively trying to fix some error and failing, iterating wildly, eventually landing on a bullshit (if any) “solution”.
In my experience, in Typescript, these “spin out” situations are almost always type-related and often involve a lot of really horrible “any” casts.
resonious|7 months ago
I have had a good time with Rust. It's not nearly as easy to skirt the type system in Rust, and I suspect the culture is also more disciplined when it comes to 'unwrap' and proper error management. I find I don't have to explicitly say "stop using unwrap" nearly as often as I have to say "stop using any".
energy123|7 months ago
(1) Are current LLMs better at vibe coding typed languages, under some assumptions about user workflow?
(2) Are LLMs as a technology more suited to typed languages in principle, and should RL pipelines gravitate that way?
mewpmewp2|7 months ago
dimal|7 months ago
f311a|7 months ago
linkage|7 months ago
In any case, you can easily get most of the benefits of typed languages by adding a rule that requires the LLM to always output Python code with type annotations and validate its output by running ruff and ty.
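As a sketch of what such a rule yields (the function below is invented for illustration): fully annotated code that a checker like mypy or ty can then verify mechanically.

```python
def parse_port(value: str) -> int:
    """Parse a TCP port from a string, rejecting out-of-range values."""
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port

# A type checker would flag parse_port(8080) -- an int where str is
# expected -- before the code ever runs.
print(parse_port("8080"))
```

The annotations cost the LLM a few extra tokens per function, but they give the checker something concrete to verify on every edit.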
darksaints|7 months ago
My personal experience is that by doing exactly that, the productivity, code readability, and correctness goes through the roof, at a slight increase in cost due to having to iterate more.
And since that is an actual language-independent comparison, it leads me to believe that yes, static typing does in fact help substantially, and that the current differences between vibe coding languages are, just like you say, due to the relative quantity of training data.
yibers|7 months ago
Merad|7 months ago
js2|7 months ago
Yes, mypy is slow, but who cares if it's the agent waiting on it to complete.
dccsillag|7 months ago
herrington_d|7 months ago
Practically, it was reported that LLM-backed coding agents just worked around type errors by using `any` in a gradually typed language like TypeScript. I also personally observed such usage multiple times.
I also tried using LLM agents with more strongly typed languages like Rust. When complex type errors occurred, the agents struggled to fix them and eventually just used `todo!()`.
The experience above may be caused by insufficient training data. But it illustrates the importance of evaluation instead of ideological speculation.
mithras|7 months ago
MattGaiser|7 months ago
noahbp|7 months ago
If the goal is just to output code that does not show any linter errors, then yes, choose a dynamically typed language.
But for code that works at runtime? Types are a huge helper for humans and LLMs alike.
solatic|7 months ago
Of course, you have to tell the agent to set up static analysis linters first, and tell the agent to write tests. But then it'll obey them.
The reason why large enterprises could hire armies of juniors in the past, safely, was because they set up all manner of guardrails that juniors could bounce off of. Why would you "hire" a "junior" agent without the same guardrails in place?
darksaints|7 months ago
jjcm|7 months ago
Pre-llms, this was an up front cost when writing golang, which made the cost/benefit tradeoff often not worth it. With LLMs, the cost of writing verbose code not only goes down, it forces the LLM to be strict with what it's writing and keeps it on track. The cost/benefit tradeoff has increased greatly in go's favor as a result.
WD-42|7 months ago
misja111|7 months ago
With Scala, I have to give the LLM a super simple job, e.g. creating some mock data for a unit test, and even then it frequently screws up; every now and then it gives me code that doesn't even compile. So much for Scala's strong type system ..
jmfldn|7 months ago
smrtinsert|7 months ago
dominicrose|7 months ago
Just compare SeaORM with Ruby + sequel where you just inherit the Sequel::Model class and Sequel reads the table schema without you having to tell it to. It gives you objects with one method for each column and each value has the correct type.
I was happy with Ruby's performance 15 years ago, and now it's about 7-20x faster with a modern Ruby version and CPU, on a single thread.
AI is still helpful to learn but it doesn't need to do the coding when using Ruby. I think the same criteria apply with or without AI for choosing a language. Is it a single-person project? Does it really require highly optimized machine code? etc.
joker99|7 months ago
MutedEstate45|7 months ago
seunosewa|7 months ago
exclipy|7 months ago
I wonder if LLMs can use the type information more like a human with an IDE.
eg. It generates "(blah blah...); foo." and at that point it is constrained to only generate tokens corresponding to public members of foo's type.
Just like how current-gen LLMs can reliably generate JSON that satisfies a schema, the next gen will be guaranteed to natively generate syntactically and type-correct code.
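A toy sketch of that constraint step (pure illustration; real constrained decoding would mask the model's token logits in the sampler rather than filter strings after the fact):

```python
def constrain_completions(candidates: list[str],
                          valid_members: set[str]) -> list[str]:
    """Keep only candidate tokens that name a member of the receiver's type.

    A real implementation would zero out the logits of invalid tokens
    before sampling, so invalid completions are never generated at all.
    """
    return [c for c in candidates if c in valid_members]

# Hypothetical example: after "s." where s is a str, only str methods
# should be allowed.
str_members = {"upper", "lower", "strip", "split"}
print(constrain_completions(["upper", "fly", "split"], str_members))
```

This is the same mechanism behind schema-constrained JSON generation, just driven by the type checker's view of the program instead of a JSON schema.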
koolba|7 months ago
Just throw more GPUs at the problem and generate N responses in parallel and discard the ones that fail to match the required type signature. It’s like running a linter or type check step, but specific to that one line.
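A best-of-N sketch of that idea, with stub functions standing in for the model and the type/signature check (all names here are invented for illustration):

```python
from typing import Callable

def best_of_n(generate: Callable[[], str],
              passes_check: Callable[[str], bool],
              n: int = 4) -> list[str]:
    """Draw n candidate snippets and keep only those that survive the check.

    In a real system the n generations would run in parallel on separate
    GPUs, and passes_check would invoke a type checker or linter.
    """
    candidates = [generate() for _ in range(n)]
    return [c for c in candidates if passes_check(c)]

# Stub demo: "generation" cycles through canned snippets; the "check"
# accepts only candidates that carry a return-type annotation.
snippets = iter(["def f(x): ...", "def f(x: int) -> int: ...",
                 "def g(y): ...", "def g(y: str) -> str: ..."])
kept = best_of_n(lambda: next(snippets), lambda s: "->" in s)
print(kept)
```

Rejection sampling like this trades compute for correctness: the model never has to get it right first try, only once in N tries.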
esafak|7 months ago
treyd|7 months ago
unknown|7 months ago
[deleted]
sshine|7 months ago
I've been vibe-coding for a few days in Haskell, and I don't like the result.
Maybe I am just accustomed to being ok with verbose Rust, while Haskell comes with a great potential for elegance that the LLM does not explore.
Regardless, the argument that types guide the LLM in a very reliable way holds in both cases.
xwiz|7 months ago
jongjong|7 months ago
With JS, Claude has very high success rate. Only issue I had with it was that one time it forgot to update one part of the code which was in a different file but as soon as I told it, it updated it perfectly.
With TypeScript my experience was that it struggles to find things. Writing tests was a major pain because it kept trying to grep the build output because it had to mock out one of the functions in the test and it just couldn't figure it out.
Also, the typed code it produces is more complex, solving the same problem with more separate files, and it struggles to get the right context. TS is also more verbose (this is objectively true and measurable); it requires more tokens, so it literally costs more.
J_Shelby_J|7 months ago
That just leaves the business logic to sort out. I can only imagine that IDEs will eventually pair directly with the compiler for instant feedback to fix generations.
But Rust also has traits, lifetimes, async, and other type flavors that multiply complexity and cause issues. It's also an in-progress language… I'm about to add a "don't use once_cell, it's part of std now" to my system prompt. So it's not all sunshine, and I'm deeply curious how a purely vibe-coded Rust app would turn out.
762236|7 months ago
rbalicki|7 months ago
It's not quite the same principle OP articulates, which is that a compiler provides safety and that certainty lets you move fast when vibe coding. Instead, what I'm claiming is that you can move fast by allowing the LLM to focus on fewer things. (Though, incidentally, the compiler does give you that safety net as well.)
jbellis|7 months ago
(BTW the answer is Go, not Rust, because the other thing that makes a language well suited for AI development is fast compile times.)
woodruffw|7 months ago
(I don't have an opinion on one being better than the other for LLM-driven development; I've heard that Go benefits from having a lot more public data available, which makes sense to me and seems like a very strong advantage.)
stared|7 months ago
I add one more step - add strong linting (ESLint with all recommended rules switched on, Ruff for Python) and asking to run it after each edit.
Usually I also prompt to type things well, and avoid optional types unless strictly necessary (LLMs love to shrink responsibility that way).
For example, see my recent vibe-coding instructions, https://github.com/QuesmaOrg/demo-webr-ggplot/blob/main/CLAU....
chrisjharris|7 months ago
In this current world of quite imperfect LLMs, I agree with the OP, though. I also wonder whether, even if LLMs improve, we will be able to use type systems not exactly for their original purpose but more as a way of establishing that the generated code is really doing what we want it to, something similar to formal verification.
ImprobableTruth|7 months ago
However perfect LLMs would just replace compilers and programming languages above assembly completely.
SteveJS|7 months ago
I did this not knowing any Rust: https://github.com/KnowSeams/KnowSeams and Rust felt like a very easy-to-use scripting language.
NitpickLawyer|7 months ago
Although, to be fair this is far from vibecoding. Your setup, at a glance, says a lot about how you use the tools, and it's clear you care about the end result a lot.
You have a PRD file, your tasks are logged, each task defines both why's and how's, your first tasks are about env setup, quality of dev flow, exploration and so on. (as a nice tidbit, the model(s) seem to have caught on to this, and I see some "WHY:" as inline comments throughout the code, with references to the PRD. This feels nice)
It's a really cool example of "HOW" one should approach LLM-assisted coding, and shows that methods and means matter more than your knowledge in langx or langy. You seem to have used systems meant to help you in both speed of dev and ease of testing that what you got is what you need. Kudos!
I might start using your repo as a good example of good LLM-assisted dev flows.
xwolfi|7 months ago
Did the LLM help at all in designing the core, the state machine itself ?
randomifcpfan|7 months ago
https://jackpal.github.io/2025/03/29/Gemini_2.5_Pro_Advent_o...
whytevuhuni|7 months ago
If Gemini is equally good at them in spite of that, doesn't that mean it'd be better at Rust than at Python if it had equal training in both?
reflectiveattn|7 months ago
[deleted]
rvz|7 months ago
> It seems that typed, compiled, etc. languages are better suited for vibecoding, because of the safety guarantees.
There are no "safety guarantees" with typed, compiled languages such as C, C++, and the like. Even with Go, Rust and others, if you don't know the language well enough, you won't find the "logic bugs" and race conditions in your own code that the LLM creates; even with the claims of "safety guarantees".
Additionally, the author is slightly confusing the meaning of "safety guarantees" which refers to memory safety. What they really mean is "reasoning with the language's types" which is easier to do with Rust, Go, etc and harder with Python (without types) and Javascript.
Again we will see more of LLM written code like this example: [0]
[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...
dragonwriter|7 months ago
> It seems that typed, compiled, etc. languages are better suited for vibecoding, because of the safety guarantees. This is unsurprising in hindsight, but it was counterintuitive because by default I “vibed” projects into existence in Python since forever
[...]
> For example, I refactored large chunks of our TypeScript frontend code at TextCortex. Claude Code runs tsc after finishing each task and ensures that the code compiles before committing. This let me move much faster compared to how I would have done it in Python, which does not provide compile-time guarantees.
While Python doesn't have a required compilation step, it has both a standard type system and typecheckers for it (mypy, etc.) that are ubiquitous in the community and could be run at the same point in the process.
I would say it's not just Rust, TypeScript, and Go that the author has a weak foundation in.
pyrale|7 months ago
I suspect vibe coding will not be a good fit for writing these libraries, because they require knowledge and precision which the typical vibe coding user probably doesn't have, or the willingness to spend time on the topic, which is also typically not what drives people to vibe coding.
So my conclusion would be that vibe coding drives the industry to solidify around already well-established ecosystems, since fewer of the people producing code will have the time, knowledge, and/or will to build that ecosystem in newer languages. Whether that drive is strong enough to be noticeable is another question.
jaynetics|7 months ago
Perhaps there is a future where individuals can translate large numbers of libraries, and instead of manually porting future improvements of the original versions to the copies, just rerun the translation as needed.
anupshinde|7 months ago
It is easier to write things using a Python dict than to create a struct in Go or use the weird `map[string]interface{}` and then deal with the resulting typecast code.
After I started using GitHub Copilot (before the Agents), that pain went away. It would auto-create the field names, just by looking at the intent or a couple of fields. It was just a matter of TAB, TAB, TAB... and of course I had to read and verify - the typing headache was done with.
I could refactor the code easily. The autocomplete is very productive. Type conversion was just a TAB. The loops are just a TAB.
With Agents, things have become even better - but also riskier, because I can't keep up with the code review now - it's overwhelming.
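The dict-versus-struct tradeoff reads roughly like this in Python terms (illustrative only; the field names are made up):

```python
from dataclasses import dataclass

# Loose dict: quick to write, but a typo in a key fails only at runtime.
user = {"name": "Ada", "age": 36}

# Struct-like equivalent: a typo such as u.nme is caught by a type
# checker before the code runs, and an agent gets the same feedback.
@dataclass
class User:
    name: str
    age: int

u = User(**user)
print(u.name)
```

With tab-completion or an agent filling in the boilerplate, the struct version costs almost nothing extra to write while keeping the checker's guarantees.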
OldfieldFund|7 months ago
[tool.ruff]
line-length = 88
select = ["E", "F", "W", "I", "N", "UP", "B", "C4"] # A good strict baseline
ignore = []
[tool.mypy]
python_version = "3.12"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true
disallow_any_unimported = true
no_implicit_optional = true
check_untyped_defs = true
strict = true
lysecret|7 months ago
NischalM|7 months ago
cttet|7 months ago
koakuma-chan|7 months ago
It's time for people to wake up and stop using Python, and forcing me to use Python
warrenmiller|7 months ago
foreach (string enumName in Enum.GetNames(typeof(Pair)))
{
    // (loop body elided in the original comment)
}
OutOfHere|7 months ago
The issue is those who don't use type checkers religiously with Python - they give Python a bad name.
antirez|7 months ago
brikym|7 months ago
helle253|7 months ago
Given Rails' maturity, I would have expected otherwise - there is tons of Ruby/Rails code to train on, but... yeah.
OTOH, doing some side-project stuff in TS, the difference is a little mind-blowing. I can see the hype behind vibecoding WAY more.
levocardia|7 months ago
reflectiveattn|7 months ago
waffletower|7 months ago
throwawaymaths|7 months ago
reflectiveattn|7 months ago
jcadam|7 months ago
From the perspective of a primarily backend dev who knows just enough React/ts to be dangerous, Claude is generating pretty decent frontend code, letting me spend more time on the Rust backend of my current side project.
iLemming|7 months ago
Better in what sense? I've been using Anthropic models to write in different Lisps - Fennel, Clojure, Emacs Lisp, and they do perform a decent job. I can't always blindly copy-and-paste generated code, but I wouldn't do that with any PL.
fluxkernel|7 months ago
reflectiveattn|7 months ago
largbae|7 months ago
jongjong|7 months ago
rgoldfinger|7 months ago
smrtinsert|7 months ago
Most definitely not going to happen. Python is the language of the AI age, and a lot of ML/AI libraries do their reference or first release in Python.
Surac|7 months ago
binarymax|7 months ago
isodev|7 months ago
poink|7 months ago
I'm a relatively old school lisp fan, but it's hard to do this job for a long time without eventually realizing helping your tools is more valuable than helping yourself
lonelyasacloud|7 months ago
infecto|7 months ago
theusus|7 months ago
hugs|7 months ago
I wrote this [1] comment a few weeks ago:
""" ... Claude Code is surprisingly good at writing Nim. I just created a QuickJS + MicroPython wrapper in Nim with it last week, and it worked great!
Don't let "but the Rust/Go/Python/JavaScript/TypeScript community is bigger!" be the default argument. I see the same logic applied to LLM training data: more code means more training data, so you should only use popular languages. That reasoning suggests less mainstream languages are doomed in the AI era.
But the reality is, if a non-mainstream language is well-documented and mature (Nim's been around for nearly 20 years!), go for it. Modern AI code gen can help fill in the gaps. """
[1]: https://news.ycombinator.com/item?id=44400913
Myrmornis|7 months ago
heavyset_go|7 months ago
physicsguy|7 months ago
As judged by who? And in what field?
I mean, if I look at the big Python libraries I use regularly none of them have types - Django, DRF, NumPy, SciPy, Scikit-learn. That’s not to say there aren’t externally provided stubs but the library authors themselves are often not the ones writing them
teiferer|7 months ago
What dynamically typed languages lack in compile-time safety, the programmer must make up using (automated) testing. With adequate tests, a python program doesn't break more than a Rust or Go program. It's just that people often regard testing as an annoying chore which is the first thing they skip when vibe coding (or "going fast and breaking things" which is then literally what happens).
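For example, a behavioral test catches the class of bug that annotations alone cannot (the function below is a toy, invented for illustration):

```python
def scale(values: list[float], factor: float) -> list[float]:
    """Multiply each value by factor."""
    return [v * factor for v in values]

# The signature stays identical if '*' is typo'd to '+', so only a
# behavioral test pins the function down:
assert scale([1.0, 2.0], 3.0) == [3.0, 6.0]
assert scale([], 3.0) == []
print("ok")
```

Types and tests guard against different failure modes, which is why skipping the tests when vibe coding in a dynamic language removes the only guardrail left.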
tonyhart7|7 months ago
But it is, though. You can literally just have the LLM check the LSP to analyze the code early for you, without writing tests to begin with. Their LSPs and compilers are just that smart.
gompertz|7 months ago
treve|7 months ago
benreesman|7 months ago
I used to yell at Claude Code when it tried to con me with mocks to get the TODO scratched off, now I laugh at the little bastard when it tries to pull a fast one on -Werror.
Nice try Claude Code, but around here we come to work or we call in sick, so what's it going to be?
jelder|7 months ago
nu11ptr|7 months ago
adamnemecek|7 months ago
jcmontx|7 months ago
seunosewa|7 months ago
itsafarqueue|7 months ago
blurbleblurble|7 months ago
tibastral2|7 months ago
jrvieira|7 months ago
zk108|7 months ago
Fixed it
lvl155|7 months ago
energy123|7 months ago
Can you explain more why you've arrived at this opinion?
OutOfHere|7 months ago
It's the best at Go imho since it has enforced types and a garbage collector.
Mistletoe|7 months ago
bashtoni|7 months ago
Depending on who you speak to it can be anything from coding only by describing the general idea of what you want, to just being another term for LLM assisted programming.
OutOfHere|7 months ago
In truth, for LLM generated code to be maintainable and scalable, it first needs to be speced-out super well by the engineer in collaboration with the LLM, and then the generated code must also be reviewed line-by-line by the engineer.
There is no room for vibe coding in making things that last and don't immediately get hacked.
shric|7 months ago
fulafel|7 months ago
tl;dr: fast throwaway code from an LLM, where the human just looks at the results and isn't trying to make maintainable code.
> There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
irvingprime|7 months ago
__mharrison__|7 months ago
leric|7 months ago
[deleted]