Julien_r2|4 months ago
So far it's just reinforcing my feeling that none of this is actually used at scale. We use AI as relatively dumb companions, let them go wilder on side projects that have looser constraints, and agents are pure hype (or for very niche use cases).
laborcontract|4 months ago
Unlike the model providers, Cursor has to pay the retail price for LLM usage. They're fighting an ugly marginal price war. If you're paying more for inference than your competitors, you have to choose to either 1) deliver performance equal to the competition at a loss, or 2) economize by feeding smaller contexts to the model providers.
Cursor is not transparent about how it handles context. From my experience, it clearly uses aggressive strategies to prune conversations, to the extent that it's not uncommon for Cursor to have to re-read the same file multiple times in the same conversation just to know what's going on.
My advice to anyone using Cursor is to just stop wasting your time. The code it generates creates so much technical debt. I've moved on to Codex and Claude and I couldn't be happier.
Huppie|4 months ago
There are moments where spending 10 min on a good prompt saves me 2 hrs of typing, and it finishes in the time it takes me to go make myself a cup of coffee (~10 min). Those are the good moments.
Then there are moments where it's more like 30 min savings for 10 min of prompting. Those are still pretty good.
Then there are plenty of moments where spending 10 min on a prompt saves me about 15 min of work. But I have to wait 5 min for the result, so it ends up being a wash, with the added downside that I didn't really write it myself, so the actual details of the solution aren't fully internalized.
There are also plenty of moments where the result at first glance looks like a good or even great result, but once I start reviewing and fixing things it still ends up being a wash.
I find it actually quite difficult to judge result quality, because at first glance it always looks pretty decent. Sometimes reviewing confirms that, and other times I think "well, it needs some tweaking" and subsequently spend an hour tweaking.
Now I think the problem is that the response is akin to gambling / conditioning, in a sense. Every prompt has a smallish chance of triggering a great result, and since the average result is still about 25% faster (my gut feeling, based on what I've "written" the last few months working with Claude Code), it's just very tempting to pull that slot machine lever, even on tasks that I know I'd most likely type faster than I can prompt.
I did find a place where (to me, at least) it almost certainly adds value: I find it difficult to think about code during meetings (I really need my attention in the meetings I attend), but I can send a few quick prompts for small stuff during meetings without really having to context switch. This alone is a decent productivity booster. Refactorings that would've been a "maybe, one day" can now just be triggered. Best case, I spend 10 minutes reviewing and accept it; worst case, I just throw it away.
Zababa|4 months ago
I derive a lot of business value from them, and many of my colleagues do too. Many programmers who were already good at writing code by hand are having lots of success with them, for example Thorsten Ball, Simon Willison, and Mitchell Hashimoto. A recent example from Mitchell Hashimoto: https://mitchellh.com/writing/non-trivial-vibing.
>It's puzzling to me how people actually think this is a net benefit to humanity.
I've used them personally to quickly spin up a microblog where I could post my travel pictures and thoughts. The idea of making the interface like Twitter (since that's what I use and know) was mine; not wanting to expose my family and friends to a predatory platform like Twitter or Instagram was also mine; Supabase as the backend was a colleague's suggestion (it helped a lot!); the code was all Claude. The result is that they were able to enjoy my website, my grandparents included, who just had to open a URL. I like to think of it as a perhaps very small but real net benefit for a very small part of humanity.
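For a sense of scale: the data layer for a site like that is tiny. Here's a minimal sketch using @supabase/supabase-js; the "posts" table and its columns are made up for illustration, not what my actual site uses:

    // Fetch the timeline for a twitter-style microblog, newest first.
    // Hypothetical "posts" table; adjust the columns to your schema.
    import { createClient } from '@supabase/supabase-js';

    const supabase = createClient(
      process.env.SUPABASE_URL!,
      process.env.SUPABASE_ANON_KEY!,
    );

    const { data: posts, error } = await supabase
      .from('posts')
      .select('id, body, image_url, created_at')
      .order('created_at', { ascending: false })
      .limit(20);

    if (error) throw error;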
sdoering|4 months ago
They came in primed against agentic workflows. That is fine. But they also came in without providing anything that might have given other people the chance to show that their initial assumptions were flawed.
I've been working with agents daily for several months. Still learning what fails and what works reliably.
Key insights from my experience:

- You need a framework (like agent-os or similar) to orchestrate agents effectively
- Balance between guidance and autonomy matters
- Planning is crucial, especially for legacy codebases
Recent example: Hit a wall with a legacy system where I kept maxing out the context window with essential background info. After compaction, the agent would lose critical knowledge and repeat previous mistakes.
Solution that worked:

- Structured the problem properly
- Documented each learning/discovery systematically
- Created specialized sub-agents for specific tasks (keeps context windows manageable)
Only then could the agent actually help navigate that mess of legacy code.
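To make "specialized sub-agents" concrete: in Claude Code (assuming that's your harness; agent-os has its own format, but the idea is the same) a sub-agent is just a markdown file with YAML frontmatter dropped into .claude/agents/. A minimal sketch, with made-up names:

    ---
    name: legacy-billing-navigator
    description: Answers questions about the legacy billing module. Use
      when the main agent needs facts about billing internals.
    tools: Read, Grep, Glob
    ---
    You are an expert on the legacy billing module. Consult
    docs/learnings.md first; it records every discovery made so far,
    so we never re-learn the same thing twice. Answer with file paths
    and line references, and keep answers short.

Because the sub-agent runs in its own context window, the main conversation only pays for the question and the short answer, not for all the spelunking in between.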
rafaelmn|4 months ago
My experience is that once I switch to this mode, when something blows up I'm basically stuck with a bunch of code that I only sort of know, even though I reviewed it. I just don't have the same insight as I would if I had written the code; no ownership, even if it was committed in my name. Any misconceptions I had about how things work, I'll still have, because I never had to work through the solution myself, even though I ended up with a final working one.
lxgr|4 months ago
Of course there are many more bugs they currently won't find, but when this strategy costs next to nothing (compared to an SWE spending an hour spelunking) and still works sometimes, the trade-off looks pretty good to me.
wrsh07|4 months ago
I'll usually have a main line of work I'm focused on. I'll describe the current behavior and the desired changes (need to plumb this var through these functions to use here). GPT-5 thinking high is pretty precise, so if you clearly indicate what you want, it usually does exactly what you request. (If this isn't happening for you, make sure you don't have other context in your codebase that confuses it.)
While it's working, I'll often prompt another line of work, usually explicitly requesting that it not make changes, but without switching to ask mode. It will do most of the work of figuring out what changes would need to be made, and it summarizes them helpfully, which allows me to correct it if it's wrong. You can repeat this for as long as the existing runs are busy.
Types of prompts that work well:
Questions: "What's the function or component for doing X?", "Where else do we use this pattern?"
Bug prompts: anything that would take you <2h to fix should be promptable in a single prompt. Note that you'll get slightly different responses even with the same prompt, so if at first you don't succeed, explain what went wrong, ask it to improve your prompt, and then try again from scratch. People don't reset context often enough.
Larger-scale architecture / plans: for these I'd recommend switching to plan mode and spending some time going back and forth. It will often get confused, so take your progress (ideally as an .md file) and bring it to a new conversation to keep iterating.
You can even have it suggest Jira tickets, etc.
Understanding the different models is important. Claude 4.5 (and most Claude models since 3.5) really wants to do stuff; if you leave it unchecked it'll usually do way more than you asked, and if it perceives itself to be blocked on a failing test it might delete the test or change it to be useless. That said, the Claude models are really extraordinary when you want a quick prototype fleshed out and don't want to make all of the decisions yourself. GPT-5 thinking high is my personal favorite (Codex with GPT-5 thinking high is also very good in the Codex plugin in VS Code). Create new context often.
wrsh07|4 months ago
Best things about GPT: the precision. I don't even care that they're slow; it just lets me queue up more work.
Best things about Codex: it's a little smarter at handling very hard or very easy tasks. It might spend less time on easy tasks and even more time on hard ones.
Best things about Grok: speed, plus LeetCode-style ability.
All of them tend to benefit from a feedback loop if you can give them great tests or good static analysis, but they will cheat if you let them ("any" in TS).
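One cheap way to take the "any" escape hatch away in TypeScript is to make static analysis part of the loop the agent has to satisfy. A minimal sketch using typescript-eslint's flat config (wire it into whatever check command the agent runs):

    // eslint.config.mjs
    import tseslint from 'typescript-eslint';

    export default tseslint.config(
      ...tseslint.configs.recommended,
      {
        rules: {
          // Fail the check when the model "fixes" a type error with any.
          '@typescript-eslint/no-explicit-any': 'error',
          // And when it silences the compiler instead of fixing the type.
          '@typescript-eslint/ban-ts-comment': 'error',
        },
      },
    );

Combined with "strict": true in tsconfig.json, this closes the most common routes the models use to fake a green build.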
theshrike79|4 months ago
Codex + GPT-5-high is like an offshore consultant: you give it the spec and it'll do the work and come back with something.
Claude is built like a pair programmer: it chats while it works, and you can easily interrupt it without breaking the flow.
Codex is clearly more thorough; it's _excellent_ at picking apart Sonnet 4.5 code and finding the subtle gotchas Sonnet leaves behind when it just plows toward a result.
And like you said, Claude is results first. It'll get where you want it to go, even if it has to mock the whole application to get the tests to pass. =)
subjectivationx|4 months ago
I am using language models as much as anyone, and they work, but they don't work the way the marketing and the popular delusion behind them pretend they work.
The best book on LLMs and agentic AI is Extraordinary Popular Delusions and the Madness of Crowds by Charles Mackay.