top | item 42937928

(no title)

> But with coding models they ignore context of the codebase and the results feel more like patchwork.

Have you tried Cursor? It has a great feature that grabs context from the codebase, I use it all the time.

discuss

troupo|1 year ago

> It has a great feature that grabs context from the codebase, I use it all the time.

If only this feature worked consistently, or reliably even half of the time.

It will casually forget or ignore any and all context and any and all files in your codebase at random times, and you never know what set of files and docs it's working with at any point in time

pc86|1 year ago

I can't get the prompt because I'm on my work computer but I have about a three-quarter-page instruction set in the settings of cursor, it asks clarifying questions a LOT now, and is pretty liberal with adding in commented pseudo-code for stuff it isn't sure about. You can still trip it up if you try, but it's a lot better than stock. This is with Sonnet 3.5 agent chats (composer I think it's called?)

I actually cancelled by Anthropic subscription when I started using cursor because I only ever used Claude for code generation anyway so now I just do it within the IDE.

slig|1 year ago

I'm very interested in your prompt and could you be so kind to paste it somewhere and link in your comment, please?

justneedaname|1 year ago

Also interested to see this

godelski|1 year ago

I have not. But I also can't get the general model to work well in even toy problems.

Here's a simple example with GPT-4o: https://0x0.st/8K3z.png

It probably isn't obvious in a quick read, but there are mistakes here. Maybe the most obvious is that how `replacements` is made we need to intelligently order. This could be fixed by sorting. But is this the right data structure? Not to mention that the algorithm itself is quite... odd

To give a more complicated example I passed the same prompt from this famous code golf problem[0]. Here's the results, I'll save you the time, the output is wrong https://0x0.st/8K3M.txt (note, I started command likes with "$" and added some notes for you)

Just for the heck of it, here's the same thing but with o1-preview

Initial problem: https://0x0.st/8K3t.txt

Codegolf one: https://0x0.st/8K3y.txt

As you can see, o1 is a bit better on the initial problem but still fails at the code golf one. It really isn't beating the baseline naive solution. It does 170 MiB/s compared to 160 MiB/s (baseline with -O3). This is something I'd hope it could do really well on given that this problem is rather famous and so many occurrences of it should show up. There's tons of variations out there and It is common to see parallel fizzbuzz in a class on parallelization as well as it can teach important concepts like keeping the output in the right order.

But hey, at least o1 has the correct output... It's just that that's not all that matters.

I stand by this: evaluating code based on output alone is akin to evaluating a mathematical proof based on the result. And I hope these examples make the point why that matters, why checking output is insufficient.

[0] https://codegolf.stackexchange.com/questions/215216/high-thr...

Edit: I want to add that there's also an important factor here. The LLM might get you a "result" faster, but you are much more likely to miss the learning process that comes with struggling. Because that makes you much faster (and more flexible) not just next time but in many situations where even a subset is similar. Which yeah, totally fine to glue shit together when you don't care and just need something, but there's a lot of missed value if you need to revisit any of that. I do have concerns that people will be plateaued at junior levels. I hope it doesn't cause seniors to revert to juniors, which I've seen happen without LLMs. If you stop working on these types of problems, you lose the skills. There's already an issue where we rush to get output and it has clear effects on the stagnation of devs. We have far more programmers than ever but I'm not confident we have a significant number more wizards (the percentage of wizards is decreasing). There's fewer people writing programs just for fun. But "for fun" is one of our greatest learning tools as humans. Play is a common trait you see in animals and it exists for a reason.