item 46737077


tyfon | 1 month ago

I've been trying opencode a bit with Gemini Pro (and Claude via those) on a Rust project, and I have a pre-push hook that runs cargo check on the code.

The number of times I have to "yell" at the LLM for adding #[allow] attributes to silence the linter instead of fixing the code is crazy, and when I point it out, it goes "Oops, you caught me, let me fix it the proper way".

So the tests don't necessarily make them produce proper code.
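One way to enforce this mechanically, rather than relying on cargo check alone, is to have the pre-push step reject newly added lint suppressions. Here is a minimal Rust sketch, assuming the hook pipes a unified diff (e.g. from git diff) into it. The function name added_allow_attributes and the binary name allow-guard are hypothetical, and the simple prefix match will miss variants like cfg_attr(..., allow(...)):

```rust
use std::io::{self, Read};

/// Hypothetical helper: returns the added lines of a unified diff that
/// introduce a lint-silencing attribute such as #[allow(dead_code)]
/// or #![allow(unused_imports)].
fn added_allow_attributes(diff: &str) -> Vec<String> {
    diff.lines()
        // Added lines start with '+'; skip the "+++ b/file" header lines.
        .filter(|line| line.starts_with('+') && !line.starts_with("+++"))
        // Drop the leading '+' (one ASCII byte) and surrounding whitespace.
        .map(|line| line[1..].trim())
        .filter(|code| code.starts_with("#[allow(") || code.starts_with("#![allow("))
        .map(str::to_string)
        .collect()
}

fn main() {
    // Assumed usage from the hook script: git diff origin/main | allow-guard
    let mut diff = String::new();
    io::stdin()
        .read_to_string(&mut diff)
        .expect("failed to read diff from stdin");

    let offenders = added_allow_attributes(&diff);
    if offenders.is_empty() {
        return;
    }
    eprintln!("Refusing to push: new lint suppressions found:");
    for line in &offenders {
        eprintln!("  {}", line);
    }
    // A non-zero exit status from a pre-push hook makes git abort the push.
    std::process::exit(1);
}
```

For lints you never want silenced, Rust's own forbid level does something similar at compile time: with #![forbid(unused_variables)] at the crate root, a later #[allow(unused_variables)] is a compile error rather than a quiet override.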


ASalazarMX | 1 month ago

I was building a somewhat elaborate form/graph in Google Sheets, had to translate a bunch of cells from English to Spanish, and thought, "Why not use Gemini for this easy grunt work? It tends to output good translations."

I spent 20 minutes guiding it: it kept putting the translations in the wrong cells, I had to ask it not to convert the cells into a fancy table, and finally I had to convince it that it really did have access to edit the document, because at some point it denied having it. I wasn't being rude, but it seems I somehow made it evasive.

I had to ask it to translate in the chat and manually copy-paste the translations into the proper cells myself. Bonus points because it only translated about ten cells at a time and truncated the reply with a "More cells translated" message.

I can't imagine how hard it would be to hand-hold an LLM while working in a complex codebase. I guess they're a godsend for prototypes and proofs of concept, but they can't beat a competent engineer yet. It's like that joke where a student answers that 2+2=5, and when questioned, replies that his strength is speed, not accuracy.

kaydub | 1 month ago

This is one of those places where I feel like they're trying to do too much with LLMs, and I think it's where there's "a bubble". LLMs are text tools, so if you take them out of their domain and force them somewhere else, you're going to have problems.

Anyways, I replied because I had something else I wanted to say.

I was using Gemini in a Google Sheets worksheet a while back. I had to cross-reference a website and input the results into cells. I got Gemini to do it: I had it do the first row, then the second, then told it to do a batch of 10, then 20. It had a hiccup at 20 (it would take too long, I guess), so I had it go back to 10. But then Gemini told me it couldn't read my worksheet. I convinced it that it could, but then it told me it couldn't edit my worksheet. I argued with it: "You've been changing the worksheet, wtf?" I convinced it that it could, and it started again, but after doing a couple of rows it told me it couldn't again. We went back and forth a bit: I'd get it working, it would break, repeat. I think it was after the third time that I just couldn't get it to work again.

I looked up the docs and searched online, and I was concerned to find that Google didn't allow Gemini to do a lot of things in Sheets, Docs, and other Google Workspace apps. According to them, it wasn't allowed to do a ton of stuff that I definitely had Gemini doing.

Then a week or two went by, and Google announced they were allowing Gemini to directly edit worksheets.

So wtf how did I get it to do it before it could do it???

egeozcan | 1 month ago

I added a bunch of lines to CLAUDE.md telling it never to do that, and it worked flawlessly.

So my experience with Claude Code is different. I'm not trying to say you're holding it wrong, just adding a data point; maybe I got lucky.

ASalazarMX | 1 month ago

I'm curious how many of those directives you'll have in that file at the end of the year.

kaydub | 1 month ago

Why are you guys having LLMs use git at all???

Manage that yourself! If you have hooks throwing errors, feed the error back into the LLM.
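Here is a minimal Rust sketch of that loop. It assumes you run the check yourself, outside the LLM's control, and pass its output back to the model by hand or through whatever tool you use. run_check and feedback_prompt are hypothetical helpers, not part of any LLM tool's API, and the prompt wording is only an example:

```rust
use std::process::Command;

/// Hypothetical helper: wraps a failed check's diagnostics in a message
/// you could send back to the LLM.
fn feedback_prompt(command: &str, stderr: &str) -> String {
    format!(
        "The command `{}` failed. Fix the underlying problems; \
         do not add #[allow(...)] attributes to silence them.\n\n{}",
        command,
        stderr.trim_end()
    )
}

/// Runs a check command. Returns Ok(None) if it passed, Ok(Some(prompt))
/// if it failed, and Err if the program could not be started at all.
fn run_check(program: &str, args: &[&str]) -> std::io::Result<Option<String>> {
    let output = Command::new(program).args(args).output()?;
    if output.status.success() {
        return Ok(None);
    }
    // cargo writes its diagnostics to stderr.
    let stderr = String::from_utf8_lossy(&output.stderr);
    let command = format!("{} {}", program, args.join(" "));
    Ok(Some(feedback_prompt(&command, &stderr)))
}

fn main() -> std::io::Result<()> {
    match run_check("cargo", &["check"])? {
        None => println!("cargo check passed; commit and push it yourself."),
        Some(prompt) => println!("{}", prompt),
    }
    Ok(())
}
```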