top | item 40806868


m0zzie | 1 year ago

Can anyone comment on its coding ability?

Considering cancelling my subscription with OpenAI as I was previously using GPT-4 quite heavily as a multiplier for myself, guiding it and editing outputs as required, but GPT-4o feels significantly worse for this use case. It is certainly better in many other areas, but its coding ability is not great.

I tried to revert back to standard GPT-4 but it is now so slow to respond (higher load?) that it breaks my mental flow, so I'm exploring other options.


r2_pilot|1 year ago

The first night it was released, it one-shotted merging two Python files together (one was an incomplete mess of speech to text, the other was an also-incomplete Claude API demo with a fake loop with forced conversation paths). It rewrote the fake loop and turned it into a voice interface that worked on the first try (then I asked it to add text to speech and it did). The second night it improved the code by adding threads (which I do know about but wouldn't have thought to use there) and it worked better. Then Linux decided not to work properly (pulseaudio and ALSA issues), so I asked it to help me migrate to pipewire and it gave me step-by-step directions; when I had issues, all I had to do was post the error and it fixed things enough that it's all working again. It's been a pretty amazing week from my perspective. I finally had to take a break earlier this week, but I still use Claude 3.5 for gardening (taking pictures and asking for help) and general advice.
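For anyone curious, the threading pattern it suggested looks roughly like this. This is an invented sketch, not the actual code: the real thing used pulseaudio capture and the Claude API, which are stubbed out here with placeholder functions.

```python
import queue
import threading

# One thread captures speech (producer) while the main loop responds
# (consumer), so listening never blocks the assistant's replies.
audio_events = queue.Queue()

def listener(utterances):
    # Stand-in for a speech-to-text capture loop.
    for text in utterances:
        audio_events.put(text)
    audio_events.put(None)  # sentinel: no more input

def respond(text):
    # Stand-in for a Claude API call plus text-to-speech playback.
    return f"assistant reply to: {text}"

replies = []
t = threading.Thread(target=listener, args=(["hello", "what's the weather?"],))
t.start()
# Drain the queue until the listener signals it is done.
while (utterance := audio_events.get()) is not None:
    replies.append(respond(utterance))
t.join()
```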

egl2021|1 year ago

Would you please describe your setup for coding? I'm interested in trying this out, but I need an ELI5. Thanks.

PS Anybody is welcome to jump in with a description of their setup.

8n4vidtmkvmk|1 year ago

Gardening? The vision stuff for plants has always been terrible guesswork. Is it actually good/useful now?

thomasfromcdnjs|1 year ago

I was trying to get it to create an LLM from scratch using tensorflow.js, with training and prediction. It got about 95% of the way there, which is impressive.

But gpt-4o actually got a working solution in a couple prompts -> https://gist.github.com/thomasdavis/fadbca01605fb3cb64911077...

Though the new Artifacts feature is really damn handy: you can describe the most detailed projects and it does a really great job of what you asked for. I found it delivered beyond what I wanted.

I am now paying for both -.-

- it's hard to rank which one codes better, but I think Claude has better abstractions

- sometimes I paste the output of one into the other and continue solving there

girvo|1 year ago

Yo! Long time no see haha <3

Would love to see your workflow sometime, my experimentations have been small tbh but I was quite impressed with Claude.

Funnily, I think (modulo costs) feeding the output of one model into another once you hit a brick wall is quite powerful, but it feels clunky to do at the moment.

eschluntz|1 year ago

(I'm biased as I work for anthropic)

I broke my hand 2 months ago and Claude 3.5 sonnet has been writing all my code for me. It's been awesome

porl|1 year ago

Since you work there, can you ask them to be more transparent about exactly what you're paying for with a "pro" subscription?

Currently you are apparently paying for:

- Everything in Free

- Use Claude 3 Opus and Haiku

- Higher usage limits versus Free

- Create Projects to work with Claude around a set of docs, code, or files

- Priority bandwidth and availability

- Early access to new features

But what are the usage limits? Higher than free by how much?

Having an invisible limit on a paid product really rubs me the wrong way. Maybe some rate-limiting after a certain amount would be better than a hard cutoff, but even then I'd like to know what the limit is before I pay, not when I accidentally hit it in the middle of something important.

mertbio|1 year ago

Why do you work? Can't you get a sick leave?

mauflows|1 year ago

Mind explaining your workflow? I actually just broke my wrist today :(

valval|1 year ago

I saw the initial 3.5 Sonnet release a few days ago and decided to pull the trigger on canceling OpenAI and moving over, and that was a great decision. I've been having to do more and more intricate nudging and leading toward the correct answer or approach when using ChatGPT of late.

My experience with Claude is very positive when it comes to programming and planning out infrastructure. My only gripe so far has been some ethical constraints that don't exist for ChatGPT, and that's a big one for me, since I don't need Anthropic employees to act as my moral compass. For a specific example, asking how to secure code by making it harder to decompile or read is a no-no for Claude, but a-ok for GPT.

jampa|1 year ago

You can use OpenAI's developer platform https://platform.openai.com/playground/chat

I haven't used the normal website in a year; the playground is a way better experience:

- It allows tweaking of the model / temperature.

- You pay only per token used

- It does not use your conversations to train their models

- You can edit both your and the assistant's messages in the conversation, or delete them.

- Allows for easy message replay if the output is not great

Be mindful that you are paying per token for every message in the thread, so if the thread becomes long it gets expensive, but it's still far better value than subscribing.
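To see why long threads get expensive: every API request resends the whole conversation, so input-token cost grows roughly quadratically with the number of turns. A back-of-the-envelope sketch (the price is an illustrative placeholder, not a current OpenAI rate):

```python
# Illustrative placeholder price, e.g. $5 per 1M input tokens.
PRICE_PER_INPUT_TOKEN = 5.00 / 1_000_000

def thread_input_cost(tokens_per_message, num_turns):
    # Turn k resends all k messages so far, so total input tokens
    # grow as 1 + 2 + ... + n = n * (n + 1) / 2 messages' worth.
    total_tokens = sum(tokens_per_message * k for k in range(1, num_turns + 1))
    return total_tokens * PRICE_PER_INPUT_TOKEN

short_chat = thread_input_cost(200, 10)    # 10 turns  -> about $0.055
long_chat = thread_input_cost(200, 100)    # 100 turns -> about $5.05
```

So a 10x longer thread costs roughly 90x more in input tokens, which is why pruning or restarting long threads matters when paying per token.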

panarky|1 year ago

The world is changing too fast for knowledge that is nine months old.

------

USER

What is the maximum number of tokens supported by gpt-4o?

ASSISTANT

As of my most recent update in October 2023, GPT-4 does not have an officially designated variant called "gpt-4o". However, OpenAI's GPT-4 model supports different maximum token limits depending on its variant. There are two primary context lengths for GPT-4 models:

1. *GPT-4 with 8K context length*: This variant supports up to 8,000 tokens.

2. *GPT-4 with 32K context length*: This variant supports up to 32,000 tokens.

These limits encompass both input and output tokens in a given interaction. If you intended to refer to a specific or newly introduced variant, such as "gpt-4o," it is essential to look up the most current resources or official announcements from OpenAI, as models and their capabilities can evolve.

cyral|1 year ago

I've been playing around with it this week and its coding ability is insane (for an LLM). I've given it some pretty sloppy descriptions of things I want to do and it's managed to figure out exactly how to do them on the first or second try; I'm talking about things like building animations in React that cannot be described with text very well. The big pain point is copying and pasting things back and forth to have it edit them. If it were integrated and could see my local files, that would be killer. I know there are various companies working on that, but the JetBrains AI integration, for example, is garbage compared to the results I get by manually asking Claude.

I wasn't worried about how this would affect our industry a few months ago, but this has me reconsidering. It's like a junior engineer that can do most tasks in seconds for a couple of cents.

hdhshdhshdjd|1 year ago

What worries me is you need that time in the dirt to get a feel for coding as a craft. And at least for me that aspect of knowing the craft helps get my thinking in tune with problem solving in a very productive way.

Coding can be similar to playing an instrument, if you have mastery, it can help you be more expressive with the ideas you already have and lead you to new ones.

Whereas if we take away the craft of coding, I think you end up with the type of code academic labs produce: something that starts purely on a "drawing board" and is given to the grad student/intern/LLM to make work. While it will prove the concept, it won't scale long term, as the intern doesn't know when to spend an extra 30 minutes on a function so that it may be more flexible down the road.

ttul|1 year ago

I’d say it’s far more useful than a junior engineer. A junior might know one or two areas well, but Claude 3.5 knows literally every steep learning curve across all engineering professions. Claude may not yet be able to orchestrate a large project, but it can advise on structuring using knowledge drawn from best practices across every conceivable discipline. We may not get “AGI”; we might get something far more useful instead.

ChicagoDave|1 year ago

I switched to Claude when Opus came out and have never looked back.

Anthropic tech is vastly better at code.

ChicagoDave|1 year ago

Also, Anthropic Projects is HUGE.

You get to create a set of default instructions that all project prompts adhere to, like: "Use Linq whenever possible. Consider Fluent capabilities when they will improve developer experience."

Then you can add all of your existing code (though as has been pointed out, connecting to a repo would be ideal).

And then you can have multiple conversations within that project, so you can separate your lines of thinking.

mbil|1 year ago

I haven’t had a great experience using GPT-4o for coding; it will sometimes make a mistake, fix the mistake while introducing another mistake, then fix that second mistake by repeating the first. Or other weird things like that. I love the text to speech, and image generation sometimes comes in handy, but ultimately I prefer Sonnet 3.5 for coding and have recently switched to that.

If OpenAI comes out with a stronger model I’ll probably switch back; I don’t have much brand loyalty in this regard. I could see some features making usage more sticky (memory, projects, etc.).

bredren|1 year ago

What languages / frameworks do you focus on?

I’ve also had worse results with 4o than 4. It’s faster but it vomits output.

I can’t tell if I need to go tweak memory or if it’s just like this.

wesleyyue|1 year ago

I've been using it in https://double.bot (a VS Code coding copilot I'm hacking on) and it's been pretty great so far. We have support for all the top models (GPT-4o, DeepSeek, Opus, etc.), so I regularly like to switch between them, especially if I see a challenging problem. So far, Sonnet 3.5 has the best quality and is the one I use by default.

namanyayg|1 year ago

How does double.bot compare against cursor.sh? I've been using the latter for a while after switching from copilot++ and am pleased.

skybrian|1 year ago

I switched from a GitHub Copilot subscription to Cody for VS Code, to give the competition a chance.

Cody's chat UI seems pretty good for making comparisons like this. You can set it to whichever LLM you want, including GPT-4o and Claude 3.5 Sonnet.

I haven't used Claude enough to do a detailed comparison, but GPT-4o and Claude 3.5 seem roughly similar for my coding questions.

stephc_int13|1 year ago

I use it mostly for coding help, not Python or JS but a mix of C and C++ for my own framework.

It is surprisingly good and helpful. I am still exploring the limits.

Writing doc comments or test cases is much faster and more fun with this kind of tool, but you still have to double-check everything, as it inevitably makes mistakes, often small and not obvious…

andrewstuart|1 year ago

>> GPT-4o feels significantly worse for this use case

Interesting. I felt GPT-4 was virtually useless and GPT-3.5 was the best; then along came GPT-4o and it instantly became the only version worth using.

I find GPT-4o to be extremely powerful and useful.

What don't you like about it?

m0zzie|1 year ago

To clarify, I think it's extremely powerful and useful too. It's just that I get more misses than hits when using it as a rubber duck and coding assistant.

netsec_burn|1 year ago

Hit or miss, with significantly more hits than misses. I was able to build a useful cybersecurity service in about 6 hours leveraging Claude to write most of the code. It has certainly made me more productive.

energy123|1 year ago

It's ranked slightly higher than 4o in the Coding section of the lmsys leaderboard, but the difference is not big.

replwoacause|1 year ago

I find its coding ability superior to GPT-4 and 4o.

ldjkfkdsjnv|1 year ago

Coding ability is off the charts, to the degree that I think in one year software engineering will be considered a dying field.

newzisforsukas|1 year ago

What will the people who use LLMs to write code be called? What will the field be called?

Where will all the power and computers come from that will replace the tens of millions of software developers?

Moldoteck|1 year ago

And who would actually verify the output of the LLM works as expected? Unit tests? Who would verify the unit tests are testing what needs to be tested, instead of just branch/output coverage? I've used these tools personally, and I've used GH Copilot at the company I work at, and at no point was I under the impression LLMs can do all of my work. Each step needs verification that it actually gives what you need, and some steps need a good description by someone who knows what the system is doing to perform the changes where they're needed. Continuous extension of features without good design patterns in place, usually checked by devs, can also create a lot of problems. So what makes you think that in one year SWE will be considered a dying field? What I see is the need for better devs who can see beyond boilerplate and be empowered by LLMs. That doesn't mean a dying field; it means raising the standards to break into the field.
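The coverage point can be made concrete with a toy example. Everything here is invented for illustration; `llm_sort` stands in for model-generated code:

```python
def llm_sort(xs):
    # Imagine the model produced this: it "works" on the happy path
    # but silently drops duplicates.
    return sorted(set(xs))

def shallow_test():
    # A coverage-style test: exercises every line, passes, proves little.
    return llm_sort([3, 1, 2]) == [1, 2, 3]

def behavioral_test():
    # A test written by someone who knows the spec: sorting must
    # preserve multiplicity. This one catches the bug.
    return llm_sort([2, 1, 2]) == [1, 2, 2]
```

The shallow test passes and the behavioral test fails, which is exactly the gap: someone still has to know the spec well enough to write the second test.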

shinycode|1 year ago

Yeah, good idea to put everyone out of their job and give it to 2 companies. Then what happens when those companies decide to raise prices by 1000x (Twitter API?)? Or what happens when your system has a bug and does not work, and the AI says it’s all good? Who will introspect your code to find what’s wrong?

mirsadm|1 year ago

Here we go again. If you seriously think that then you should probably consider changing careers.