joshhart | 1 year ago

The benchmarks compare it favorably to GPT-4-turbo but not GPT-4o. The latest versions of GPT-4o are much higher in quality than GPT-4-turbo. The HN title here does not reflect what the article is saying.

That said, the conclusion that it's a good model for the price is true. I would just be hesitant to call it a great model.

A_D_E_P_T|1 year ago

Not only do I completely agree, I've been playing around with both of them for the past 30 minutes and my impression is that GPT-4o is significantly better across the board. It's faster, it's a better writer, it's more insightful, it has a much broader knowledge base, etc.

What's more, DeepSeek doesn't seem capable of handling image uploads. I got an error every time. ("No text extracted from attachment.") It claims to be able to handle images, but it's just not working for me.

When it comes to math, the two seem roughly equivalent.

DeepSeek is, however, politically neutral in an interesting way. Whereas GPT-4o will take strong moral stances, DeepSeek is an impressively blank tool that seems to have no strong opinions of its own. I tested them both on a 1910 article critiquing women's suffrage, asking for a review of the article and a rewritten modernized version; GPT-4o recoiled, DeepSeek treated the task as business as usual.

tkgally|1 year ago

> DeepSeek ... seems to have no strong opinions of its own.

Have you tried asking it about Tibetan sovereignty, the Tiananmen massacre, or the role of the Communist Party in Chinese society? Chinese models I've tested have had quite strong opinions about such questions.

theanonymousone|1 year ago

Thanks for sharing. How about 4o-mini?

mvdtnz|1 year ago

If OpenAI wants fairer headlines they should use a less stupid version naming convention.

jchook|1 year ago

I updated the title to say GPT-4, but I believe the quality is still surprisingly close to 4o.

On HumanEval, I see 90.2 for GPT-4o and 89.0 for DeepSeek v2.5.

- https://blog.getbind.co/2024/09/19/deepseek-2-5-how-does-it-...

- https://paperswithcode.com/sota/code-generation-on-humaneval

selfhoster11|1 year ago

I am extremely sceptical about the claim that any version of GPT-4o meets or exceeds GPT-4 Turbo across the board.

Having used the full GPT-4, GPT-4 Turbo and GPT-4o for text-only tasks, my experience is that this is roughly the order of their capability from most to least capable. In image capabilities, it’s a different story - GPT-4o unquestionably wins there. Not every task is an image task, though.

stefan_|1 year ago

Begging for the day most comments on a random GPT topic will not be "but the new GPT $X is a total game changer and much higher in quality". Seriously, we went through this with 2, 3, 4... Incremental progress does not a game changer make.

selfhoster11|1 year ago

I'm sorry, but I gotta defend GPT-4o's image capabilities on this one. It's leagues ahead of the competition there, even if it's absolutely horrid on text-only tasks.

GaggiX|1 year ago

The table only shows the models that they managed to beat, so there is no GPT-4o or Claude 3.5 Sonnet for example.