
GPT-4 details leaked?

661 points | bx376 | 2 years ago | threadreaderapp.com | reply

621 comments

[+] CSMastermind|2 years ago|reply
Previously posted about here: https://news.ycombinator.com/item?id=36671588 and here: https://news.ycombinator.com/item?id=36674905

With the original source being: https://www.semianalysis.com/p/gpt-4-architecture-infrastruc...

The Twitter guy seems to just be paraphrasing the actual blog post? That's presumably why the tweets are now deleted.

---

The fact that they're using MoE was news to me and very interesting. I'd love to know more details about how they got that to work. Variations in that implementation would explain the fluctuations in the quality of output that people have observed.

I'm still waiting for the release of their vision model, which is mentioned here but which we still know little about, apart from a few demos a few months ago.

[+] xeckr|2 years ago|reply
If this is true, then:

1. Training took 21 yottaflops. When was the last time you saw the yotta- prefix for anything?

2. The training cost of GPT-4 is now only 1/3 of what it was about a year ago. It is absolutely staggering how quickly the price of training an LLM is dropping, which is great news for open source. The Google memo was right about the lack of a moat.
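For what it's worth, the "21 yottaflops" figure is at least internally consistent with the leak's other numbers under the standard ~6ND compute estimate. A rough sanity check, where both inputs are figures attributed to the leak and confirmed by no one:

```python
# Rough training-compute estimate via the standard ~6 * N * D rule of thumb.
# Both figures below are the ones attributed to the leak, NOT confirmed numbers.
active_params = 280e9      # ~280B parameters active per forward pass (assumed)
training_tokens = 13e12    # ~13T training tokens (assumed)

total_flops = 6 * active_params * training_tokens
print(total_flops / 1e24)  # ~21.8 yottaFLOPs, in line with the quoted 21
```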

[+] TeMPOraL|2 years ago|reply
> The conspiracy theory that the new GPT-4 quality had been deteriorated might be simply because they are letting the oracle model accept lower probability sequences from the speculative decoding model.

In other words: the speculation was likely right, here's a specific mechanism that would explain it, but let's still insult the people who brought it up and keep gaslighting them.

[+] mitchdoogle|2 years ago|reply
Calling something a conspiracy theory is not an insult against anybody. It's a theory because it's unproven and it's a conspiracy because people think OpenAI purposely degraded their own service, hence conspiracy theory.
[+] shahules|2 years ago|reply
This guy doesn't have any idea what he is talking about. He consistently posts such bullshit on twitter. Mostly copy paste with added spice mix.
[+] qaq|2 years ago|reply
Hmm “Sam Altman won't tell you that GPT-4 has 220B parameters and is 16-way mixture model with 8 sets of weights” George Hotz said this in his recent interview with Lex Fridman. It looked like Lex knew this to be true by the way he reacted.
[+] npsomaratna|2 years ago|reply
This is unsubstantiated. The only folks who know exactly how GPT-4 works are employed at OpenAI. The rest of us can only guess.
[+] YetAnotherNick|2 years ago|reply
Even if I just went by Sam Altman's public comments, I would have come to a similar conclusion: GPT-4 is big, and it is hard to make it faster.

The secret sauce and moat lie in the data, though. I have heard rumours that they have paid competitive programmers to write and annotate code for them, with information like complexity.

[+] mmahemoff|2 years ago|reply
I've been wondering how freemium services like Thread Reader still operate now that Twitter is charging prohibitive prices for API access and taking measures to prevent scraping. The cheapest API plan with read access is $100/month, which covers 10,000 tweet reads, so it could only produce about 500 pages like this one on demand.
[+] RC_ITR|2 years ago|reply
For all the 'I know every number' certainty of this post, there's some weird stuff:

>(Today, the pre-training could be done with ~8,192 H100 in ~55 days for $21.5 million at $2 per H100 hour.)

Why flex both system size and training time to arbitrary numbers?
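The arithmetic behind the quoted figure does at least check out, taking the post's numbers at face value:

```python
# Cross-check of the quoted pre-training cost figure.
gpus = 8192      # H100s, per the post
days = 55        # per the post
rate = 2.0       # dollars per H100-hour, per the post

gpu_hours = gpus * days * 24
cost = gpu_hours * rate
print(f"{gpu_hours:,} H100-hours -> ${cost:,.0f}")  # 10,813,440 -> $21,626,880
```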

>For example, MoE is incredibly difficult to deal with on inference because not every part of the model is utilized on every token generation. This means parts may sit dormant when other parts are being used. When serving users, this really hurts utilization rates.

Utilization of what? Memory? If you're that worried about inference utilization, then why not just fire up a non-MOE model?

Here's what the post said about MQA:

>Because of that only 1 head is needed and memory capacity can be significantly reduced for the KV cache

This is close, but wrong: you only need one key/value (KV) head, but you still have the same number of query heads.
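To make the correction concrete, here is a minimal numpy sketch of multi-query attention in general (an illustration of the technique, not GPT-4's actual code; all dimensions are arbitrary): all the query heads are kept, but they share a single K/V head, so only the KV cache shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)
n_heads, d_head, seq = 8, 64, 16

# Queries: one per head, exactly as in ordinary multi-head attention.
q = rng.standard_normal((n_heads, seq, d_head))
# MQA: a single key head and a single value head, shared by all query heads.
k = rng.standard_normal((seq, d_head))
v = rng.standard_normal((seq, d_head))

# Every query head attends against the same shared K/V.
scores = q @ k.T / np.sqrt(d_head)           # (n_heads, seq, seq)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)    # softmax over keys
out = weights @ v                            # (n_heads, seq, d_head)

# KV cache per token: 2 * d_head floats instead of 2 * n_heads * d_head,
# an 8x reduction here, while the query side is unchanged.
```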

My guess is that this is a relatively knowledgeable person using the formulas laid out in the 2020 scaling-laws paper to construct a fantasy system (with correct math).

Put differently, I could probably fake my way through a similar post and be an equal level of close but definitely wrong because I'm way out of my league. That vibe makes me very suspicious.

[+] dmarchand90|2 years ago|reply
Can anyone provide an alternative link to https://twitter.com/i/web/status/1678545170508267522

I haven't registered for Twitter since it started and I'd rather not now (though I probably will if it's the only way to get leaked gpt4 training details)

[+] _a9|2 years ago|reply
Wayback failed to load the subtweets. archive.is has a copy, but it seems to stop after around 10 subtweets; the Thread Reader link that was posted has it all.

https://archive.is/Y72Gu

[+] Roark66|2 years ago|reply
The tweet is gone. What was in it?

Also, I'm dubious about this unsubstantiated claim. The biggest past innovation (training with human feedback) actually shrank model sizes. Compare Bloom-366B with Falcon-40B (much better). I would be mildly surprised if it turned out GPT-4 has 1.8T parameters (even if it's a composite model, as they say).

The article says they use 16 experts 111B each. So the best thing to assume is probably that each of these experts is basically a fine tuned version of the same initial model for some problem domain.

[+] getmeinrn|2 years ago|reply
>If their cost in the cloud was about $1 per A100 hour, the training costs for this run alone would be about $63 million.

If someone legitimate put together a crowd funding effort, I would donate a non-insignificant amount to train an open model. Has it been tried before?

[+] aussieguy1234|2 years ago|reply
The fact they are using MoE is interesting. There are a lot of specialised open-source models on HuggingFace. You just need an LLM to act as the core "brain" and a few other components.

HuggingGPT works similarly to this: it automatically chooses, downloads, and runs the right "expert" model from HuggingFace. https://arxiv.org/abs/2303.17580

[+] potatoman22|2 years ago|reply
I wonder what the legal implications of them using SciHub and Libgen would be if that's true. I'd imagine OpenAI is big enough to make deals with publishers.
[+] twayt|2 years ago|reply
Libgen/Sci-Hub or not, if the model can provide details about a book beyond high-level information like the summary, and no explicit deal with the publisher has been made, you can make a strong argument that it is plagiarism.

Even if bits and pieces of the book text are distributed across the internet and you end up picking up portions of the book, you still read the book.

It is extremely sad but ChatGPT will be taken down by the end of this year and replaced by a highly neutered model next year.

[+] Fiahil|2 years ago|reply
If that's true, then OpenAI has probably taken extreme protective measures to ensure the secret is well protected. Even if OpenAI is big enough to make deals, they probably did not spend several years making deals with every publisher.

It would, however, be very interesting to see whether they fund efforts to massively (re)start book digitisation.

[+] msp26|2 years ago|reply
probably just easier to use drm-free copies of books
[+] langsoul-com|2 years ago|reply
We should default to using thread aggregators instead of Twitter links. My God, Twitter threads are unreadable.
[+] PostOnce|2 years ago|reply
"Open" AI, a charity to benefit us all by pushing and publishing the frontier of scientific knowledge.

Nevermind, fuckers, actually it's just to take your jobs and make a few VCs richer. We'll keep the science a secret and try to pressure the government into making it illegal for you to compete with us.

https://github.com/ggerganov/llama.cpp

https://github.com/openlm-research/open_llama

https://huggingface.co/TheBloke/open-llama-7b-open-instruct-...

https://huggingface.co/TheBloke/open-llama-13b-open-instruct...

You can use the above without paying OpenAI. You don't even need a GPU. There are no license issues like with the facebook llama.

[+] YeGoblynQueenne|2 years ago|reply
>> We'll keep the science a secret and try to pressure the government into making it illegal for you to compete with us.

Just to be clear, there's no science being kept secret, because there is no science being done. What OpenAI has built is a feat of engineering, borne aloft by a huge budget supporting a large team whose expertise lies in tuning neural-net systems, not in doing science.

Machine learning, as it is practiced today, is not science. There is no scientific theory behind it and there is no scientific method applied. There are no scientific questions asked, or attempted to be answered. There is no new knowledge produced other than how to tune systems to beat benchmarks. The standard machine learning paper is a bunch of text and arcane-looking formulae around a glorified leaderboard: a little table with competing systems on one side and arbitrarily chosen benchmark datasets on the other side; and all our results in bold so everyone knows we're winning. That's as much doing science as is racing cool-looking sports cars.

[+] kyledrake|2 years ago|reply
Unfortunately I've found the current OSS models to be vastly inferior to the OpenAI models. Would love to see someone actually get close to what they can do with GPT-3.5/4, except capable of running on commodity GPUs. What's the most impressive open model so far?
[+] KronisLV|2 years ago|reply
> You can use the above without paying OpenAI. You don't even need a GPU. There are no license issues like with the facebook llama.

I actually wrote about getting an LLM chatbot up and running a while ago: https://blog.kronis.dev/tutorials/self-hosting-an-ai-llm-cha...

It's good that the technology and models are both available for free, and you don't even need a GPU. However, there are still large memory requirements (if you want output that makes sense), and running on a CPU does result in somewhat slower performance.

There are async use cases where it can make sense, but for something like autocomplete or other near real time situations we're not there yet. Nor is the quality of the models comparable to some of the commercial offerings, at least not yet.

So I don't have it in me to blame anyone who forks over the money to a SaaS platform instead of getting a good GPU/RAM and hosting stuff themselves.

Here's hoping the more open options keep getting better and more competitive, though!

[+] grondilu|2 years ago|reply
> try to pressure the government into making it illegal for you to compete with us.

I mean the guy who created GPT-4 literally demanded a ban of any system more powerful than GPT-4.

[+] TradingPlaces|2 years ago|reply
Keep in mind, that was the idea originally. Then in 2018 Elon decided he wanted to be CEO, the board rejected that idea for now obvious reasons, and he reneged on 90% of a promised $1b donation. The only way forward was to become a for-profit company and do a normal funding round, which is what happened with Microsoft. Elon’s rug pull is why this happened.
[+] arthur_sav|2 years ago|reply
I would love to use these... but they suck. They don't even come close to what OpenAI offers.
[+] tudorw|2 years ago|reply
Sorry about the lnkd.in-style links, but I just posted this there, then came here and felt it might be interesting to someone. They do work!

...

Here's a round-up of open-source projects focused on letting you run your own models locally ('AI'). They all take slightly different approaches, although under the hood many use the same models.

https://lnkd.in/exKqJZm8 A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA. Its goal is to become the AUTOMATIC1111/stable-diffusion-webui of text generation.

https://lnkd.in/etVCmZHB With OpenLLM, you can run inference with any open-source large-language models, deploy to the cloud or on-premises, and build powerful AI apps. State-of-the-art LLMs: built-in supports a wide range of open-source LLMs and model runtime, including StableLM, Falcon, Dolly, Flan-T5, ChatGLM, StarCoder and more.

https://lnkd.in/e7-NKGzJ LocalAI is a drop-in replacement REST API that's compatible with OpenAI API specifications for local inferencing. It allows you to run LLMs (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families that are compatible with the ggml format. Does not require GPU.

https://lnkd.in/ef_Sa9AN Multi-platform desktop app to download and run Large Language Models (LLMs) locally on your computer.

https://lnkd.in/e288q-Wb A desktop app for local, private, secured AI experimentation. Included out of the box are: a known-good model API and a model downloader, with descriptions such as recommended hardware specs, model license, blake3/sha256 hashes, etc.; a simple note-taking app, with inference config per note (the note and its config are output in plain-text .mdx format); a model inference streaming server (/completion endpoint, similar to OpenAI's).

https://lnkd.in/eycRJn6b Transcribe and translate audio offline on your personal computer. Powered by OpenAI's Whisper.

https://lnkd.in/eUrtE3uQ The easiest way to install and use Stable Diffusion on your computer. Does not require technical knowledge, does not require pre-installed software. 1-click install, powerful features, friendly community.

[+] satvikpendem|2 years ago|reply
See also, Orca and Falcon models which are also open source. I'm not sure if any frontends support them yet.
[+] lynx23|2 years ago|reply
Wow, the top comment is neither relevant to the post, nor friendly, nor interesting. Activism, even with false premises. Many of us have tried, and those with a little sense left know that running your local LLM without a GPU is not really useful.

Besides, what does your post add to the discussion, and why is it the top posting?

Create your local LLM, use it, tell other people about how you did it exactly, and be happy. But why the heck do you need to fight a company in that space?

Where have the times gone when someone motivated to do something nice just went ahead and did it, instead of running in circles and telling everyone else what they should NOT do?

[+] 29athrowaway|2 years ago|reply
If OpenAI is open, the Congo Free State was a free state.
[+] charcircuit|2 years ago|reply
>There are no license issues like with the facebook llama.

OpenLLaMA uses a dataset which does not seem to have proper commercial licensing for its training data. There are potential licensing issues because the copyright situation is not well defined.

[+] carabiner|2 years ago|reply
We must destroy OpenAI!
[+] izktj|2 years ago|reply
The only people pissed off are a bunch of developers who want to use it for their own gain.

GPT-4's costs are ridiculously cheap for the value you get out of it. Most other companies wouldn't even have released it to the public like they've done.

[+] pyeri|2 years ago|reply
In today's world, "Science equals Capitalism".

Or at least Science is allowed to progress and get funded only as long as it serves the interest of Capitalism.

[+] michaelcampbell|2 years ago|reply
> We'll keep the science a secret and try to pressure the government into making it illegal for you to compete with us.

This is essentially the Capitalist Credo, expressed in practical vs theoretical terms.

[+] ChildOfChaos|2 years ago|reply
You forgot to link where I can buy your tin foil hat at the end.
[+] imdsm|2 years ago|reply
> "Open" AI, a charity to benefit us all by pushing and publishing the frontier of scientific knowledge.

> Nevermind, fuckers, actually it's just to take your jobs and make a few VCs richer. We'll keep the science a secret and try to pressure the government into making it illegal for you to compete with us.

1. I don't think this is the right place for this kind of content, perhaps find your way back to Twitter or Reddit

2. Have you contributed funds to OpenAI? If not, where did your sense of entitlement come from?

3. What makes you think that any of what OpenAI has produced and provided would be available without funding? I assume the answer to 1 above will be no, so, how do you expect them to build without funds?

and 4. What's stopping you from creating what you thought OpenAI should be? Feel free; nobody is stopping you.

[+] BoorishBears|2 years ago|reply
Why the vitriol towards OpenAI?

If Elon hadn't pulled the rug out from under them after they refused his forceful takeover*, they wouldn't have had to go to Microsoft and they'd still be open.

* a takeover which he predicated on the claim that OpenAI was "doomed to fail"

[+] qwertox|2 years ago|reply
> This, of course, is “only” a batch size of 7.5 million tokens per expert due to not every expert seeing all tokens.

> Mixture of Expert Tradeoffs: There are multiple MoE tradeoffs taken: For example, MoE is incredibly difficult to deal with on inference because not every part of the model is utilized on every token generation.

Are these experts able to communicate among themselves within one query? How do they get selected? How do they know whom to pass information to?

Would I be able to influence the selection of experts by how I create my questions? For example to ensure that a question about code gets passed directly to an expert in code? I feel silly asking this question, but I honestly have no idea how to interpret this.

[+] l33tman|2 years ago|reply
You shouldn't take the "mixture of experts" too literally, it's yet another architecture to use internally for a gradient descent optimized graph of ops.

I obviously don't know how GPT-4 does it (or if it even does it), but think of partitioning your network into a couple of fairly isolated sub-graphs (the "experts"), and adding another learnable network between the input tokens and the experts that learns to route tokens to one or more expert sub-graphs. The gain is that you can potentially skip running the unused sub-graphs completely for that token, and you can distribute them across GPUs since, except for the input and output, they are independent of each other.

It all depends on the problem, data, and if the gradient descent optimizer can find a way to actually partition the problem usefully using the router and "experts".
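That routing can be sketched in a few lines (a toy illustration only; the expert count, dimensions, and top-2 gating are assumptions for the example, not GPT-4's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 32, 16, 2

# Each "expert" is an independent sub-graph (here just one weight matrix),
# and the router is the extra learnable network that picks among them.
experts = [rng.standard_normal((d_model, d_model)) * 0.02
           for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    # The router scores all experts, but only the top-k actually run;
    # the remaining sub-graphs stay completely dormant for this token.
    logits = x @ router
    top = np.argsort(logits)[-top_k:]
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.standard_normal(d_model)
y = moe_forward(token)  # shape (32,), computed by just 2 of the 16 experts
```

In training, the router and experts are optimized jointly by gradient descent, which is what lets the "partitioning of the problem" emerge rather than being hand-designed.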