top | item 35197331

(no title)

This document doesn't contain the architecture and training details of GPT-4. As an engineer, these details would be the most interesting part of it!

Driven by interest in GPT-4 and cutting edge LLMs I studied the research literature and compiled a small list of architectural and training details which very likely underpin GPT-4 in this blogpost: https://kir-gadjello.github.io/posts/gpt4-some-technical-hyp...

While this is a work in progress, the most important part is already in place and thus I decided to publish it in its current draft state.

Have fun following the TLDR and Arxiv links, fellow HNers!

discuss

adt|3 years ago

This is great, thanks, Kirill!

I've added your hypothesis to these ones:

https://lifearchitect.ai/gpt-4/

There's quite a broad range of guesses going on. I lean towards 80B language + 20B vision params trained across 3T collected tokens (could repeat to 10T+), but one of the other (strong) hypotheses is a dense 7T param model. That's absurd...

kir-gadjello|3 years ago

That's cool, thanks for noting, Alan!

Would you mind adding a reference link to the source, so that other people could visit my blog? I'm just starting out with blogging, it would help me to get more readers and feedback on this draft. I hope to get it in much better shape in just a few days.