IIRC, GPT-4 would actually be a bit _smaller_ to visualize than GPT-3. Details are not public, but according to the leaks, GPT-4 (at least some by-now old version of it) was a mixture of experts, with each expert having around 110B parameters [1]. So while the total number of parameters is bigger than GPT-3's (1800B vs. 175B), it is "just" 16 copies of a smaller 110B-parameter model. So if you wanted to visualize it in any meaningful way, the plot wouldn't grow bigger - or it would if you included all the different experts, but they are just copies of the same architecture with different parameters, which is not all that useful for visualization purposes.
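The arithmetic behind this can be checked with a quick sketch (using the leaked figures, which are assumptions rather than confirmed numbers):

```python
# Back-of-the-envelope check of the leaked GPT-4 figures
# (assumed, not officially confirmed): 16 experts, ~110B parameters each.
n_experts = 16
params_per_expert = 110e9

# Total is close to the leaked ~1.8T figure.
total = n_experts * params_per_expert
print(f"total parameters: {total / 1e12:.2f}T")

# But for visualizing the architecture, only one expert matters,
# since all experts share the same architecture with different weights.
print(f"parameters to visualize: {params_per_expert / 1e9:.0f}B")
```

So the visualizable "shape" of the model is that of a single 110B expert, smaller than GPT-3's 175B.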
spi|1 year ago
[1] https://medium.com/@daniellefranca96/gpt4-all-details-leaked...