hnthrowaway9812's comments
hnthrowaway9812 | 1 year ago | on: Snowflake Arctic Instruct (128x3B MoE), largest open source model
I want someone to spend a million dollars on a chess LLM so we can get a sense of how sophisticated these models can get at non-linguistic pattern matching.
I want someone to spend a million dollars on an LLM trained on Python program traces so we can try to teach it cause and effect and "debugging" (a rough sketch of what that trace data might look like is below). Maybe it will emulate a Python interpreter and become highly reliable at predicting the outcome of Python code.
etc.
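For concreteness, here is a minimal sketch of what that "program trace" training data could look like: run a snippet under sys.settrace and record the line-by-line state a model would then be trained to predict. The function name and trace format are illustrative assumptions, not an established pipeline.

    # Illustrative sketch: execute a snippet under sys.settrace and record
    # a line-by-line trace of local variables as plain-text training data.
    import sys

    def trace_program(source):
        events = []

        def tracer(frame, event, arg):
            # Only record "line" events coming from the snippet itself.
            if event == "line" and frame.f_code.co_filename == "<snippet>":
                events.append(f"line {frame.f_lineno}: locals={dict(frame.f_locals)}")
            return tracer

        code = compile(source, "<snippet>", "exec")
        sys.settrace(tracer)
        try:
            # Separate globals/locals so the trace isn't polluted by __builtins__.
            exec(code, {}, {})
        finally:
            sys.settrace(None)
        return events

    snippet = "x = 2\nfor i in range(3):\n    x = x * 2\n"
    for step in trace_program(snippet):
        print(step)

Each (snippet, trace) pair would be one training example; the expensive part is doing this at the scale of millions of programs and actually training on it.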
hnthrowaway9812 | 1 year ago | on: Snowflake Arctic Instruct (128x3B MoE), largest open source model
But when I asked it about the best LLMs, it suggested GPT-3, BERT, and T5!
hnthrowaway9812 | 1 year ago | on: Snowflake Arctic Instruct (128x3B MoE), largest open source model
If the "best model" only stays the best for a few months and if, during those few months, the second best model is near indistinguishable, then it will be extremely hard to extract trillions of dollars.
hnthrowaway9812 | 1 year ago | on: Snowflake Arctic Instruct (128x3B MoE), largest open source model
There are DOZENS of orgs releasing foundational models, not "a handful."
Salesforce, EleutherAI, NVIDIA, Amazon, Stanford, RedPajama, Cohere, Mistral, MosaicML, Yandex, Huawei, StabilityLM, ...
https://docs.google.com/spreadsheets/d/1kT4or6b0Fedd-W_jMwYp...
It's completely bonkers and a huge waste of resources. Most of them will see barely any use at all.