Are you guys affiliated with Meta’s ex-CTO in any way? I remember he famously implied that LLMs are overhyped. The demos are very impressive. Does this use an attention-based mechanism too? Just trying to understand (as a layman) how these models handle context, and whether long contexts lead to weaker results. That could be catastrophic in the real world!
sheepscreek|4 days ago
Or make something like LoRA mainstream for everyone (it probably scales better for general-purpose models shared by everyone).
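For anyone unfamiliar with the LoRA idea mentioned above, here is a minimal NumPy sketch of the core trick: instead of fine-tuning a full weight matrix, you freeze it and learn a low-rank additive update. All dimensions below are illustrative, not from any particular model.

```python
import numpy as np

# LoRA sketch: approximate a fine-tuning update to a frozen weight
# matrix W (d x d) with a rank-r product B @ A, where r << d.
d, r = 1024, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable, rank r
B = np.zeros((d, r))                     # trainable, zero-initialized

x = rng.standard_normal(d)
# Forward pass: base output plus the low-rank correction.
# Because B starts at zero, the model initially behaves exactly
# like the pretrained one.
y = W @ x + B @ (A @ x)

full_params = W.size          # 1,048,576 for full fine-tuning
lora_params = A.size + B.size # 16,384 trainable params here (64x fewer)
print(full_params, lora_params)
```

The appeal for "shared by everyone" use is that many users can keep one frozen base model and swap in their own tiny (A, B) adapters, rather than each storing a full fine-tuned copy.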