top | item 46182168

mapmeld | 2 months ago

Well, it's cool that they released a paper, but at this point it's been 11 months and you can't download Titans-architecture model code or weights anywhere. That would put a lot of companies ahead of them (Meta's Llama, Qwen, DeepSeek). The closest you can get is an unofficial implementation of the paper: https://github.com/lucidrains/titans-pytorch

alyxya|2 months ago

The hardest part about making a new architecture is that even if it is just better than transformers in every way, it’s very difficult to both prove a significant improvement at scale and gain traction. Until Google puts a lot of resources into training a scaled-up version of this architecture, I believe there’s plenty of low-hanging fruit in improving existing architectures, so the new architecture will always take a back seat.

tyre|2 months ago

Google is large enough, well-funded enough, and the opportunity is great enough to run experiments.

You don't necessarily have to prove it out on large foundation models first. Can it beat out a 32b parameter model, for example?

p1esk|2 months ago

> Until Google puts a lot of resources into training a scaled-up version of this architecture

If Google is not willing to scale it up, then why would anyone else?

nickpsecurity|2 months ago

But it's companies like Google that made tools like JAX and TPUs, saying we can throw together models with cheap, easy scaling. Their paper's math is probably harder to put together than an alpha-level prototype, which they need anyway.

So I think they could default to doing it for small demonstrators.

m101|2 months ago

Prove it beats models of different architectures trained under identical limited resources?

UltraSane|2 months ago

Yes. The path dependence for current attention-based LLMs is enormous.

root_axis|2 months ago

I don't think the comparison is valid. Releasing code and weights for an architecture that is widely known is a lot different than releasing research about an architecture that could mitigate fundamental problems that are common to all LLM products.

innagadadavida|2 months ago

Just keep in mind it is performance review time for all the tech companies. Their promotion of these seems to be directly correlated with that event.

mupuff1234|2 months ago

> it's been 11 months

Is that supposed to be a long time? Seems fair that companies don't rush to open up their models.

informal007|2 months ago

I don't think model code is a big deal compared to the idea. If the public had recognized the value of the idea 11 months ago, they could have implemented the code quickly, because there are so many smart engineers in the AI field.

jstummbillig|2 months ago

If that is true, does it follow that this idea does not actually have a lot of value?

mapmeld|2 months ago

Well, we have the idea and the next best thing to official code, but if this were a big revelation, where are all of the Titans models? If this were public, I think we'd have a few attempts at variants (like all of the Mamba SSMs, etc.) and get a better sense of whether this is valuable or not.

AugSun|2 months ago

Gemini 3 _is_ that architecture.

FpUser|2 months ago

I've read many very positive reviews of Gemini 3. I tried using it, including Pro, and to me it looks very inferior to ChatGPT. What was very interesting, though, was that when I caught it bullshitting me and called its BS, Gemini showed very human-like behavior. It did try to weasel its way out and degenerated down to "true Scotsman" level, but it finally admitted that it was full of it. This is kind of impressive / scary.