Can you please explain to me why AI chips don't matter anymore because of DeepSeek? I thought it was just a better model, but perhaps I didn't get it?
DeepSeek used older-generation chips and developed a model that takes significantly less compute, making access to tons of the latest Nvidia hardware unnecessary.
Being forced to live with more HW restrictions usually results in more reliance on SW creativity and better optimization, instead of lazy developers bloating SW to fill all available resources.
Just like how it's no surprise that websites developed where everyone has the latest and greatest fully loaded M-silicon MacBooks also suffer from a horrible lack of optimization, because "it works on my machine" while being a stuttery mess everywhere else.
But there has been a long-term suspicion in the AI community that the ultra-expensive-to-train and very-expensive-to-run humongous-LLM approach is a dead end, or at least fully unnecessary (and as such, money-wise, a dead end).
I mean, think about it: the crown jewel of AI was never to find ways to train on insane amounts of data, but to get results as good as possible with only as much data as necessary and no more. Because for a lot of use cases there simply isn't that much data.
And from everything we know, the structure of language is not so complex that you need this insane amount of data and model size.
It's just that we worked around problems by throwing more compute and data at them instead of solving them properly. Similarly, we reformulate any little-data use case in a way where we hope to take advantage of the mass of "causal text" data that modern foundation LLMs were trained on, then fine-tune and instrument the model using the "little data" of the use case.
But conceptually this is ... sub-par and undesirable. Though sure, that we made it work with this trickery is quite magnificent.
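To make the pretrain-then-adapt trick concrete, here is a deliberately toy sketch (not a real LLM): a bigram word model "pretrained" on a stand-in for a large generic corpus, then upweighted with a tiny domain dataset. The corpora and the blending weight are entirely made up for illustration.

```python
from collections import Counter

def bigrams(text):
    # Split text into consecutive word pairs, the "training signal" of this toy model.
    words = text.split()
    return list(zip(words, words[1:]))

generic_corpus = "the cat sat on the mat the dog sat on the rug"  # stands in for mass web text
domain_corpus = "the model sat on the gpu"                        # the "little data" of a use case

# "Pretrain" on the big generic corpus, then "fine-tune" by upweighting
# the scarce domain data on top of the pretrained counts.
pretrained = Counter(bigrams(generic_corpus))
finetuned = pretrained.copy()
for bg in bigrams(domain_corpus):
    finetuned[bg] += 5  # small data, heavily weighted during adaptation

def next_word(model, word):
    # Predict the most likely continuation under the given bigram counts.
    candidates = {b: c for (a, b), c in model.items() if a == word}
    return max(candidates, key=candidates.get) if candidates else None

print(next_word(pretrained, "model"))  # the pretrained model knows nothing about "model"
print(next_word(finetuned, "model"))   # the adapted model picked up the domain bigram
```

The point of the toy: the generic model alone cannot continue domain-specific words at all, while a small amount of heavily weighted domain data fixes that, which is the same shape of workaround (at vastly larger scale) described above.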
And sure, these huge LLMs do more than encode language; they encode miscellaneous knowledge/data, too.
But it's a messy, hallucination-prone, not properly updateable, and potentially outright copyright- or privacy-law-violating encoding of data...
So many systems already use RAG-like approaches to supply the knowledge in an updateable, much better-defined form, and "only" use the LLM to find the right search queries and to combine things into human-readable responses.
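A minimal sketch of that RAG pattern: knowledge lives in an updateable store, a retriever pulls the relevant pieces, and the LLM (mocked here) is only used to phrase the combined result. The store contents, keyword scoring, and mock LLM are all assumptions for the sketch; a real system would use BM25 or vector search and an actual model.

```python
knowledge_base = {  # easily updated or deleted, unlike facts baked into model weights
    "deepseek": "DeepSeek trained a competitive model on less compute.",
    "rag": "RAG supplies external documents to an LLM at query time.",
}

def retrieve(query):
    # Toy keyword retrieval: return documents whose key appears in the query.
    q = query.lower()
    return [doc for key, doc in knowledge_base.items() if key in q]

def mock_llm(prompt):
    # Stand-in for the language model that turns retrieved facts into readable prose.
    return "Answer based on: " + prompt

def answer(query):
    # Retrieval supplies the knowledge; the "LLM" only combines and phrases it.
    docs = retrieve(query)
    return mock_llm(" ".join(docs)) if docs else "No supporting documents found."

print(answer("What is RAG?"))
```

Updating what the system "knows" is just a dictionary edit here, which is exactly the updateability argument made above.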
In turn, the moment we have small LLMs which still handle language structure well, they will likely win very reliably, for the reasons mentioned above and because they are much cheaper, even though they are _way_ more complicated to use than "just prompting an LLM". But most advanced assistants are already way more complicated than "just prompting an LLM" anyway.
Or in other words: the technical breakthrough anyone (including OpenAI) would like the most (OpenAI: financially, and only as long as it stays an internal secret) is one which eliminates the need for the latest bleeding-edge ML chip tech. And DeepSeek is seen by some as a signal that exactly such a change is coming. I have also heard rumors (which I don't believe) that one reason for OpenAI to go non-open was that they realized this, too: with cheap-to-run open models they would lose the competitive advantage of competitors not being able to afford from-scratch training even if they wanted to.
Gigachad|1 year ago
lostmsu|1 year ago
Cumpiler69|1 year ago
dathinab|1 year ago